[chord] my plans to change merkle
Benjie Chen
benjie at amsterdam.lcs.mit.edu
Fri Aug 1 13:53:25 EDT 2003
yeah, i can do that. when?
as far as i can tell, the bottom level of the Merkle tree does not
actually contain the keys, but just a hash over all the keys.
currently, when you insert a key into the Merkle tree, you will
need to rehash all the keys in that range to produce the hash.
josh, please correct me if i am wrong.
that said, the db3 documentation claims that with the b-tree access
methods, the index is contained entirely in memory. so iterating
them should not be too bad...
but, a simple test shows otherwise. if you use dbm to insert 1024
blocks, trying 32 blocks at a time, this is the performance
trial 0 - 10.5 s
trial 1 - 14.8 s
trial 2 - 18.8 s
trial 3 - 38.1 s
trial 4 - 53.3 s
trial 5 - 32.0 s <--- probably some db3 caching parameter is changed here
trial 6 - 35.6 s
trial 7 - 40.4 s
trial 8 - 44.4 s
trial 9 - 50.3 s
trial 10 - 55.0 s
to investigate why the performance drops, i made the following changes
- on a block insert, only insert data into db, but don't insert key
into the merkle tree. instead, keep it in a vector
- after 1024 blocks, schedule a callback to insert keys in the vector
into the merkle tree (note, i didn't change the actual merkle tree
insert code, so it still iterates the db on every insert)
- measure time takes to insert keys into the tree
note, i am cheating here, because each dbm call also inserts 1024 blocks.
but that's intentional, i want to see where the time is spent...
dbm time time to insert into merkle tree
trial 0 7.5 4.7
trial 1 7.6 8.5
trial 2 7.4 22.5
trial 3 7.2 40.7
okay, so we now know where the time goes. there are several possible causes
- iterating is expensive
- to iterate, we use DB_SET_RANGE to find the smallest key of a range,
but it doesn't seem like db3 can guarantee it will return that range
before returning a key out of that range.
having done this, i think one of the two following changes may be
acceptable
- keep Merkle tree completely in cache, even with the lowest level,
so we don't ever need to iterate the db. 1,000,000 blocks require
20M... perhaps a bit expensive
- batch insert keys into the Merkle tree, like i am doing now, but
only iterate the entire DB once for all the inserts. this is probably
the best solution.
but i will discuss this in a meeting before committing anything.
and josh, any insights? please chip in... thanks.
benjie
On Fri, Aug 01, 2003 at 12:16:01PM -0400, Frank Dabek wrote:
> Benjie,
>
> We talked about this at Chord meeting today and are all a bit confused.
> Isn't the bottom level of the Merkle tree already an in-memory cache of
> all of the keys in the system?
>
> Why don't you present these changes at a meeting next week before you
> start hacking the code?
>
> --Frank
>
> On Fri, 2003-08-01 at 09:16, Benjie Chen wrote:
> > yes, i was planning to insert into the db, but not into the
> > merkle tree, on every block insert. when inserting keys into
> > the tree, we will end up traversing the db to keep the tree
> > fresh.
> >
> > benjie
> >
> >
> > On Fri, Aug 01, 2003 at 07:32:41AM -0400, Frans Kaashoek wrote:
> > > wouldn't it be better to do the insert in the database, but
> > > don't iteratate throught the whole database on each insert?
> > >
> > > perhaps i don't understand the plan, but this sounds like a
> > > bad hack.
> > >
> > > On Fri, 01 August 2003 at 00:21 (-0400), Benjie Chen wrote:
> > > >
> > > > so i mentioned earlier that there is a slight efficiency issue with
> > > > the Merkle implementation, namely that it tries to iterate thru the
> > > > database on every db insert, and that this could be quite expensive.
> > > >
> > > > my plan to fix/change this is to keep a fixed sized array of
> > > > inserted keys in memory, and periodically inserts them into the
> > > > merkle tree, instead of insert into the merkle tree on every store.
> > > > in the worst case, i will insert keys into the merkle tree once
> > > > every time the synchronization protocol runs.
> > > >
> > > > with only 200K, i can keep 10,000 keys. so this should be okay
> > > > memory-wise.
> > > >
> > > > anyone has a problem with this?
> > > >
> > > > benjie
> > > >
> > > >
> > > > --
> > > > benjie chen
> > > > benjie at lcs.mit.edu
> > > > _______________________________________________
> > > > chord mailing list
> > > > chord at amsterdam.lcs.mit.edu
> > > > https://amsterdam.lcs.mit.edu/mailman/listinfo/chord
--
benjie chen
benjie at lcs.mit.edu
More information about the chord
mailing list