[chord] my plans to change merkle

Frank Dabek fdabek at MIT.EDU
Fri Aug 1 14:30:59 EDT 2003


On Fri, 2003-08-01 at 12:53, Benjie Chen wrote:
> yeah, i can do that. when?
> 
> as far as i can tell, the bottom level of the Merkle tree does not
> actually contain the keys, just a hash over the keys in each range.
> currently, when you insert a key into the Merkle tree, you will
> need to rehash all the keys in that range to produce that hash.
> josh, please correct me if i am wrong.
> 

It looks like you are right; I guess Josh thought that storing the keys
would require too much memory... (BTW: Josh is almost certainly not
listening to this conversation; he's probably on a beach in Spain
laughing at us).

I'm still worried about caching the blocks since this means that the
tree is out of date with respect to the DB. Won't we end up transferring
keys from other nodes that we already store? 

Also: this is really only a concern for the single node case, right? In
a distributed system this DB load is so spread around that it probably
won't be felt.

Why don't we talk about this on Monday?

--Frank

> that said, the db3 documentation claims that with the b-tree access
> methods, the index is contained entirely in memory. so iterating
> over the keys should not be too bad...
> 
> but, a simple test shows otherwise. if you use dbm to insert 1024
> blocks, 32 blocks at a time, this is the performance (the timing
> loop is sketched after the table)
> 
> trial 0  - 10.5 s
> trial 1  - 14.8 s
> trial 2  - 18.8 s
> trial 3  - 38.1 s
> trial 4  - 53.3 s
> trial 5  - 32.0 s  <--- probably some db3 caching parameter is changed here
> trial 6  - 35.6 s
> trial 7  - 40.4 s
> trial 8  - 44.4 s
> trial 9  - 50.3 s
> trial 10 - 55.0 s
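>
> (for the record, the numbers come from a loop shaped roughly like the
> sketch below. store_block() is just a stand-in for the real dbm insert
> path, and the block size is made up; only the trial structure matters.)
>
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/time.h>
>
>     // stand-in for the real insert path (dbm -> dhash -> db3 + merkle tree)
>     static void
>     store_block (const char *data, size_t len)
>     {
>       (void) data; (void) len;   // the real call does the db + tree work
>     }
>
>     // wall-clock time in seconds
>     static double
>     now (void)
>     {
>       struct timeval tv;
>       gettimeofday (&tv, NULL);
>       return tv.tv_sec + tv.tv_usec / 1e6;
>     }
>
>     int
>     main (void)
>     {
>       char block[8192];
>       memset (block, 0, sizeof (block));
>       for (int trial = 0; trial <= 10; trial++) {
>         double t0 = now ();
>         // 1024 blocks per trial, issued 32 at a time
>         for (int batch = 0; batch < 1024 / 32; batch++)
>           for (int i = 0; i < 32; i++)
>             store_block (block, sizeof (block));
>         printf ("trial %d - %.1f s\n", trial, now () - t0);
>       }
>       return 0;
>     }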
> 
> to investigate why the performance drops, i made the following changes
>   - on a block insert, only insert the data into the db, but don't insert
>     the key into the merkle tree. instead, keep it in a vector
>   - after 1024 blocks, schedule a callback to insert keys in the vector
>     into the merkle tree (note, i didn't change the actual merkle tree
>     insert code, so it still iterates the db on every insert)
>   - measure the time it takes to insert the keys into the tree
> 
> note, i am cheating here, because each dbm call also inserts 1024 blocks.
> but that's intentional, i want to see where the time is spent...
> 
>           dbm time (s)   time to insert into merkle tree (s)
> trial 0   7.5            4.7
> trial 1   7.6            8.5
> trial 2   7.4            22.5
> trial 3   7.2            40.7
> 
> okay, so we now know where the time goes. there are a couple of possible causes
>
>   - iterating the db is expensive
>
>   - to iterate, we use DB_SET_RANGE to find the smallest key in a range,
>     but it doesn't seem like db3 can guarantee it will return all the
>     keys in that range before returning a key outside of it (the scan i
>     mean is sketched below).
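>
> (a sketch of that scan with the db3 cursor api -- not our actual code;
> the per-key hashing is elided and memcmp stands in for the real 20-byte
> key comparison:)
>
>     #include <db.h>        // berkeley db (db3)
>     #include <string.h>
>
>     // walk all keys in [lo, hi): DB_SET_RANGE positions the cursor at
>     // the smallest key >= lo, then DB_NEXT walks forward in btree order.
>     int
>     scan_range (DB *db, const void *lo, size_t lolen,
>                 const void *hi, size_t hilen)
>     {
>       DBC *cursor;
>       int r = db->cursor (db, NULL, &cursor, 0);
>       if (r)
>         return r;
>
>       DBT key, data;
>       memset (&key, 0, sizeof (key));
>       memset (&data, 0, sizeof (data));
>       key.data = (void *) lo;
>       key.size = lolen;
>
>       r = cursor->c_get (cursor, &key, &data, DB_SET_RANGE);
>       while (r == 0) {
>         // stop once we are past the end of the range
>         size_t n = key.size < hilen ? key.size : hilen;
>         if (memcmp (key.data, hi, n) >= 0)
>           break;
>         // ... fold key.data into the hash for this range ...
>         r = cursor->c_get (cursor, &key, &data, DB_NEXT);
>       }
>       cursor->c_close (cursor);
>       return r == DB_NOTFOUND ? 0 : r;
>     }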
> 
> having done this, i think one of the following two changes may be
> acceptable
>
>   - keep the Merkle tree completely in cache, including the lowest level,
>     so we don't ever need to iterate the db. at 20 bytes per key,
>     1,000,000 blocks require 20M... perhaps a bit expensive
>
>   - batch insert keys into the Merkle tree, like i am doing now, but
>     iterate the entire DB only once for all the inserts. this is
>     probably the best solution (rough sketch below).
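>
> (to make the batching idea concrete, here is a self-contained sketch.
> everything in it is a stand-in: a std::map plays the db, the 64-leaf
> layout keyed on the first byte is made up, and std::hash replaces the
> real sha-1. the point is only the control flow -- remember keys as they
> are stored, then rebuild the dirty leaf hashes with one in-order pass.)
>
>     #include <functional>
>     #include <map>
>     #include <string>
>     #include <vector>
>
>     typedef std::string chordkey;                    // stand-in for a 20-byte chordID
>     typedef std::map<chordkey, std::string> blockdb; // stand-in for the db3 btree
>
>     struct merkle_bottom {
>       // 64 bottom-level ranges, picked by the first key byte (made up;
>       // the real tree fans out differently)
>       size_t leaf_digest[64];
>       static unsigned leaf_of (const chordkey &k) {
>         return (unsigned char) k[0] % 64;
>       }
>     };
>
>     static std::vector<chordkey> pending;  // in the db, not yet in the tree
>
>     // block-insert path: db insert happens now, tree update is deferred
>     void
>     note_insert (blockdb &db, const chordkey &k, const std::string &data)
>     {
>       db[k] = data;
>       pending.push_back (k);
>     }
>
>     // run periodically (at the latest, right before the sync protocol runs)
>     void
>     flush_pending (const blockdb &db, merkle_bottom &tree)
>     {
>       if (pending.empty ())
>         return;
>
>       // mark the leaves that have at least one new key
>       std::vector<bool> dirty (64, false);
>       for (size_t i = 0; i < pending.size (); i++)
>         dirty[merkle_bottom::leaf_of (pending[i])] = true;
>
>       // ONE pass over the db in key order: rehash every dirty leaf and
>       // skip the clean ones. this replaces the per-insert iteration.
>       std::vector<size_t> digest (64, 0);
>       for (blockdb::const_iterator it = db.begin (); it != db.end (); ++it) {
>         unsigned l = merkle_bottom::leaf_of (it->first);
>         if (dirty[l])
>           digest[l] = digest[l] * 131 + std::hash<std::string> () (it->first);
>       }
>       for (unsigned l = 0; l < 64; l++)
>         if (dirty[l])
>           tree.leaf_digest[l] = digest[l];
>
>       // (interior hashes would be recomputed from the changed leaves here)
>       pending.clear ();
>     }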
> 
> but i will discuss this in a meeting before committing anything.
> 
> and josh, any insights? please chip in... thanks.
> 
> benjie
> 
> 
> 
> 
> On Fri, Aug 01, 2003 at 12:16:01PM -0400, Frank Dabek wrote:
> > Benjie,
> > 
> > 	We talked about this at Chord meeting today and are all a bit confused.
> > Isn't the bottom level of the Merkle tree already an in-memory cache of
> > all of the keys in the system? 
> > 
> > 	Why don't you present these changes at a meeting next week before you
> > start hacking the code?
> > 
> > --Frank
> > 
> > On Fri, 2003-08-01 at 09:16, Benjie Chen wrote:
> > > yes, i was planning to insert into the db, but not into the
> > > merkle tree, on every block insert. when inserting keys into
> > > the tree, we will end up traversing the db to keep the tree
> > > fresh.
> > > 
> > > benjie
> > > 
> > > 
> > > On Fri, Aug 01, 2003 at 07:32:41AM -0400, Frans Kaashoek wrote:
> > > > wouldn't it be better to do the insert in the database, but
> > > > not iterate through the whole database on each insert?
> > > > 
> > > > perhaps i don't understand the plan, but this sounds like a
> > > > bad hack.
> > > > 
> > > > On Fri, 01 August 2003 at 00:21 (-0400), Benjie Chen wrote:
> > > > > 
> > > > > so i mentioned earlier that there is a slight efficiency issue with
> > > > > the Merkle implementation, namely that it tries to iterate thru the
> > > > > database on every db insert, and that this could be quite expensive.
> > > > > 
> > > > > my plan to fix/change this is to keep a fixed-size array of
> > > > > inserted keys in memory, and periodically insert them into the
> > > > > merkle tree, instead of inserting into the merkle tree on every store.
> > > > > in the worst case, i will insert keys into the merkle tree once
> > > > > every time the synchronization protocol runs.
> > > > > 
> > > > > with only 200K (20 bytes per key), i can keep 10,000 keys. so this
> > > > > should be okay memory-wise.
> > > > > 
> > > > > anyone have a problem with this?
> > > > > 
> > > > > benjie
> > > > > 
> > > > > 
> > > > > -- 
> > > > > benjie chen
> > > > > benjie at lcs.mit.edu


