[chord] tomorrow's chord meeting
Benjie Chen
benjie at amsterdam.lcs.mit.edu
Tue Jun 3 19:07:23 EDT 2003
i am going to talk about what i would like to see in Chord/DHash.
my app is quite different from some of Chord/DHash's visions. here
are some thoughts.
- a likely deployment scenario for my application is a wide-area
storage system with anywhere from 2 to 100 nodes.
- the main property i would like to get out of a wide-area storage
system is availability when the network is partitioned. for data
reliability, i am counting on aggressive local caching and local
physical backups.
for example, suppose there are two users, each donating part of
his/her disk for cooperative storage. if one user writes a set of
data and then disconnects his machine from the network, it would
be nice if the other user could still access the new data. each
user aggressively caches everything he/she reads/writes. if both
computers fail, the data is still available from backup.
- it would be nice if caching occurred at the user's lsd. i don't
want to have to cache data already stored in the lsd's DB.
- DHash does not currently do this, but i can make this happen.
- i am not sure if fragments make sense in this scenario. there may
not be enough nodes in the system for the current coding
parameter used by DHash. also, blocks seem to make more sense for
the purpose of caching.
- i can always just store all fragments of a block in the local DB
and re-assemble on every load. how expensive would this be?
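to get a feel for the cost, here is a toy sketch of "store all fragments locally, re-assemble on every load." real DHash uses IDA-style erasure coding; plain striping with no redundancy is just a stand-in here, and the fragment count of 14 is only for illustration, not the actual coding parameter.

```python
def fragment(block: bytes, nfrags: int) -> list[bytes]:
    """split a block into nfrags roughly equal pieces (no redundancy)."""
    size = (len(block) + nfrags - 1) // nfrags
    return [block[i * size:(i + 1) * size] for i in range(nfrags)]

def reassemble(frags: list[bytes]) -> bytes:
    """concatenate fragments back into the original block."""
    return b"".join(frags)

# local "DB": map (block_id, frag_index) -> fragment bytes
db = {}
block = b"x" * 8192
for i, f in enumerate(fragment(block, 14)):
    db[("blk0", i)] = f

# every load pays one DB lookup per fragment plus a join
loaded = reassemble([db[("blk0", i)] for i in range(14)])
assert loaded == block
```

so the per-load cost is mostly the extra DB lookups; the join itself is cheap for block sizes like these.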
- some of my blocks represent per-user soft-state data; they don't
need to be replicated immediately after being written. if they are
lost, i can reconstruct them. maybe we can replicate them lazily.
- i can explicitly insert them to local DB, then count on the
merkle stuff to move them to other nodes later. this is also not
a difficult change.
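the "insert locally, let the merkle stuff move it later" idea can be sketched like this: two nodes compare a hash over their key sets, and if the digests differ the node with extra keys pushes them. this is a toy stand-in for DHash's merkle synchronization protocol, not its actual tree structure or wire format.

```python
import hashlib

def digest(keys):
    """hash of a sorted key set; equal digests mean nothing to sync."""
    h = hashlib.sha1()
    for k in sorted(keys):
        h.update(k)
    return h.hexdigest()

local = {b"blk-a": b"soft state", b"blk-b": b"more soft state"}
remote = {b"blk-a": b"soft state"}

if digest(local) != digest(remote):
    # lazily push whatever the remote side is missing
    for k in set(local) - set(remote):
        remote[k] = local[k]

assert digest(local) == digest(remote)
```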
- admission control: i don't want unauthorized users or nodes to
access data or join the network. i am willing to have each node
keep a list of allowed public keys. then i can authenticate join
messages, keep-alive messages, and requests.
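a minimal sketch of that admission check: every join/keep-alive/request carries a signature, and a node accepts it only if some key on its allow-list verifies it. i'm using HMAC with shared secrets here purely as a stand-in for real public-key signatures, just to show the shape of the check.

```python
import hashlib
import hmac

# each node keeps a list of allowed keys (hypothetical values)
allowed_keys = {b"node-1-secret", b"node-2-secret"}

def sign(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def authorized(msg: bytes, sig: bytes) -> bool:
    """accept the message iff some allowed key produced the signature."""
    return any(hmac.compare_digest(sign(k, msg), sig) for k in allowed_keys)

join_msg = b"JOIN node-1"
assert authorized(join_msg, sign(b"node-1-secret", join_msg))
assert not authorized(join_msg, sign(b"rogue-secret", join_msg))
```

with real public keys the check is the same shape: verify the signature against each allowed key instead of recomputing an HMAC.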
- i'd like to have support for NATs. one way to do this is to have
each protocol message contain what the lsd thinks its addr is. if
a keep-alive message does not match the src ip addr in the packet,
then it's ignored. requests from a host behind a NAT are still
honored. something like that. basically a node behind a NAT may
have a different idea of what the storage system looks like.
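the NAT rule above is simple enough to sketch: each message carries the addr the sending lsd thinks it has; a keep-alive whose claimed addr doesn't match the packet's source ip is dropped, while a request is honored anyway. the message format and field names here are made up for illustration.

```python
def handle(msg: dict, src_ip: str) -> str:
    """apply the NAT rule: drop mismatched keep-alives, honor requests."""
    behind_nat = msg["claimed_addr"] != src_ip
    if msg["type"] == "keepalive" and behind_nat:
        return "ignored"    # don't let it advertise an unreachable addr
    return "handled"        # requests still get answered

assert handle({"type": "keepalive", "claimed_addr": "10.0.0.5"}, "18.26.4.9") == "ignored"
assert handle({"type": "request", "claimed_addr": "10.0.0.5"}, "18.26.4.9") == "handled"
assert handle({"type": "keepalive", "claimed_addr": "18.26.4.9"}, "18.26.4.9") == "handled"
```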
- since there aren't that many nodes, i want to avoid multi-hop
routing.
- the ideal situation is that i know the configuration of the
entire system. then everything can be done in one-step, even
replication. but i'd be inventing another one-hop p2p system.
- i can deal with two-step replication (first step get successor,
then replicate). it would be nice, however, if i don't have to
do multi-hop lookups on every insert/retrieve. the old
Chord/DHash had cached locations (i.e. node id to ip addr
mapping). i want that.
- i could implement my own routing layer (e.g. chord, de bruijn).
but i think turning location caching back on (perhaps w/ a
flag) may be sufficient.
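the location caching i'm asking for amounts to a node id -> ip addr map consulted before falling back to a multi-hop lookup. a minimal sketch, with the lookup function as a placeholder for the real routing layer:

```python
location_cache: dict[int, str] = {}

def multi_hop_lookup(node_id: int) -> str:
    """placeholder for a real O(log n)-hop chord lookup."""
    return f"10.0.0.{node_id % 256}"

def locate(node_id: int) -> str:
    if node_id not in location_cache:
        location_cache[node_id] = multi_hop_lookup(node_id)  # slow path
    return location_cache[node_id]                           # fast path

assert locate(42) == "10.0.0.42"   # first call pays the multi-hop cost
assert 42 in location_cache        # later inserts/retrieves hit the cache
```

the real version also needs to evict or revalidate stale entries when nodes fail, which is presumably why it got turned off; a flag to re-enable it seems like the smallest change.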
--
benjie chen
benjie at lcs.mit.edu