[chord] tomorrow's chord meeting

Benjie Chen benjie at amsterdam.lcs.mit.edu
Tue Jun 3 19:07:23 EDT 2003


i am going to talk about what i would like to see in Chord/DHash.
my app is quite different from some of the visions behind
Chord/DHash. here are some thoughts.

  - a likely deployment scenario for my application is a wide-area
    storage system with anywhere from 2 to 100 nodes.

  - the main property i would like to get out of a wide-area storage
    system is availability when the network is partitioned. for data
    reliability, i am counting on aggressive local caching and local
    physical backups.

    for example, suppose there are two users, each donating part of
    his/her disk for cooperative storage. if one user writes a set of
    data and then disconnects his machine from the network, it would
    be nice if the other user could still access the new data. each
    user aggressively caches everything he/she reads/writes, so if
    both computers fail, the data is still available from backup.

    - it would be nice if caching occurred at the user's lsd. i don't
      want to have to cache data that is already stored in the lsd's
      DB (rough sketch below).

      - DHash does not currently do this, but i can make it happen.
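
    here is a rough sketch (python) of the read path i have in mind.
    local_db, cache_db, and remote_fetch are made-up placeholders, not
    real DHash interfaces:

      # placeholders for the lsd's block DB, a cache DB, and a
      # wide-area fetch; none of these names are real DHash APIs
      local_db = {}   # blocks this node is responsible for anyway
      cache_db = {}   # the aggressive read/write cache

      def remote_fetch(key):
          # would do a normal DHash retrieve over the network
          return None

      def get_block(key):
          # if the lsd already stores the block, serve it from the DB
          # directly -- no point keeping a second cached copy
          data = local_db.get(key)
          if data is not None:
              return data
          data = cache_db.get(key)
          if data is not None:
              return data
          # fetch remotely and cache the result, so the block stays
          # readable if the network partitions later
          data = remote_fetch(key)
          if data is not None:
              cache_db[key] = data
          return data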

  - i am not sure fragments make sense in this scenario. there may
    not be enough nodes in the system for the coding parameters
    DHash currently uses. also, whole blocks seem to make more sense
    for the purpose of caching.
    
    - i can always just store all fragments of a block in the local DB
      and re-assemble on every load (toy sketch below). how expensive
      would this be?
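
    to make the question concrete, a toy python illustration of
    "store all fragments, reassemble on load". i'm using a trivial
    byte-striping code as a stand-in for the IDA coding DHash actually
    uses; only the shape of the work matters here:

      frag_db = {}   # placeholder local DB, keyed by block key

      def fragment(block, n):
          # split a block into n stripes (real DHash would IDA-encode,
          # producing n fragments of which some subset reconstructs)
          return [block[i::n] for i in range(n)]

      def reassemble(frags):
          # interleave the stripes back together; with IDA this would
          # be a decode, roughly linear in the block size
          out = bytearray()
          for i in range(max(len(f) for f in frags)):
              for f in frags:
                  if i < len(f):
                      out.append(f[i])
          return bytes(out)

      def store_local(key, block):
          frag_db[key] = fragment(block, 14)   # 14 = guess at DHash's n

      def load_local(key):
          return reassemble(frag_db[key])

    my guess is the decode is linear in the block size, so it should
    be cheap next to the disk read, but someone who knows the coding
    code can correct me.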

  - some of my blocks represent per-user soft-state data; they don't
    need to be replicated immediately after being written. if they are
    lost, i can reconstruct them. maybe we can replicate them lazily.

    - i can explicitly insert them into the local DB, then count on
      the merkle synchronization to move them to other nodes later
      (sketch below). this is also not a difficult change.
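
    a sketch of the lazy insert path i have in mind, assuming a
    per-insert flag; "merkle sync" stands in for the existing
    maintenance protocol, and none of these names are real APIs:

      local_db = {}
      pending = set()   # keys waiting for lazy replication

      def insert(key, data, lazy=False):
          local_db[key] = data
          if lazy:
              # soft-state block: store it locally only. the periodic
              # merkle synchronization will notice the new key and
              # copy it to the proper successors eventually.
              pending.add(key)
          else:
              replicate_now(key, data)   # eager path, as today

      def replicate_now(key, data):
          # would push the block to the successor list right away
          ...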

  - admission control: i don't want unauthorized users or nodes to
    access data or join the network. i am willing to have each node
    keep a list of allowed public keys; then i can authenticate join
    messages, keep-alive messages, and requests (sketch below).
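
    a sketch of the check i mean. for brevity i'm using hmac with
    per-node shared secrets as a stand-in for real public-key
    signatures; the key list would come from a config file:

      import hashlib
      import hmac

      # allowed nodes: node name -> key material from a config file
      allowed = {
          "node-a": b"secret-a",
          "node-b": b"secret-b",
      }

      def sign(node, msg):
          return hmac.new(allowed[node], msg, hashlib.sha256).digest()

      def verify(node, msg, sig):
          # unknown nodes and bad signatures are both rejected, so
          # join, keep-alive, and request messages from outsiders all
          # get dropped by the same check
          key = allowed.get(node)
          if key is None:
              return False
          want = hmac.new(key, msg, hashlib.sha256).digest()
          return hmac.compare_digest(want, sig)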

  - i'd like to have support for NATs. one way to do this is to have
    each protocol message contain what the sending lsd thinks its
    addr is. if the addr in a keep-alive message does not match the
    src ip addr of the packet, the message is ignored. requests from
    a host behind a NAT are still honored (sketch below). basically,
    a node behind a NAT may have a different idea of what the storage
    system looks like.
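
    the filtering rule, roughly (the message format and helper names
    are made up):

      def refresh_neighbor(addr): ...   # update routing state, stub
      def serve(msg): ...               # handle a data request, stub
      def reply(addr, data): ...        # send a response, stub

      def handle_message(msg, src_addr):
          # each message carries the addr the sender's lsd thinks
          # it has
          claimed = (msg["my_ip"], msg["my_port"])
          if msg["type"] == "keepalive":
              if claimed != src_addr:
                  # sender is probably behind a NAT: don't let it
                  # into our routing state
                  return
              refresh_neighbor(claimed)
          elif msg["type"] == "request":
              # requests are honored regardless; reply to the addr
              # the packet actually came from
              reply(src_addr, serve(msg))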
  
  - since there aren't that many nodes, i want to avoid multi-hop
    routing.

    - the ideal situation is that i know the configuration of the
      entire system. then everything can be done in one-step, even
      replication. but i'd be inventing another one-hop p2p system.

    - i can deal with two-step replication (first get the successor,
      then replicate). it would be nice, however, if i didn't have to
      do multi-hop lookups on every insert/retrieve. the old
      Chord/DHash had cached locations (i.e. a node id to ip addr
      mapping). i want that (sketch below).

      - i could implement my own routing layer (e.g. chord, de
        bruijn), but i think turning location caching back on
        (perhaps w/ a flag) may be sufficient.
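
    for reference, the location cache i'm asking for, sketched in
    python. with at most ~100 nodes the cache can hold the whole
    ring, so most lookups become one hop; ping and chord_lookup are
    placeholders for the existing code paths:

      import hashlib
      from bisect import bisect_left

      loc_cache = {}   # node id (int) -> (ip, port), learned from
                       # observed traffic

      def ping(node): ...          # liveness check, stub
      def chord_lookup(key): ...   # existing multi-hop path, stub

      def key_id(key):
          return int.from_bytes(hashlib.sha1(key).digest(), "big")

      def cached_successor(key):
          # find the first cached node id clockwise of the key's id;
          # with a full cache this is the key's real successor
          if not loc_cache:
              return None
          ids = sorted(loc_cache)
          i = bisect_left(ids, key_id(key))
          return loc_cache[ids[i % len(ids)]]   # wrap around the ring

      def lookup(key):
          node = cached_successor(key)
          if node is not None and ping(node):
              return node            # one hop, no routing
          return chord_lookup(key)   # fall back to multi-hop lookup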


-- 
benjie chen
benjie at lcs.mit.edu

