[chord] dhashclient question

Yanyan Wang Yanyan.Wang at colorado.edu
Mon Apr 10 00:09:06 EDT 2006


Or could another possible reason for the high robustness I observed be
that my file is too small (200 bytes) to require 7 fragments to recover? Thanks!
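
My back-of-envelope check, assuming IDA splits every block into efrags
pieces of roughly size/dfrags bytes no matter how small the block is
(that assumption is mine, not something I verified in the code):

#include <cstdio>

int
main ()
{
  int block_size = 200;                 /* bytes */
  int dfrags = 7, efrags = 14;
  /* under IDA each fragment carries about block_size / dfrags bytes */
  int frag_size = (block_size + dfrags - 1) / dfrags;
  printf ("%d fragments of ~%d bytes each; any %d reconstruct\n",
          efrags, frag_size, dfrags);
  return 0;
}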

Regards,
Yanyan :)

Quoting Yanyan Wang <ywang at colorado.edu>:

> Thank you very much for your reply. From my calculation, the
> probability that a block is available when half of the CFS nodes are
> down is about 0.6, based on the formula with m=7, l=14, p0=0.5. This
> is much worse robustness than when there are just 6 replicas of a
> block.
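>
> To double-check that number, here is a small standalone program I
> wrote as a sanity check (my own sketch, not CFS code). It sums the
> binomial tail: the block is available when at least m of the l
> fragments survive, and each fragment survives with probability 1 - p0:
>
> #include <cmath>
> #include <cstdio>
>
> /* probability that at least m of l fragments survive when each
>    fragment independently survives with probability 1 - p0 */
> double
> block_availability (int m, int l, double p0)
> {
>   double avail = 0.0;
>   for (int k = m; k <= l; k++) {
>     double c = 1.0;                     /* C(l, k) */
>     for (int i = 1; i <= k; i++)
>       c *= (double) (l - k + i) / i;
>     avail += c * pow (1.0 - p0, k) * pow (p0, l - k);
>   }
>   return avail;
> }
>
> int
> main ()
> {
>   printf ("%.3f\n", block_availability (7, 14, 0.5)); /* prints 0.605 */
>   return 0;
> }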
>
> >From your dhblock.c source code, you configure both replica and erasure
> coded
> fragment:
>
> #define set_int Configurator::only ().set_int
>   /** MTU **/
>   ok = ok && set_int ("dhash.mtu", 1210);
>   /** Number of fragments to encode each block into */
>   ok = ok && set_int ("dhash.efrags", 14);
>   /** XXX Number of fragments needed to reconstruct a given block */
>   ok = ok && set_int ("dhash.dfrags", 7);
>   /** XXX Number of replica for each mutable block **/
>   ok = ok && set_int ("dhash.replica", 5);
>   assert (ok);
> #undef set_int
>
> So do you use replicas and erasure coded fragments at the same time
> to guarantee the robustness of the CFS network? Thank you very much!
>
> Thanks,
> Yanyan :)
>
>
> Quoting Frank Dabek <fdabek at gmail.com>:
>
> > DHash's default replication strategy creates 14 erasure coded
> > fragments of each block; 7 fragments are necessary to reconstruct the
> > block. The math for figuring out what failure rate you should expect
> > is a little harder, but you can find it in our paper at the 1st NSDI,
> > "Designing a DHT for...". Also look for a Weatherspoon paper that has
> > a similar formula. Hopefully this works out to the right number. If
> > you'd rather not do the math, you can configure DHash to use
> > replication. Try setting the efrags and dfrags parameters in your
> > configuration file. You'll want dfrags = 1 and efrags = (replication
> > level).
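> >
> > For example, in the style of the dhblock.c snippet above, a
> > pure-replication configuration with 6 copies would look something
> > like this (values illustrative, not a tested configuration):
> >
> > #define set_int Configurator::only ().set_int
> >   /** With dfrags = 1, any single fragment is a full copy **/
> >   ok = ok && set_int ("dhash.efrags", 6);
> >   ok = ok && set_int ("dhash.dfrags", 1);
> > #undef set_int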
> >
> > Some unsolicited advice: what are you trying to model by killing half
> > the nodes? CA falls into the ocean? I know that's how we did it in our
> > early papers, but, in retrospect, it was a pretty unrealistic
> > approach.
> >
> > --Frank
> >
> > On 4/8/06, Yanyan Wang <Yanyan.Wang at colorado.edu> wrote:
> > > This is a follow-up question to a question I asked you before. I am
> > > trying to do robustness experiments exactly like the experiments in
> > > section 7.2.5 ("Effect of failure") of your SOSP'01 paper "Wide-area
> > > cooperative storage with CFS". From your experiment results and
> > > explanations, I should be able to observe an error rate of about
> > > 0.5^6 = 0.016 when the fraction of failed nodes equals 0.5 and CFS
> > > keeps 6 copies of a block. Based on my understanding of your source
> > > code, you have set the number of copies of a block to 6. But I
> > > always get an error rate of about 0.002 in my experiments; I only
> > > got an error rate of 0.008 in my first experiment run. I don't know
> > > what the problem is, so I have to ask for your help in thinking of
> > > some explanations.
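> > >
> > > (Just to spell out my arithmetic: a retrieval should fail only when
> > > all six copies of a block land on killed nodes, which a throwaway
> > > check confirms is about 1.6%:)
> > >
> > > #include <cmath>
> > > #include <cstdio>
> > >
> > > int
> > > main ()
> > > {
> > >   /* each copy is lost with probability 0.5; all 6 must be lost */
> > >   printf ("%f\n", pow (0.5, 6));    /* 0.015625, ~16 per 1000 */
> > >   return 0;
> > > }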
> > >
> > > I did my experiments on Emulab. For each experiment, I copied the
> > > compiled CFS binaries "lsd" and "filestore" onto each testbed host.
> > > Then I started 1000 CFS servers on these testbed hosts with "lsd".
> > > A client script on one of the CFS servers sent 1000 "filestore"
> > > store requests to the CFS network. Then half of the lsd processes
> > > (chosen randomly) were killed, and the client script sent 1000
> > > "filestore" retrieve requests to the CFS network, one per second.
> > > After each experiment run, I cleaned up the execution environments
> > > of all the CFS servers (basically the db of every server).
> > >
> > > Another observation: all the errors I saw happened for the retrieve
> > > requests sent early on. But the errors that should happen in this
> > > experiment are those caused by the loss of all six copies of a
> > > block, and such errors could happen for any retrieval request
> > > because they are unrecoverable.
> > >
> > > So I am very confused by my observations. I am wondering whether
> > > CFS has some other mechanism that decreases the error rate, or
> > > whether there is a problem in my experiment setup. Thank you very
> > > much for your help!
> > >
> > > Thanks,
> > > Yanyan :)
> > >
> > >
> > > Quoting Yanyan Wang <Yanyan.Wang at colorado.edu>:
> > >
> > > > Hello chord authors,
> > > >
> > > > I ran into a weird problem when I tested the robustness of a
> > > > chord network. Could you please kindly help me explain it? I used
> > > > the chord prototype for the experiment and did the following:
> > > >
> > > > 1. I started two chord nodes each of which has 16 virtual nodes;
> > > > 2. I inserted 32 strings into the chord network and got 32
> corresponding
> > > > keys;
> > > > 3. I killed one of the lsd processes;
> > > > 4. I retrieved the 32 keys.
> > > >
> > > > From the explanation in your SIGCOMM paper, I expected about half
> > > > of the 32 key retrievals to fail, because keys were lost when I
> > > > killed one lsd process. But in my result all the retrievals
> > > > succeeded in getting the corresponding strings. I am very
> > > > surprised at this result. The logs of the live chord node for the
> > > > retrieval of key 9f8ff2967cff70d7373cb304807e053a44d8a047 are:
> > > >
> > > > ...
> > > > lsd: will order successors 1
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 3eb6ccaf98300a77989e0059cbe9a465bbc6f35d
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 40f746245f6c3bb924ad6e5ddcf15811ea456e9a
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 1ac46b2343075c88e9c0f601e96aa64b3651d8d7
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 48896252b67623a835fb117b6559c70eeb583fc7
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 15282e83543d99cbf76669c03b4d26b0e15b96da
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 6793bc04e0c544d5c23e84c6c0605cd0998ee2b7
> > > > lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 766a83e4c5a2f0e76723006c6dad002de53aeaf9
> > > >
> > > > It seems that all the dhash_download attempts failed, so why can
> > > > the node still get the correct string for the key? I am very
> > > > confused, and I would appreciate any ideas. Thanks a lot!
> > > >
> > > > Thanks!
> > > > Yanyan :)
> > > >
> > > >
> > > > =================================
> > > > Yanyan Wang
> > > > Department of Computer Science
> > > > University of Colorado at Boulder
> > > > Boulder, CO, 80302
> > > > =================================
> > > >
> > >
> > >
> > > Yanyan :)
> > >
> > >
> > > =================================
> > > Yanyan Wang
> > > Department of Computer Science
> > > University of Colorado at Boulder
> > > Boulder, CO, 80302
> > > =================================
> > >
> > >
> >
>


