[chord] dhashclient question

Yanyan Wang Yanyan.Wang at colorado.edu
Sun Apr 9 01:06:45 EDT 2006


This is a follow-up to a question I asked you earlier. I am trying to reproduce
the robustness experiments in section 7.2.5, "Effect of failure", of your SOSP '01
paper "Wide-area cooperative storage with CFS". Based on your results and
explanation, when CFS keeps 6 copies of each block and half of the nodes have
failed, I should observe an error rate of about 0.5^6 ≈ 0.016. As far as I can
tell from your source code, the number of copies per block is set to 6. However,
I consistently get an error rate of about 0.002; only in my first experiment run
did I get 0.008. I don't know what the problem is, so I would be grateful if you
could help me think of possible explanations.
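To make the expected figure explicit, here is a small Python sketch of the arithmetic. It is not taken from CFS; the function names are made up for illustration. It assumes each block's 6 copies sit on 6 distinct servers and that exactly 500 of the 1000 servers are killed uniformly at random, in which case the exact loss probability (sampling without replacement) comes out slightly below the independent-failure approximation 0.5^6.

```python
from math import comb, sqrt

def p_lost_independent(frac_failed: float, replicas: int) -> float:
    """Approximation: each copy is on a failed node independently
    with probability frac_failed."""
    return frac_failed ** replicas

def p_lost_exact(n_nodes: int, n_failed: int, replicas: int) -> float:
    """ASSUMPTION: copies on `replicas` distinct nodes, and exactly `n_failed`
    of `n_nodes` nodes killed uniformly at random (no replacement)."""
    return comb(n_nodes - replicas, n_failed - replicas) / comb(n_nodes, n_failed)

def expected_failures(n_blocks: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of the number of lost blocks, treating
    blocks as independent (rough: in practice blocks share nodes)."""
    return n_blocks * p, sqrt(n_blocks * p * (1 - p))

if __name__ == "__main__":
    p_approx = p_lost_independent(0.5, 6)  # exactly 0.015625
    p_exact = p_lost_exact(1000, 500, 6)
    mean, std = expected_failures(1000, p_exact)
    print(f"0.5^6 = {p_approx:.6f}, exact = {p_exact:.6f}")
    print(f"expected lost blocks out of 1000: {mean:.1f} +/- {std:.1f}")
```

Under these assumptions the model predicts roughly 15 lost blocks per 1000 retrievals, with a standard deviation of about 4, so an observed rate of 0.002 (about 2 blocks) is well below what replica placement alone would suggest.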

I ran my experiments on Emulab. For each experiment, I copied the compiled CFS
binaries "lsd" and "filestore" onto each testbed host and started 1000 CFS
servers on these hosts with "lsd". A client script on one of the CFS servers then
sent 1000 "filestore" store requests to the CFS network. Next, half of the lsd
processes (chosen at random) were killed. The client script then sent 1000
"filestore" retrieve requests to the CFS network, one per second. After each
experiment run, I cleaned up the execution environment of all the CFS servers
(essentially the database of every server).
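The procedure above can be modelled roughly with a Monte Carlo sketch. This is not CFS code, and `simulate_run` is a hypothetical helper. It assumes (1) each block's 6 copies are stored on the block's successor and the next 5 servers on a ring of random 160-bit IDs, (2) each lsd process is one server with no virtual nodes, and (3) nothing re-replicates or caches blocks between the kill and the retrievals. It only estimates the loss caused by replica placement.

```python
import bisect
import random

def simulate_run(n_nodes=1000, n_blocks=1000, replicas=6,
                 frac_failed=0.5, seed=None):
    """Return the fraction of blocks whose copies all sit on killed nodes."""
    rng = random.Random(seed)
    ring_bits = 160  # Chord identifiers are 160-bit SHA-1 values
    node_ids = sorted(rng.getrandbits(ring_bits) for _ in range(n_nodes))
    # Kill a uniformly random subset of servers (indices into node_ids).
    killed = set(rng.sample(range(n_nodes), int(frac_failed * n_nodes)))
    lost = 0
    for _ in range(n_blocks):
        key = rng.getrandbits(ring_bits)
        # Successor = first node ID >= key, wrapping around the ring.
        first = bisect.bisect_left(node_ids, key) % n_nodes
        # ASSUMPTION: copies live on the successor and the next replicas-1 nodes.
        holders = [(first + i) % n_nodes for i in range(replicas)]
        if all(h in killed for h in holders):
            lost += 1
    return lost / n_blocks

if __name__ == "__main__":
    rates = [simulate_run(seed=s) for s in range(50)]
    print(f"mean loss rate over {len(rates)} runs: {sum(rates) / len(rates):.4f}")
```

Because the 6 copies land on distinct servers and exactly half the servers are killed, the average over many seeds should come out slightly below 0.5^6, which gives a placement-only baseline to compare measured error rates against.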

Another observation: all the errors I observed occurred in retrieve requests sent
early in the run. But I would expect the errors in this experiment to be caused
by the loss of all six copies of a block. Such a loss is unrecoverable, so these
errors could happen for any retrieve request, not just the early ones.
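One way to quantify "sent early" is to ask how likely it would be for f failures caused by permanent loss to all fall among the first m of n retrieve requests. If the retrieval order is unrelated to which blocks were lost, the failed positions are a uniformly random f-subset of the n requests, and the probability is C(m, f) / C(n, f). A sketch, with a made-up function name:

```python
from math import comb

def p_all_failures_in_first(n_requests: int, first_m: int, n_failures: int) -> float:
    """Probability that n_failures failed retrievals all fall among the first
    first_m of n_requests, assuming each failed block is equally likely to be
    at any position in the retrieval order (what permanent loss of all copies
    would predict). math.comb returns 0 when n_failures > first_m."""
    return comb(first_m, n_failures) / comb(n_requests, n_failures)
```

For example, with 2 failures out of 1000 requests and a hypothetical cutoff of the first 100 requests, this gives 4950 / 499500 ≈ 0.0099, so clustering at the start would be unlikely if the errors came only from permanently lost blocks.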

So I am very confused by these observations. I am wondering whether CFS has other
mechanisms that reduce the error rate, or whether there is a problem with my
experimental setup. Thank you very much for your help!

Thanks,
Yanyan :)


Quoting Yanyan Wang <Yanyan.Wang at colorado.edu>:

> Hello Chord authors,
>
> I ran into a strange problem while testing the robustness of a Chord network.
> Could you please help me explain it? I used the Chord prototype for the
> experiment and did the following:
>
> 1. I started two Chord nodes, each with 16 virtual nodes;
> 2. I inserted 32 strings into the Chord network and got the 32 corresponding
> keys;
> 3. I killed one of the lsd processes;
> 4. I retrieved the 32 keys.
>
> From the explanation in your SIGCOMM paper, I expected about half of the 32 key
> retrievals to fail, because killing one lsd process should lose those keys.
> But all the retrievals succeeded in getting the corresponding strings, which
> surprised me very much. The logs of the live Chord node for the retrieval of
> key 9f8ff2967cff70d7373cb304807e053a44d8a047 are:
>
> ...
> lsd: will order successors 1
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 3eb6ccaf98300a77989e0059cbe9a465bbc6f35d
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 40f746245f6c3bb924ad6e5ddcf15811ea456e9a
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 1ac46b2343075c88e9c0f601e96aa64b3651d8d7
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 48896252b67623a835fb117b6559c70eeb583fc7
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 15282e83543d99cbf76669c03b4d26b0e15b96da
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 6793bc04e0c544d5c23e84c6c0605cd0998ee2b7
> lsd: dhash_download failed: 9f8ff2967cff70d7373cb304807e053a44d8a047:0:1: DHASH_NOENT at 766a83e4c5a2f0e76723006c6dad002de53aeaf9
>
> It seems that every dhash_download failed, so why does the node still return
> the correct string for the key? I am very confused and would appreciate any
> ideas. Thanks a lot!
>
> Thanks!
> Yanyan :)
>
>
> =================================
> Yanyan Wang
> Department of Computer Science
> University of Colorado at Boulder
> Boulder, CO, 80302
> =================================
>
> _______________________________________________
> chord mailing list
> chord at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/chord
>


Yanyan :)


=================================
Yanyan Wang
Department of Computer Science
University of Colorado at Boulder
Boulder, CO, 80302
=================================


