[chord] Bug in Chord/lsd

Emil Sit sit at MIT.EDU
Wed Feb 24 22:03:00 EST 2010


Hi,

On Tue, 23 February 2010 at 20:03 (-0500), Stanislav Funiak wrote:
> reasons). Any idea of what is happening? I can send you a larger fragment of
> the log if it helps. I am running the latest snapshot of Chord from 05 Apr
> 2008 and the latest sfslite-0.8 from their repository.

It looks as though this node thinks no other nodes are online.  I am
not sure why that would happen, though it sounds vaguely familiar.
If I remember correctly, running strace (or something similar) would
show that the socket lsd is trying to write to has been closed for
some reason and that we failed to notice.

You could try running something like lsdctl loctab to
dump the node's location table and see whether it really
does think everything is dead.

It may also be worth hunting down where lsd calls write(2) or
send(2), which may be inside sfslite, and checking whether the
return value and errno are examined.

-- 
Emil Sit / MIT CSAIL PDOS / http://pdos.csail.mit.edu/chord/  


