[chord] Bug in Chord/lsd

Stanislav Funiak sfuniak at cs.cmu.edu
Tue Feb 23 20:03:50 EST 2010


Hi,

I am running into problems with Chord/lsd. Occasionally, lsd falls into what
looks like an infinite loop: it hogs CPU and memory and writes an enormous
amount of log output (1 GB or more). I am including part of the log below (I
replaced the port I am running on with XXX for security reasons). Any idea
what is happening? I can send a larger fragment of the log if it helps. I am
running the latest Chord snapshot, from 05 Apr 2008, and the latest
sfslite-0.8 from its repository.

Thanks,
Stanislav


...

1266941344:252052 REXMIT c79edd65 344447:7 rexmits 0, timeout 1000 ms,
destined for 131.247.2.248 out is 138330852
1266941344:266795 REXMIT 7d449b83 344447:7 rexmits 1, timeout 2000 ms,
destined for 131.247.2.248 out is 138292116
1266941344:270983 REXMIT 1041c253 344447:7 rexmits 1, timeout 2000 ms,
destined for 131.247.2.248 out is 138293428
1266941344:275971 REXMIT b02f158 344447:7 rexmits 1, timeout 2000 ms,
destined for 202.112.28.100 out is 138288180
1266941344:289118 REXMIT 3c04eed3 344447:7 rexmits 0, timeout 1000 ms,
destined for 128.220.231.4 out is 138342564
1266941344:294058 REXMIT 3b623c2b 344447:7 rexmits 0, timeout 1000 ms,
destined for 131.247.2.248 out is 138334788
1266941344:315960 RPC failure: RPC: Timed out destined for
1d0e10a86e1855655eaa3cc3cebc53956190ab8 at 202.112.28.100 seqno 149 out
138191580
alert: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
1d0e10a86e1855655eaa3cc3cebc53956190ab8 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba
 1266941344:316181 RPC failure: RPC: Timed out destined for
1d0e10a86e1855655eaa3cc3cebc53956190ab8 at 202.112.28.100 seqno 153 out
138200828
alert: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
1d0e10a86e1855655eaa3cc3cebc53956190ab8 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba
 1266941344:316330 RPC failure: RPC: Timed out destined for
1d0e10a86e1855655eaa3cc3cebc53956190ab8 at 202.112.28.100 seqno 159 out
138203644
alert: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
1d0e10a86e1855655eaa3cc3cebc53956190ab8 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba
 1266941344:316506 RPC failure: RPC: Timed out destined for
1d0e10a86e1855655eaa3cc3cebc53956190ab8 at 202.112.28.100 seqno 160 out
138206252
alert: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
1d0e10a86e1855655eaa3cc3cebc53956190ab8 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba
 1266941344:316657 RPC failure: RPC: Timed out destined for
1d0e10a86e1855655eaa3cc3cebc53956190ab8 at 202.112.28.100 seqno 163 out
138210188
alert: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba

...

lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
8688fbde67d1c8117c363195655689af1e7e6f27 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba
 1266941344:577310 RPC failure: RPC: Timed out destined for
8688fbde67d1c8117c363195655689af1e7e6f27 at 131.247.2.248 seqno 207 out
138268076
alert: 8688fbde67d1c8117c363195655689af1e7e6f27 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
8688fbde67d1c8117c363195655689af1e7e6f27 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba
 1266941344:577412 RPC failure: RPC: Timed out destined for
8688fbde67d1c8117c363195655689af1e7e6f27 at 131.247.2.248 seqno 209 out
138269388
alert: 8688fbde67d1c8117c363195655689af1e7e6f27 died; notify
e14de2d0ef840f487677bad851f5bc3ba23599ba
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba:
8688fbde67d1c8117c363195655689af1e7e6f27 is down.  Now trying
e14de2d0ef840f487677bad851f5bc3ba23599ba

...

lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba: doRPC (chord_program_1.1) on
dead node 131.247.2.248:XXX
doalert_cb: 8688fbde67d1c8117c363195655689af1e7e6f27 is indeed not alive
 1266941345:036888 REXMIT 2b58aa1f 344447:7 rexmits 0, timeout 1000 ms,
destined for 132.239.17.225 out is 138401412
 1266941345:142508 REXMIT d0f8400c 344447:7 rexmits 0, timeout 1000 ms,
destined for 130.92.70.254 out is 139348876
 1266941345:317278 REXMIT e4292863 344447:5 rexmits 0, timeout 1000 ms,
destined for 64.161.10.2 out is 139193692

...

doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive
doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive
doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive
doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive
doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive
 1266941345:320758 REXMIT 26f54f53 344447:7 rexmits 0, timeout 1000 ms,
destined for 160.36.57.173 out is 138535716
 1266941345:321765 REXMIT 25cddc09 344447:7 rexmits 0, timeout 1000 ms,
destined for 128.220.231.4 out is 138433652
 1266941345:323799 REXMIT f16817e3 344447:5 rexmits 0, timeout 1000 ms,
destined for 64.161.10.2 out is 139376604
 1266941345:323935 REXMIT cbd7895a 344447:5 rexmits 0, timeout 1000 ms,
destined for 64.161.10.2 out is 138114716
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba: doRPC (chord_program_1.1) on
dead node 202.112.28.100:XXX
lsd: e14de2d0ef840f487677bad851f5bc3ba23599ba: doRPC (chord_program_1.1) on
dead node 202.112.28.100:XXX
 1266941345:324201 REXMIT eb9251b5 344447:5 rexmits 0, timeout 1000 ms,
destined for 64.161.10.2 out is 139417796
doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive
doalert_cb: 1d0e10a86e1855655eaa3cc3cebc53956190ab8 is indeed not alive