[chord] Problem with dhashcli

Katarzyna Stefanowicz kate_stefik at tlen.pl
Sun Feb 10 01:38:43 EST 2008


Emil Sit wrote:
> Czesc Katarzyna,

Hello Emil,

Do you know Polish?

> You should be able to mirror the code in dhashclient::insert
> below in your publish code to deal with this. 
> 
> Something like (untested)...
> 
> void
> incognito_impl::publish (const incognito_store_arg &req)
> {
>     str data = str (req.data.base (), req.data.size ());
>     if (req.key_type == DHASH_NOAUTH)
>       data = dhblock_noauth::marshal_block (data);
>     ref<dhash_block> block = New refcounted<dhash_block> (data, req.key_type);
> 
>     ...
> 
> Does that help?

Thank you very much for the explanation and advice. It was very helpful.
I did as you suggested, but unfortunately something is still wrong.

My current publish function looks like this:
> void incognito_impl::publish(const incognito_store_arg& req)
> {
>         str data = str (req.data.base (), req.data.size ());
>         str marshalled_data;
>         if (req.key_type == DHASH_NOAUTH)
>                 marshalled_data = dhblock_noauth::marshal_block (data);
>         else if (req.key_type == DHASH_CONTENTHASH)
>                 marshalled_data = dhblock_chash::marshal_block (data);
>         else {
>                 warnx << "incognito_impl::publish: unknown key_type: "
>                         << " FIXME: " << __FILE__ << ":" << __LINE__ << "\n";
>                 return;
>         }
>         
> 
>         ref<dhash_block> block = New refcounted<dhash_block> (marshalled_data, req.key_type);
>         block->ID = req.key_value;
> 
>         _dhcli->insert (block,
>                         wrap (mkref (this), &incognito_impl::publish_insert_cb));
> }
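
For readers without the Chord sources at hand, the key-type dispatch
above can be modelled in isolation. This is only a sketch: KeyType,
marshal_noauth, marshal_chash and marshal_for_type are hypothetical
stand-ins, and the stub bodies do not reproduce the real marshal_block
formats. It shows just the control flow: a known key type selects a
marshaller, and an unknown one produces no payload so the caller can
return before building a block.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the DHash key types handled in publish().
enum class KeyType { ContentHash, NoAuth, Other };

// Placeholder marshallers: they only tag the payload so the chosen
// branch is visible. The real marshal_block() formats are not modelled.
std::string marshal_noauth(const std::string& data) { return "noauth:" + data; }
std::string marshal_chash(const std::string& data) { return "chash:" + data; }

// Writes the marshalled payload to `out` and returns true for a
// supported key type. Returns false and leaves `out` untouched otherwise.
bool marshal_for_type(KeyType type, const std::string& data, std::string& out) {
    switch (type) {
    case KeyType::NoAuth:
        out = marshal_noauth(data);
        return true;
    case KeyType::ContentHash:
        out = marshal_chash(data);
        return true;
    default:
        return false;
    }
}
```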

I start my network. Initially, the noauth block database on node
e2bb6d6101eff02a4a1eca3e578800732999a561 (which comes up again below)
is empty:
> chordtest@test2:/tmp/dhash-test2-c$ /tmp/dbdump -t db/e2bb6d6101eff02a4a1eca3e578800732999a561.n
> EOF.
> total keys: 0
> total bytes: 0

Then I insert a noauth block with a random ID (here:
b444ac06613fc8d63795be9ad0beaf5500000000). It works:
> 1202621799.034331 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 2/5) -> e2bb6d6101eff02a4a1eca3e578800732999a561 in 3ms: DHASH_OK
> 1202621799.035274 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 1/5) -> bc2cd702b77a6da8b3d7b254ae38450efe760bd2 in 4ms: DHASH_OK
> 1202621799.036149 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 4/5) -> ed2b9e87da6e557d4e4c048995d7ac61eb2481bb in 5ms: DHASH_OK
> 1202621799.038140 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 3/5) -> ea182e8573df8cf54320288a8f70e26bf4cb4464 in 7ms: DHASH_OK
> 1202621799.038236 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 5/5) -> fb5fa23d77f54f3679d948430bcda824abd48a60 in 7ms: DHASH_OK

I'm able to read the contents of this block.

Then I check the same db:
> chordtest@test2:/tmp/dhash-test2-c$ /tmp/dbdump -t db/e2bb6d6101eff02a4a1eca3e578800732999a561.n
> key[1] b444ac06613fc8d63795be9ad0beaf5500000000 16 0
> EOF.
> total keys: 1
> total bytes: 16

But when I try to insert the same block again, I get:
> 1202621838.928104 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 2/5) -> e2bb6d6101eff02a4a1eca3e578800732999a561 in 1ms: DHASH_STALE
> 1202621838.928321 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 1/5) -> bc2cd702b77a6da8b3d7b254ae38450efe760bd2 in 1ms: DHASH_STALE
> 1202621838.929519 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 4/5) -> ed2b9e87da6e557d4e4c048995d7ac61eb2481bb in 3ms: DHASH_STALE
> 1202621838.929631 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 3/5) -> ea182e8573df8cf54320288a8f70e26bf4cb4464 in 3ms: DHASH_STALE
> 1202621838.929683 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 5/5) -> fb5fa23d77f54f3679d948430bcda824abd48a60 in 3ms: DHASH_STALE
> 1202621838.929695 dhashcli: ad91f3d2b200dba01e82a43a4dad65a76a98570f: store (b444ac06613fc8d63795be9ad0beaf5500000000): only stored 0 of 5 encoded.
> 1202621838.929706 dhashcli: ad91f3d2b200dba01e82a43a4dad65a76a98570f: store (b444ac06613fc8d63795be9ad0beaf5500000000): failed; insufficient frags/blocks stored.
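
The last two log lines follow from the per-fragment results: none of
the five stores returned DHASH_OK, so the insert as a whole is reported
as failed. A minimal sketch of that tally, with stand-in names
(FragStatus, count_stored, store_succeeded) and an assumed min_needed
threshold, since the logs do not show how many fragments dhashcli
actually requires:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for the per-fragment statuses seen in the dhashcli logs.
enum class FragStatus { Ok, Stale, RpcErr };

// Counts the fragments whose store returned Ok.
std::size_t count_stored(const std::vector<FragStatus>& results) {
    std::size_t stored = 0;
    for (FragStatus s : results)
        if (s == FragStatus::Ok)
            ++stored;
    return stored;
}

// Treats the insert as successful only if at least min_needed
// fragments were stored. min_needed is an assumed parameter.
bool store_succeeded(const std::vector<FragStatus>& results,
                     std::size_t min_needed) {
    return count_stored(results) >= min_needed;
}
```

With five Ok results the tally is 5 of 5. With five Stale results it is
0 of 5, which fails for any min_needed of at least 1.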

Worse, when I try to insert different content under the same key, all
target nodes crash:
>  1202621856:787429 RPC failure: RPC: Timed out destined for bc2cd702b77a6da8b3d7b254ae38450efe760bd2 at 10.14.5.6 seqno 255 out 138333412
> 1202621856.787515 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 1/5) -> bc2cd702b77a6da8b3d7b254ae38450efe760bd2 in 2903ms: DHASH_RPCERR
>  1202621856:791351 RPC failure: RPC: Timed out destined for ed2b9e87da6e557d4e4c048995d7ac61eb2481bb at 10.14.5.4 seqno 258 out 138321228
> 1202621856.791382 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 4/5) -> ed2b9e87da6e557d4e4c048995d7ac61eb2481bb in 2907ms: DHASH_RPCERR
>  1202621856:799349 RPC failure: RPC: Timed out destined for ea182e8573df8cf54320288a8f70e26bf4cb4464 at 10.14.5.2 seqno 257 out 138307420
> 1202621856.799379 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 3/5) -> ea182e8573df8cf54320288a8f70e26bf4cb4464 in 2915ms: DHASH_RPCERR
>  1202621856:799404 RPC failure: RPC: Timed out destined for e2bb6d6101eff02a4a1eca3e578800732999a561 at 10.14.5.2 seqno 256 out 138327772
> 1202621856.799433 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 2/5) -> e2bb6d6101eff02a4a1eca3e578800732999a561 in 2915ms: DHASH_RPCERR
>  1202621856:803358 RPC failure: RPC: Timed out destined for fb5fa23d77f54f3679d948430bcda824abd48a60 at 10.14.5.2 seqno 259 out 138338956
> 1202621856.803393 dhashcli: store b444ac06613fc8d63795be9ad0beaf5500000000 (frag 5/5) -> fb5fa23d77f54f3679d948430bcda824abd48a60 in 2919ms: DHASH_RPCERR
> 1202621856.803407 dhashcli: ad91f3d2b200dba01e82a43a4dad65a76a98570f: store (b444ac06613fc8d63795be9ad0beaf5500000000): only stored 0 of 5 encoded.
> 1202621856.803418 dhashcli: ad91f3d2b200dba01e82a43a4dad65a76a98570f: store (b444ac06613fc8d63795be9ad0beaf5500000000): failed; insufficient frags/blocks stored.

On a dead node I can see the following (the first line was logged after the first insert):
> lsd: e2bb6d6101eff02a4a1eca3e578800732999a561 db write: U b444ac06613fc8d63795be9ad0beaf5500000000 16
> lsd: ../../../chord-0.1/dhash/dhblock_noauth_srv.C:101: void dhblock_noauth_srv::after_delete(chordID, str, u_int32_t, cb_dhstat, adb_status): Assertion `err == ADB_OK' failed.

I've run lsd under gdb, and it looks like err=ADB_NOTFOUND:
> (gdb) bt
> #0  0xb7f04402 in __kernel_vsyscall ()
> #1  0xb7b759d1 in raise () from /lib/tls/i686/cmov/libc.so.6
> #2  0xb7b77219 in abort () from /lib/tls/i686/cmov/libc.so.6
> #3  0xb7b6f0df in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
> #4  0x080bb53a in dhblock_noauth_srv::after_delete (this=0x83de598, key=@0xbfeae374, data=@0xbfeae388, exp=0, cb=@0xbfeae380, err=ADB_NOTFOUND)
>     at ../../../chord-0.1/dhash/dhblock_noauth_srv.C:101
> #5  0x080bdc88 in callback_c_1_4<dhblock_noauth_srv*, dhblock_noauth_srv, void, adb_status, bigint, str, unsigned int, ptr<callback<void, dhash_stat, void, void> > >::operator() (this=0x83e0d88, b1=ADB_NOTFOUND) at /home/maya/incognito/src/build/chord/../sfslite/../../sfslite-0.8.16/async/callback1.h:2183
> #6  0x08148629 in adb::generic_cb (this=0x83de64c, res=0x83e04a8, cb=@0xbfeae3dc, err=RPC_SUCCESS)
>     at /home/maya/incognito/src/build/chord/../sfslite/../../sfslite-0.8.16/async/callback1.h:4198
> #7  0x0814c434 in callback_c_1_2<adb*, adb, void, clnt_stat, adb_status*, ptr<callback<void, adb_status, void, void> > >::operator() (this=0x1340, b1=RPC_SUCCESS)
>     at /home/maya/incognito/src/build/chord/../sfslite/../../sfslite-0.8.16/async/callback1.h:1890
> #8  0x0828bc05 in rpccb::finish (this=0x83e38f8, stat=RPC_SUCCESS) at ../../../sfslite-0.8.16/arpc/aclnt.C:139
> #9  0x0828db86 in aclnt::dispatch (xi=@0xbfeae534, msg=0xb798e00c "?\031mP", len=28, src=0x0) at ../../../sfslite-0.8.16/arpc/aclnt.C:610
> #10 0x0829fbd5 in xhinfo::dispatch (this=0x83de8a0, msg=0xb798e00c "?\031mP", len=<value optimized out>, src=0x0) at ../../../sfslite-0.8.16/arpc/xhinfo.C:88
> #11 0x082975ce in axprt_pipe::getpkt (this=0x83de6f8, cpp=0xbfeae5c8, eom=0xb798e028 "") at ../../../sfslite-0.8.16/arpc/axprt_pipe.C:302
> #12 0x08296abe in axprt_pipe::callgetpkt (this=0x83de6f8) at ../../../sfslite-0.8.16/arpc/axprt_pipe.C:361
> #13 0x08297431 in axprt_pipe::input (this=0x83de6f8) at ../../../sfslite-0.8.16/arpc/axprt_pipe.C:332
> #14 0x082a4a8a in fdcb_check () at ../../../sfslite-0.8.16/async/core.C:275
> #15 0x082a507d in amain () at ../../../sfslite-0.8.16/async/core.C:427
> #16 0x0804ff41 in main (argc=14, argv=0xbfeae8d4) at ../../../chord-0.1/lsd/lsd.C:786
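
The backtrace shows the delete callback being handed ADB_NOTFOUND while
it asserts ADB_OK. The toy model below is not the Chord code: ToyDb,
DbStatus and replace_block are hypothetical names, and it assumes only
that an update removes the old record before writing the new one. It
illustrates how a failed delete could be passed back to the caller as a
status instead of aborting the process.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical statuses, loosely mirroring ADB_OK / ADB_NOTFOUND.
enum class DbStatus { Ok, NotFound };

// Toy in-memory key/value store; this is not the adb interface.
class ToyDb {
public:
    DbStatus store(const std::string& key, const std::string& value) {
        records_[key] = value;
        return DbStatus::Ok;
    }
    // Removing a key that is not present yields NotFound.
    DbStatus remove(const std::string& key) {
        return records_.erase(key) == 1 ? DbStatus::Ok : DbStatus::NotFound;
    }
    bool contains(const std::string& key) const {
        return records_.count(key) != 0;
    }
private:
    std::map<std::string, std::string> records_;
};

// Replaces a record by deleting the old one and then storing the new
// one. A failed delete is returned to the caller and nothing is
// written, rather than asserting and taking the process down.
DbStatus replace_block(ToyDb& db, const std::string& key,
                       const std::string& value) {
    DbStatus del = db.remove(key);
    if (del != DbStatus::Ok)
        return del;
    return db.store(key, value);
}
```

Whether returning an error is the right behaviour for lsd depends on why
the delete misses the record in the first place, which the trace alone
does not show.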

The result of running dbdump afterwards is unchanged (1 entry). Do you
have any idea what may be wrong?

Best regards,
Katsiaryna Stsefanovich


