[chord] infinite retries on KEYHASH blocks

Michael Walfish mwalfish at lcs.mit.edu
Mon Jul 21 22:27:35 EDT 2003


Hello,

1) It appears, from the code in dhash/client.C, that the intended behavior 
of a DHASH_KEYHASH retrieve() request, when the block is not found on 
the correct node, is:
   
   --look for the block on each successor
   --if that fails, repeat the walk over the successors up to 'retries'
     times (hardcoded to 5 in the code below, marked with **).
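
A minimal synchronous sketch of that intended behavior, for illustration only. The real code is asynchronous and callback-driven, and it waits 5 seconds between walks; both are left out here. The names successor_t, walk_successors, and retrieve_with_retries are hypothetical and do not appear in dhash.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for asking one successor for the block:
// returns true if that node has it.
typedef std::function<bool ()> successor_t;

// One walk over the successor list; true as soon as any node has the block.
bool
walk_successors (const std::vector<successor_t> &succs)
{
  for (std::size_t i = 0; i < succs.size (); i++)
    if (succs[i] ())
      return true;
  return false;
}

// Initial walk plus at most `retries` repeat walks, then give up.
// Returns true if the block was found; false corresponds to DHASH_NOENT.
// If `walks` is non-null it receives the number of walks performed.
bool
retrieve_with_retries (const std::vector<successor_t> &succs, int retries,
                       int *walks)
{
  for (int attempt = 0; attempt <= retries; attempt++) {
    if (walks)
      *walks = attempt + 1;
    if (walk_successors (succs))
      return true;
  }
  return false;
}
```

With retries set to 5 and no successor holding the block, this performs six walks (the initial one plus five repeats) and then reports failure, rather than looping forever.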

2) However, the code never checks the number of retries. I marked with
-------- below the spot where it looks like this check should happen.

3) The current behavior is that 'retries' is decremented ad infinitum: the 
parameter goes to -2, -3, etc. (confirmed in gdb). And of course calling

   dhashclient->retrieve(non_existent_chordid, DHASH_KEYHASH, wrap(&cb))

inside 'typetest' results in a request that never completes (you can see
the retries happening every 5 seconds forever in the output from lsd).
This is all with the stock code.


I was wondering:
a) whether infinite retrying is the intended behavior;
b) if not, whether this will be fixed soon. I'd be happy to mail in a
patch. It's just a tiny code modification, of course:

 if (options & DHASHCLIENT_NO_RETRY_ON_LOOKUP) { 
               -->
 if (retries == 0 || options & DHASHCLIENT_NO_RETRY_ON_LOOKUP) {
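
To illustrate why the extra check terminates the loop, here is a small standalone model of the three branches taken when the block is missing. It is only a sketch: the names on_miss and count_retries and the flag value NO_RETRY_ON_LOOKUP are made up for this example, the 5-second delay and the asynchronous callbacks are omitted, and it tests retries <= 0 instead of retries == 0 so that a negative starting value also stops (a small variation on the condition above).

```cpp
#include <cstddef>

// Hypothetical flag value, for illustration only; the real
// DHASHCLIENT_NO_RETRY_ON_LOOKUP is defined in the dhash sources.
const int NO_RETRY_ON_LOOKUP = 1;

enum action { GIVE_UP, RETRY_LOOKUP, NEXT_SUCCESSOR };

// Models the decision made in retrieve_dl_or_walk_cb when blk is NULL.
// check_retries == false models the stock condition, true the patched one.
action
on_miss (int options, int retries, std::size_t succs_left, bool check_retries)
{
  if ((check_retries && retries <= 0) || (options & NO_RETRY_ON_LOOKUP))
    return GIVE_UP;                 // rs->complete (DHASH_NOENT, NULL)
  if (succs_left == 0)
    return RETRY_LOOKUP;            // re-issue the lookup with retries - 1
  return NEXT_SUCCESSOR;            // try the next successor, same retries
}

// Drives on_miss for a block that no node holds, with `nsuccs` successors
// returned by each lookup. Returns the number of lookups re-issued before
// giving up, or -1 if `cap` re-issues happen without giving up.
int
count_retries (int retries, std::size_t nsuccs, bool check_retries, int cap)
{
  int reissued = 0;
  // The hop callback pops the first successor before the first download.
  std::size_t succs_left = nsuccs > 0 ? nsuccs - 1 : 0;
  while (reissued < cap) {
    switch (on_miss (0, retries, succs_left, check_retries)) {
    case GIVE_UP:
      return reissued;
    case RETRY_LOOKUP:
      retries--;
      reissued++;
      succs_left = nsuccs > 0 ? nsuccs - 1 : 0;
      break;
    case NEXT_SUCCESSOR:
      succs_left--;
      break;
    }
  }
  return -1;
}
```

In this model, count_retries (5, 3, true, 100) gives up after 5 re-issued lookups, while count_retries (5, 3, false, 100) hits the cap and returns -1, which matches the endless retrying seen with the stock code. Note that with the check placed in the first branch, the final round gives up after its first successor instead of walking the whole list.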


Thanks,
Mike


--------------------------------------------------------------------
from dhash/client.C

void
dhashcli::retrieve (blockID blockID, cb_ret cb, int options, 
		    ptr<chordID> guess)
{
  . . . .
  if (blockID.ctype == DHASH_KEYHASH) {
    ci->first_hop (wrap (this, &dhashcli::retrieve_block_hop_cb, rs, ci,
			 options, **5**, guess),
		   guess);
  }
  . . . .  
}

void
dhashcli::retrieve_block_hop_cb (ptr<rcv_state> rs, route_iterator *ci,
				 int options, int retries, ptr<chordID> guess,
				 bool done)
{
 . . . .
  chord_node s = rs->succs.pop_front ();
  dhash_download::execute (clntnode, s, rs->key, NULL, 0, 0, 0,
			   wrap (this, &dhashcli::retrieve_dl_or_walk_cb,
				 rs, status, options, retries, guess));
}

void
dhashcli::retrieve_dl_or_walk_cb (ptr<rcv_state> rs, dhash_stat status,
				  int options, int retries, ptr<chordID> guess,
				  ptr<dhash_block> blk)
{
  chordID myID = clntnode->my_ID ();

  if(!blk) {
    if (options & DHASHCLIENT_NO_RETRY_ON_LOOKUP -------- ) {
      rs->complete (DHASH_NOENT, NULL);
      rs = NULL;
    } else if (rs->succs.size() == 0) {
      trace << myID << ": walk (" << rs->key << "): No luck walking successors, retrying..\n";
      route_iterator *ci = r_factory->produce_iterator_ptr (rs->key.ID);
      delaycb (5, wrap (ci, &route_iterator::first_hop, 
			wrap (this, &dhashcli::retrieve_block_hop_cb,
			      rs, ci, options, retries - 1, guess),
			guess));
    } else {
      chord_node s = rs->succs.pop_front ();
      dhash_download::execute (clntnode, s, rs->key, NULL, 0, 0, 0,
			       wrap (this, &dhashcli::retrieve_dl_or_walk_cb,
				     rs, status, options, retries,
				     guess));
    }
  } else {
    rs->timemark ();

    blk->ID = rs->key.ID;
    blk->hops = rs->r.size ();
    blk->errors = rs->nextsucc - dhash::num_dfrags ();
    blk->retries = blk->errors;

    rs->complete (DHASH_OK, blk);
    rs = NULL;
  }
}




