[Click] segfault testing nsclick 2.30

John McKendry john.mckendry at gmail.com
Sat Sep 27 20:38:54 EDT 2008


 I have built nsclick 2.30 using the Click 1.6.0 distribution,
following the directions at http://read.cs.ucla.edu/click/nsclick,
and I'm getting a segfault when I try to run the test procedure
in tcl/ex/nsclick-simple-wlan.tcl. gdb backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000414066 in Packet::access ()
Current language:  auto; currently asm
(gdb) bt
#0  0x0000000000414066 in Packet::access ()
#1  0x000000000041ca68 in Classifier::classify ()
#2  0x00007fb485ac119f in Router::initialize (this=0xdfdc00, errh=0xde1cf0)
    at ../lib/router.cc:887
#3  0x00007fb485c08d7b in simclick_click_create (simnode=0xdbf350,
    router_file=0xdb3560 "nsclick-simple-lan.click") at nsclick.cc:110
#4  0x0000000000424e63 in ClickClassifier::command ()
#5  0x0000000000648251 in TclClass::dispatch_cmd ()
#6  0x000000000064da5c in OTclDispatch (cd=<value optimized out>, in=0xad1db0, argc=4,
    argv=0x7fff8e0c64e0) at otcl.c:434
#7  0x0000000000654a37 in TclInvokeStringCommand ()
#8  0x000000000065631e in TclEvalObjvInternal ()
#9  0x000000000067d822 in TclExecuteByteCode ()
#10 0x0000000000680fde in TclCompEvalObj ()
#11 0x000000000067d724 in TclExecuteByteCode ()
#12 0x0000000000680fde in TclCompEvalObj ()
#13 0x00000000006a5486 in TclObjInterpProc ()
#14 0x00000000006a5820 in TclProcInterpProc ()
#15 0x000000000064dbc9 in OTclDispatch (cd=<value optimized out>, in=0xad1db0, argc=3,
    argv=0x7fff8e0c7470) at otcl.c:477
#16 0x0000000000654a37 in TclInvokeStringCommand ()
#17 0x000000000065631e in TclEvalObjvInternal ()
#18 0x000000000067d822 in TclExecuteByteCode ()
#19 0x0000000000680fde in TclCompEvalObj ()
#20 0x000000000065726b in Tcl_EvalObjEx ()
#21 0x000000000065cbe9 in Tcl_ForObjCmd ()
#22 0x000000000065631e in TclEvalObjvInternal ()
#23 0x00000000006567e9 in Tcl_EvalEx ()
#24 0x0000000000696de7 in Tcl_FSEvalFile ()
#25 0x0000000000699fa0 in Tcl_Main ()
#26 0x000000000040800a in main ()
(gdb)

 Some details: I'm investigating network coding using code from
the COPE and MORE projects from Dina Katabi's lab at MIT:
http://piper.csail.mit.edu/dokuwiki/doku.php?id=cope and
http://people.csail.mit.edu/szym/more/README.html .
Both of these are built over Roofnet. I use the Roofnet code
from the Click 1.6.0 packages distribution. MORE is already
compatible with Click 1.6.0, but COPE dates from the 1.4.2 days
and I had to do some code-twiddling to get it to work in 1.6.0.
One of the things I did was to disable the Click:: namespace
stuff. Everything seems to work OK in Click, so far as I've
tested it. This is all userlevel Click, no linuxmodule.

 I've built nsclick on both a 32-bit and a 64-bit system, both
running Ubuntu Hardy, and I get the same segfault on both.

 The instruction that fails seems to be in Packet::access(), called
from Classifier::classify() in ns-2.30/classifier/classifier.cc:
int Classifier::classify(Packet *p)
{
	return (mshift(*((int*) p->access(offset_))));
}

which is (in ns-2.30/common/packet.h)
	inline unsigned char* access(int off) const {
		if (off < 0)
			abort();
		return (&bits_[off]);
	}

 I can't tell what the values of "offset_" and "bits_"
should look like, and gdb only shows me the assembler
for these functions, so it's slow going and time to
ask for help. The 64-bit generated code is this:
Dump of assembler code for function _ZNK6Packet6accessEi:
0x0000000000414048 <_ZNK6Packet6accessEi+0>:	push   %rbp
0x0000000000414049 <_ZNK6Packet6accessEi+1>:	mov    %rsp,%rbp
0x000000000041404c <_ZNK6Packet6accessEi+4>:	sub    $0x10,%rsp
0x0000000000414050 <_ZNK6Packet6accessEi+8>:	mov    %rdi,-0x8(%rbp)
0x0000000000414054 <_ZNK6Packet6accessEi+12>:	mov    %esi,-0xc(%rbp)
0x0000000000414057 <_ZNK6Packet6accessEi+15>:	cmpl   $0x0,-0xc(%rbp)
0x000000000041405b <_ZNK6Packet6accessEi+19>:	jns    0x414062 <_ZNK6Packet6accessEi+26>
0x000000000041405d <_ZNK6Packet6accessEi+21>:	callq  0x407ec8 <abort>
0x0000000000414062 <_ZNK6Packet6accessEi+26>:	mov    -0x8(%rbp),%rax
0x0000000000414066 <_ZNK6Packet6accessEi+30>:	mov    0x28(%rax),%rdx
0x000000000041406a <_ZNK6Packet6accessEi+34>:	mov    -0xc(%rbp),%eax
0x000000000041406d <_ZNK6Packet6accessEi+37>:	cltq
0x000000000041406f <_ZNK6Packet6accessEi+39>:	lea    (%rdx,%rax,1),%rax
0x0000000000414073 <_ZNK6Packet6accessEi+43>:	leaveq
0x0000000000414074 <_ZNK6Packet6accessEi+44>:	retq
End of assembler dump.

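 The segfault happens at 414066, mov 0x28(%rax),%rdx, because %rax is 29 (dec)
at this point. This 29 is passed in %rdi, which they tell me is used for
argument 0 on x86-64. For a member function, argument 0 is the hidden "this"
pointer ("off" arrives in %esi and is the value tested by the cmpl), so it
looks like Classifier::classify() is calling access() on a bogus Packet
pointer rather than passing a bad "offset_".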

 I put some puts trace statements in nsclick-simple-wlan.tcl, and the
segfault happens on the first call to
    [$node_($i) entry] loadclick "nsclick-simple-lan.click"
(line 206 in my copy, probably around line 200 in the original).

 Other than changing CLICK_DECL to /* */ and so forth to disable
the Click:: namespace, I didn't do anything creative to the Click
code - I did not redefine any packet formats or anything like that.

 I used Click 1.6.0 because I thought it would be better not to have
code changing underneath me while I tried to update COPE, and I used
ns 2.30 because I don't have patches for anything more recent. Click is
configured with --enable-wifi --enable-roofnet --enable-userlevel
--enable-nsclick.

 Thanks for any help anyone can provide, of course.

John

