[Click] Causing Kernel Panics Via Click
dmoore7@nd.edu
dmoore7 at nd.edu
Thu Sep 13 13:30:33 EDT 2007
Hello Everyone,
I have encountered a slightly unreliable method of bringing down one of my click
machines. I shall present an example below of how it happens.
> Log into a freshly booted machine
> Successfully load a locally stored click file
> Ping another machine via the click router
> Use w3c's webbot to successfully download a webpage from another host
> Do so a second time, again with proper results
> Execute click-uninstall
> Reload the same kernel config I had loaded earlier
> Meanwhile, in another ssh window to the webserver I have tcpdump running, and
have been watching the traffic coming in and out.
> I execute the same webbot command as previously, with the following output
(copy-pasted from the uninstall onward)
[root@filius ~]# click-uninstall
[root@filius ~]# click-install ./Filius-1-50.click
[root@filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg http://192.168.10.4/index3.html
Message from syslogd@filius at Thu Sep 13 12:59:52 2007 ...
filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
Message from syslogd@filius at Thu Sep 13 12:59:52 2007 ...
filius kernel: lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890, .owner_cpu: 3
<after about 5 minutes>
Read from remote host filius.cse.nd.edu: Connection timed out
Connection to filius.cse.nd.edu closed.
> During this I was watching tcpdump on the HTTP server. Traffic proceeds
normally until the server suddenly stops receiving ACKs.
> Thus it appears the click module began malfunctioning mid-use, and not simply
upon its loading.
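
The sequence above could be scripted roughly as follows. This is only a sketch under stated assumptions: the config path, device name, and URL are taken from the transcript, but the ping target is not given in the steps, so PING_TARGET (defaulting to the web server's address) is a guess, and the helper names run, fetch, and reproduce are made up for illustration. It prints the commands by default rather than running them, since a real run may hang the machine.

```shell
#!/bin/sh
# Sketch of the reproduction sequence. Dry-run by default: each command is
# printed, not executed. Set DRY_RUN=0 to really run it (may crash the host).
DRY_RUN="${DRY_RUN:-1}"
CONFIG="${CONFIG:-./Filius-1-50.click}"
DEVICE="${DEVICE:-spoof_eth2_0}"
URL="${URL:-http://192.168.10.4/index3.html}"
# Assumption: the original steps do not say which machine was pinged.
PING_TARGET="${PING_TARGET:-192.168.10.4}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One page download through the click router.
fetch() {
    run webbot -I "$DEVICE" -q -n -saveimg "$URL"
}

reproduce() {
    run click-install "$CONFIG"      # load the config on a fresh boot
    run ping -c 3 "$PING_TARGET"     # ping through the click router
    fetch                            # first download
    fetch                            # second download
    run click-uninstall              # unload click
    run click-install "$CONFIG"      # reload the same config
    fetch                            # the download during which the BUG appeared
}

reproduce
```

Since the failure is described as unreliable, a single pass may not trigger it, and the uninstall/install/fetch portion might need to be repeated.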
Background information:
- There are 3 machines with click loaded on them. The config files I used can
be found here:
Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
- For a general idea of what these router configs do, see this diagram:
http://cse.nd.edu/~dmoore7/myrouter.jpg
- The webbot I used is a slightly hacked version of w3c's webbot (modified to
allow forcing of a particular device). It exhibits no failures without click.
- Filius is running a version of click downloaded about 3 weeks ago from the
cvs. The others are running 1.5.0 downloaded from the click website during the
middle of this past summer. I can update filius if that may be helpful.
- The systems in question have 4 processors each, I believe technically 2 Core
Duos per machine.
- Kernel is a vanilla 2.6.16.13 from kernel.org with the click patch applied
- This appears to be related to an earlier problem I reported here, which at
first appeared to be resolved:
https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
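
For reference, the two kernel lines in the transcript name two different CPUs: the BUG is reported on CPU#2, while the lock records CPU 3 as its owner. As far as I understand the kernel's spinlock debugging, "wrong CPU" is printed when a lock is released on a CPU other than the one recorded at acquisition, which would be consistent with the multi-processor setup described above. A small sketch for pulling those two fields out of such log lines (the function names are made up for illustration; the sample text is copied from the transcript):

```shell
#!/bin/sh
# Extract the CPU that reported the BUG from a "BUG: spinlock ..." line.
reporting_cpu() {
    sed -n 's/.*BUG: spinlock .* on CPU#\([0-9][0-9]*\),.*/\1/p'
}

# Extract the owner CPU recorded in the lock from a ".owner_cpu:" line.
owner_cpu() {
    sed -n 's/.*\.owner_cpu: \([0-9][0-9]*\).*/\1/p'
}

# Sample input: the two kernel lines from the transcript.
log='filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
filius kernel: lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890, .owner_cpu: 3'

rep=$(printf '%s\n' "$log" | reporting_cpu)
own=$(printf '%s\n' "$log" | owner_cpu)
echo "reported on CPU $rep, lock owner recorded on CPU $own"
```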
Any input is appreciated. I will try to get crash dumps and such when I regain
access to the machine; as it is not local, I must wait for someone else to
reboot it for me.
Thanks,
- David