[Click] Causing Kernel Panics Via Click
Eddie Kohler
kohler at cs.ucla.edu
Fri Sep 14 02:11:02 EDT 2007
It would certainly be helpful if you could upgrade filius to current CVS.
Much spinlock stuff has happened there recently.
E
dmoore7 at nd.edu wrote:
> Hello Everyone,
> I have encountered a somewhat unreliable (not always reproducible) way of bringing
> down one of my click machines. Below is an example of how it happens.
>
> - Log in to a freshly booted machine
> - Successfully load a locally stored click file
> - Ping another machine via the click router
> - Use w3c's webbot to successfully download a webpage from another host
> - Do so again, again with correct results
> - Execute click-uninstall
> - Reload the same kernel config I had loaded earlier
>
> Meanwhile, in another ssh window to the webserver, I have tcpdump running and
> have been watching the traffic coming in and out.
>
> I execute the same webbot command as before, with the following output
> (copy-pasted from the uninstall onward):
>
> [root at filius ~]# click-uninstall
> [root at filius ~]# click-install ./Filius-1-50.click
> [root at filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg http://192.168.10.4/index3.html
> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
> filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
>
> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
> filius kernel: lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890,
> .owner_cpu: 3
>
> <after about 5 minutes>
> Read from remote host filius.cse.nd.edu: Connection timed out
> Connection to filius.cse.nd.edu closed.
>
> During this I have been watching tcpdump on the http server: the transfer
> proceeds until the server suddenly stops receiving ACKs.
> Thus it appears the click module began malfunctioning mid-use, and not simply
> upon loading.
>
> Background information:
> - There are 3 machines with click loaded on them, the config files I used can
> be found here:
> Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
> Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
> Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
> - For a general idea of what these router configs do, see this diagram:
> http://cse.nd.edu/~dmoore7/myrouter.jpg
> - The webbot I used is a slightly hacked version of w3c's webbot (modified to
> allow forcing of a particular device). It exhibits no failures without click.
> - Filius is running a version of click downloaded from CVS about 3 weeks ago.
> The others are running 1.5.0, downloaded from the click website in the middle
> of this past summer. I can update filius if that would be helpful.
> - The systems in question each have 4 processors (I believe technically two
> Core Duos each).
> - The kernel is a vanilla 2.6.16.13 from kernel.org with the click patch applied.
> - This appears to be related to an earlier problem I reported here, which at
> first appeared to be resolved:
> https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
>
> Any input is appreciated. I will try to get crash dumps and such when I regain
> access to the machine; as it is not local, I must wait for someone else to
> reboot it for me.
> Thanks,
> - David