[Click] Causing Kernel Panics Via Click

Eddie Kohler kohler at cs.ucla.edu
Fri Sep 14 02:11:02 EDT 2007


It would certainly be helpful if you could upgrade filius to current CVS.
A lot of spinlock-related work has gone in there recently.

E


dmoore7 at nd.edu wrote:
> Hello Everyone,
> I have encountered a slightly unreliable method of bringing down one of my Click
> machines.  I shall present an example below of how it happens.
> 
>> Log in to a freshly booted machine
>> Successfully load a locally stored Click file
>> Ping another machine via the Click router
>> Use W3C's webbot to successfully download a web page from another host
>> Do so again, again with proper results
>> Execute click-uninstall
>> Reload the same kernel config I had loaded earlier
> 
>> Meanwhile, in another ssh window to the webserver I have tcpdump running, and
> have been watching the traffic coming in and out.
> 
>> I execute the same webbot command as previously, with the following output
> (copy-pasted from the uninstall onward)
> 
> [root@filius ~]# click-uninstall
> [root@filius ~]# click-install ./Filius-1-50.click
> [root@filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg http://192.168.10.4/index3.html
> Message from syslogd@filius at Thu Sep 13 12:59:52 2007 ...
> filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
> 
> Message from syslogd@filius at Thu Sep 13 12:59:52 2007 ...
> filius kernel:  lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890, .owner_cpu: 3
> 
> <after about 5 minutes>
> Read from remote host filius.cse.nd.edu: Connection timed out
> Connection to filius.cse.nd.edu closed.
> 
>> During this I have been watching tcpdump on the HTTP server; the transfer
> proceeds until the server suddenly stops receiving ACKs.
>> Thus it appears the Click module began malfunctioning mid-use, and not simply
> upon loading.
> 
> Background information:
>  - There are 3 machines with Click loaded on them; the config files I used can
> be found here:
> Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
> Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
> Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
>  - For a general idea of what these router configs do, see this diagram:
> http://cse.nd.edu/~dmoore7/myrouter.jpg
>  - The webbot I used is a slightly hacked version of W3C's webbot (modified to
> allow forcing of a particular device).  It exhibits no failures without Click.
>  - Filius is running a version of Click downloaded from CVS about 3 weeks ago.
> The others are running 1.5.0, downloaded from the Click website in the middle
> of this past summer.  I can update filius if that would be helpful.
>  - Each of the systems in question has 4 processors; I believe technically
> 2 Core Duos each.
>  - The kernel is a vanilla 2.6.16.13 from kernel.org with the Click patch applied.
>  - This appears to be related to an earlier problem I reported here, which at
> first appeared to be resolved: 
> https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
> 
> Any input is appreciated.  I will try to get crash dumps and such when I regain
> access to the machine; as it is not local, I must wait for someone else to
> reboot it for me.
> Thanks,
>  - David