[Click] Causing Kernel Panics Via Click

Eddie Kohler kohler at cs.ucla.edu
Fri Sep 14 11:20:33 EDT 2007


There was probably a line just before the "Oops:" line...do you know what it was?
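
If the full log is still around, here is one rough way to pull that line out, sketched in Python. It assumes the kernel messages have been saved to a text file; the sample lines are placeholders for illustration, not output from filius, and the function name is made up for this sketch.

```python
def line_before_oops(lines):
    """Return the line just before the first line containing 'Oops:'.

    Returns None if no line contains 'Oops:' or if it is the very first line.
    """
    previous = None
    for line in lines:
        if "Oops:" in line:
            return previous
        previous = line
    return None


# Placeholder log lines for illustration only (not real output from filius).
sample = [
    "filius kernel: <placeholder: the message we are looking for>",
    "filius kernel: Oops: 0002 [#1]",
    "filius kernel: SMP",
]
print(line_before_oops(sample))
```

For a real log, an open file can be passed in place of the list, since the function only iterates over lines. Where syslog writes kernel messages varies by system (often /var/log/messages, but that is an assumption here).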


dmoore7 at nd.edu wrote:
> I grabbed the latest CVS and simply recompiled/installed it without any kernel
> modifications, but click crashed (without panicking the kernel).  I think this
> means I need to recompile the kernel, although I am unsure if I need to
> download vanilla sources and patch from there, or do it some other way.
> 
> I'll probably start fresh from a vanilla kernel after lunch, unless someone
> offers advice/admonition otherwise.
> Thanks
>  - David Moore
> 
> The crash:
> [root at filius ~]# click-install ./Filius-1-50.click
> Segmentation fault
> Message from syslogd at filius at Fri Sep 14 10:39:03 2007 ... <This line preceded
> all further lines, edited out for clarity>
> filius kernel: Oops: 0002 [#1]
> filius kernel: SMP
> filius kernel: CPU:    3
> filius kernel: EIP is at _ZN4Task17strong_rescheduleEv+0x12/0x1a0 [click]
> filius kernel: eax: f5cced14   ebx: 00000001   ecx: 00000000   edx: f7d36000
> filius kernel: esi: f5cced14   edi: f6333000   ebp: 00000000   esp: f6333cb8
> filius kernel: ds: 007b   es: 007b   ss: 0068
> filius kernel: Process click-install (pid: 3957, threadinfo=f6333000
> task=f7d36000)
> filius kernel: Stack: <0>f8affbaa 00000001 f5ccc800 f6333cd4 f8b005c7 f6333cd4
> 00ffffff f5c21a20
> filius kernel:        00000001 00000004 f5ccc800 f8c79ac8 f5ccc800 00000001
> c02e2946 f5ccc800
> filius kernel:        00000000 00001002 c028a70d f5ccc800 00001043 c028b96a
> f6333d2c 00000001
> filius kernel: Call Trace:
> filius kernel:  [<f8affbaa>] _ZN8ToDevice13change_deviceEP10net_device+0x2a/0x60
> [click]
> filius kernel:  [<f8b005c7>] device_notifier_hook+0x77/0x90 [click]
> filius kernel:  [<c02e2946>] notifier_call_chain+0x17/0x2e
> filius kernel:  [<c028a70d>] dev_open+0x66/0x6d
> filius kernel:  [<c028b96a>] dev_change_flags+0x48/0xed
> filius kernel:  [<c028c19f>] dev_ioctl+0x309/0x3e2
> filius kernel:  [<f8afc8a2>]
> _Z10dev_updownP10net_deviceiP12ErrorHandler+0x72/0x110 [click]
> filius kernel:  [<f8afc78c>]
> _ZN8FromHost20set_device_addressesEP12ErrorHandler+0xac/0x150 [click]
> filius kernel:  [<f8afca11>] _ZN8FromHost10initializeEP12ErrorHandler+0xd1/0xf0
> [click]
> filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
> filius kernel:  [<f8aca942>] _ZN6Router10initializeEP12ErrorHandler+0x682/0x780
> [click]
> filius kernel:  [<c013ec21>] __alloc_pages+0x59/0x273
> filius kernel:  [<f8b20111>]
> _Z12write_configRK6StringP7ElementPvP12ErrorHandler+0x121/0x1e0 [click]
> filius kernel:  [<f8ac456f>]
> _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x13f/0x210
> [click]
> filius kernel:  [<c01533b6>] cache_grow+0x128/0x14a
> filius kernel:  [<f8b23789>] handler_flush+0x499/0x590 [click]
> filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
> filius kernel:  [<c0155724>] filp_close+0x31/0x52
> filius kernel:  [<c01031ab>] sysenter_past_esp+0x54/0x75
> filius kernel: Code: ff eb d7 c7 04 24 b5 26 b3 f8 e8 0a f4 ff ff e9 60 ff ff ff
> 90 8d 74 26 00 57 bf 00 f0 ff ff 56 89 c6 53 83 ec 04 21 e7 8b 4e 1c <f0> ff 41
> 4c 8d 59 44 8b 47 10 ba 01 00 00 00 39 43 04 74 2a 8d
> 
> Quoting Joonwoo Park <joonwpark81 at gmail.com>:
> 
>> Hi David,
>>
>> If you are using Linux 2.6.16.13, just updating your Click source to the
>> latest revision from CVS or git may solve the problem (without updating
>> and recompiling the kernel).
>>
>> Joonwoo Park
>>
>> 2007/9/14, dmoore7 at nd.edu <dmoore7 at nd.edu>:
>>> Thanks Eddie,
>>> I will do that.  I assume it will be necessary to repatch and recompile the
>>> kernel?
>>> Should I re-download the vanilla sources from kernel.org and patch from
>>> those, or is it possible to patch the already patched ones I have now?
>>> Thanks,
>>>  - David
>>>
>>> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
>>>
>>>> It would certainly be helpful if you could upgrade filius to current CVS.
>>>> Much spinlock stuff has happened there recently.
>>>>
>>>> E
>>>>
>>>>
>>>> dmoore7 at nd.edu wrote:
>>>>> Hello Everyone,
>>>>> I have encountered a slightly unreliable method of bringing down one of my
>>>>> click machines.  I shall present an example below of how it happens.
>>>>>
>>>>>> Log in to a freshly booted machine
>>>>>> Successfully load a locally stored click file
>>>>>> Ping another machine via the click router
>>>>>> Use w3c's webbot to successfully download a webpage from another host
>>>>>> Do so again, again with proper results
>>>>>> Execute click-uninstall
>>>>>> Reload the same kernel config I had loaded earlier
>>>>>> Meanwhile, in another ssh window to the webserver I have tcpdump running,
>>>>> and have been watching the traffic coming in and out.
>>>>>
>>>>>> I execute the same webbot command as previously, with the following output
>>>>> (copy-pasted from the uninstall onward)
>>>>>
>>>>> [root at filius ~]# click-uninstall
>>>>> [root at filius ~]# click-install ./Filius-1-50.click
>>>>> [root at filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg
>>>>> http://192.168.10.4/index3.html
>>>>> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
>>>>> filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
>>>>>
>>>>> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
>>>>> filius kernel:  lock: f8c79c44, .magic: dead4ead, .owner:
>>>>> click-uninstall/4890, .owner_cpu: 3
>>>>>
>>>>> <after about 5 minutes>
>>>>> Read from remote host filius.cse.nd.edu: Connection timed out
>>>>> Connection to filius.cse.nd.edu closed.
>>>>>
>>>>>> During this I have been watching tcpdump on the http server, and traffic
>>>>> proceeds until it suddenly stops receiving ACKs.
>>>>>> Thus it appears the click module began malfunctioning mid-use, and not
>>>>> simply upon its loading.
>>>>>
>>>>> Background information:
>>>>>  - There are 3 machines with click loaded on them, the config files I used
>>>>> can be found here:
>>>>> Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
>>>>> Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
>>>>> Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
>>>>>  - For a general idea of what these router configs do, see this diagram:
>>>>> http://cse.nd.edu/~dmoore7/myrouter.jpg
>>>>>  - The webbot I used is a slightly hacked version of w3c's webbot (modified
>>>>> to allow forcing of a particular device).  It exhibits no failures without
>>>>> click.
>>>>>  - Filius is running a version of click downloaded about 3 weeks ago from
>>>>> CVS.  The others are running 1.5.0, downloaded from the click website during
>>>>> the middle of this past summer.  I can update filius if that may be helpful.
>>>>>  - The systems in question have 4 processors, I believe technically 2 Core
>>>>> Duos each.
>>>>>  - Kernel is a vanilla 2.6.16.13 from kernel.org w/ click patch applied
>>>>>  - This appears to be related to an earlier problem I reported here, which
>>>>> at first appeared to be resolved:
>>>>> https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
>>>>>
>>>>> Any input is appreciated.  I will try to get crash dumps and such when I
>>>>> regain access to the machine; as it is not local, I must wait for someone
>>>>> else to reboot it for me.
>>>>> Thanks,
>>>>>  - David
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> click mailing list
>>>>> click at amsterdam.lcs.mit.edu
>>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 
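
A note on reading the call trace quoted above: names such as _ZN4Task17strong_rescheduleEv are C++ symbols mangled under the Itanium ABI. The c++filt tool from binutils demangles them fully. As a rough illustration of the encoding only, here is a minimal Python sketch that recovers just the qualified name. It assumes plain length-prefixed names, ignores argument types, and returns None for anything else (operators, constructors, templates). The function name is made up for this sketch.

```python
import re

_LENGTH = re.compile(r"\d+")


def qualified_name(symbol):
    """Extract 'Class::method' (or a plain function name) from a mangled symbol.

    Handles only _Z<len><name>... and _ZN[cv-qualifiers]<len><name>...E forms.
    Argument types are ignored; unsupported encodings return None.
    """
    if not symbol.startswith("_Z"):
        return None
    pos = 2
    nested = symbol.startswith("N", pos)
    if nested:
        pos += 1
        # Skip cv-qualifiers, e.g. K on const member functions.
        while pos < len(symbol) and symbol[pos] in "rVK":
            pos += 1
    parts = []
    while True:
        match = _LENGTH.match(symbol, pos)
        if match is None:
            break
        length = int(match.group())
        start = match.end()
        if start + length > len(symbol):
            return None
        parts.append(symbol[start:start + length])
        pos = start + length
        if not nested:
            break  # a non-nested name has exactly one component
    if not parts:
        return None
    if nested and not symbol.startswith("E", pos):
        return None  # nested names must be terminated by E
    return "::".join(parts)


# Symbols copied from the call trace quoted earlier in this message.
for sym in ("_ZN4Task17strong_rescheduleEv",
            "_ZN8ToDevice13change_deviceEP10net_device",
            "_Z10dev_updownP10net_deviceiP12ErrorHandler"):
    print(sym, "->", qualified_name(sym))
```

Read this way, the faulting instruction is in Task::strong_reschedule, called from ToDevice::change_device.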

