[Click] Causing Kernel Panics Via Click

Eddie Kohler kohler at cs.ucla.edu
Fri Sep 14 11:57:02 EDT 2007


Never mind, I think I found it.  It is not due to the recent changes in Task;
rather, some older changes to the Linux AnyDevice helper were incompatible
with FromHost.  It looks like when FromHost creates its device, the associated
ToDevice gets notified of the device's existence and tries to schedule its
Task before the Task has been initialized.  Bad, bad.

I have tried to fix this by making most Task methods no-ops until the
Task is initialized.  Does the current checkin help?

Eddie


Eddie Kohler wrote:
> There was probably a line just before the "Oops:" line...do you know what it was?
> 
> 
> dmoore7 at nd.edu wrote:
>> I grabbed the latest CVS and simply recompiled/installed it without any kernel
>> modifications, but click crashed (without panicking the kernel).  I think this
>> means I need to recompile the kernel, although I am unsure whether I need to
>> download vanilla sources and patch from there, or do it some other way.
>>
>> I'll probably start fresh from a vanilla kernel after lunch, unless someone
>> offers advice/admonition otherwise.
>> Thanks
>>  - David Moore
>>
>> The crash:
>> [root at filius ~]# click-install ./Filius-1-50.click
>> Segmentation fault
>> Message from syslogd at filius at Fri Sep 14 10:39:03 2007 ... <This line preceded all further lines; edited out for clarity>
>> filius kernel: Oops: 0002 [#1]
>> filius kernel: SMP
>> filius kernel: CPU:    3
>> filius kernel: EIP is at _ZN4Task17strong_rescheduleEv+0x12/0x1a0 [click]
>> filius kernel: eax: f5cced14   ebx: 00000001   ecx: 00000000   edx: f7d36000
>> filius kernel: esi: f5cced14   edi: f6333000   ebp: 00000000   esp: f6333cb8
>> filius kernel: ds: 007b   es: 007b   ss: 0068
>> filius kernel: Process click-install (pid: 3957, threadinfo=f6333000 task=f7d36000)
>> filius kernel: Stack: <0>f8affbaa 00000001 f5ccc800 f6333cd4 f8b005c7 f6333cd4 00ffffff f5c21a20
>> filius kernel:        00000001 00000004 f5ccc800 f8c79ac8 f5ccc800 00000001 c02e2946 f5ccc800
>> filius kernel:        00000000 00001002 c028a70d f5ccc800 00001043 c028b96a f6333d2c 00000001
>> filius kernel: Call Trace:
>> filius kernel:  [<f8affbaa>] _ZN8ToDevice13change_deviceEP10net_device+0x2a/0x60 [click]
>> filius kernel:  [<f8b005c7>] device_notifier_hook+0x77/0x90 [click]
>> filius kernel:  [<c02e2946>] notifier_call_chain+0x17/0x2e
>> filius kernel:  [<c028a70d>] dev_open+0x66/0x6d
>> filius kernel:  [<c028b96a>] dev_change_flags+0x48/0xed
>> filius kernel:  [<c028c19f>] dev_ioctl+0x309/0x3e2
>> filius kernel:  [<f8afc8a2>] _Z10dev_updownP10net_deviceiP12ErrorHandler+0x72/0x110 [click]
>> filius kernel:  [<f8afc78c>] _ZN8FromHost20set_device_addressesEP12ErrorHandler+0xac/0x150 [click]
>> filius kernel:  [<f8afca11>] _ZN8FromHost10initializeEP12ErrorHandler+0xd1/0xf0 [click]
>> filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
>> filius kernel:  [<f8aca942>] _ZN6Router10initializeEP12ErrorHandler+0x682/0x780 [click]
>> filius kernel:  [<c013ec21>] __alloc_pages+0x59/0x273
>> filius kernel:  [<f8b20111>] _Z12write_configRK6StringP7ElementPvP12ErrorHandler+0x121/0x1e0 [click]
>> filius kernel:  [<f8ac456f>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x13f/0x210 [click]
>> filius kernel:  [<c01533b6>] cache_grow+0x128/0x14a
>> filius kernel:  [<f8b23789>] handler_flush+0x499/0x590 [click]
>> filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
>> filius kernel:  [<c0155724>] filp_close+0x31/0x52
>> filius kernel:  [<c01031ab>] sysenter_past_esp+0x54/0x75
>> filius kernel: Code: ff eb d7 c7 04 24 b5 26 b3 f8 e8 0a f4 ff ff e9 60 ff ff ff 90 8d 74 26 00 57 bf 00 f0 ff ff 56 89 c6 53 83 ec 04 21 e7 8b 4e 1c <f0> ff 41 4c 8d 59 44 8b 47 10 ba 01 00 00 00 39 43 04 74 2 a 8d
>>
>> Quoting Joonwoo Park <joonwpark81 at gmail.com>:
>>
>>> Hi David,
>>>
>>> If you are using Linux 2.6.16.13, just updating your Click source to the
>>> latest revision from CVS or git may solve the problem (without updating
>>> and recompiling the kernel).
>>>
>>> Joonwoo Park
>>>
>>> 2007/9/14, dmoore7 at nd.edu <dmoore7 at nd.edu>:
>>>> Thanks Eddie,
>>>> I will do that.  I assume it will be necessary to repatch and recompile the
>>>> kernel?
>>>> Should I re-download the vanilla sources from kernel.org and patch from those,
>>>> or is it possible to patch the already-patched ones I have now?
>>>> Thanks,
>>>>  - David
>>>>
>>>> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
>>>>
>>>>> It would certainly be helpful if you could upgrade filius to current CVS.
>>>>> Much spinlock stuff has happened there recently.
>>>>>
>>>>> E
>>>>>
>>>>>
>>>>> dmoore7 at nd.edu wrote:
>>>>>> Hello Everyone,
>>>>>> I have encountered a slightly unreliable method of bringing down one of my
>>>>>> click machines.  I shall present an example below of how it happens.
>>>>>>
>>>>>>> Log into a freshly booted machine
>>>>>>> Successfully load locally stored click file
>>>>>>> Ping another machine via the click router
>>>>>>> Use w3c's webbot to successfully download a webpage from another host
>>>>>>> Do so again, again with proper results
>>>>>>> Execute click-uninstall
>>>>>>> Reload the same kernel config I had loaded earlier
>>>>>>> Meanwhile, in another ssh window to the webserver, I have tcpdump running
>>>>>> and have been watching the traffic coming in and out.
>>>>>>
>>>>>>> I execute the same webbot command as previously, with the following output
>>>>>> (copy-pasted from the uninstall onward):
>>>>>>
>>>>>> [root at filius ~]# click-uninstall
>>>>>> [root at filius ~]# click-install ./Filius-1-50.click
>>>>>> [root at filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg http://192.168.10.4/index3.html
>>>>>> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
>>>>>> filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
>>>>>>
>>>>>> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
>>>>>> filius kernel:  lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890, .owner_cpu: 3
>>>>>>
>>>>>> <after about 5 minutes>
>>>>>> Read from remote host filius.cse.nd.edu: Connection timed out
>>>>>> Connection to filius.cse.nd.edu closed.
>>>>>>
>>>>>>> During this I have been watching tcpdump on the http server, and the
>>>>>> traffic proceeds normally until it suddenly stops receiving ACKs.
>>>>>>> Thus it appears the click module began malfunctioning mid-use, and not
>>>>>> simply upon its loading.
>>>>>>
>>>>>> Background information:
>>>>>>  - There are 3 machines with click loaded on them; the config files I used
>>>>>> can be found here:
>>>>>> Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
>>>>>> Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
>>>>>> Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
>>>>>>  - For a general idea of what these router configs do, see this diagram:
>>>>>> http://cse.nd.edu/~dmoore7/myrouter.jpg
>>>>>>  - The webbot I used is a slightly hacked version of w3c's webbot (modified
>>>>>> to allow forcing of a particular device).  It exhibits no failures without
>>>>>> click.
>>>>>>  - Filius is running a version of click downloaded from CVS about 3 weeks
>>>>>> ago.  The others are running 1.5.0, downloaded from the click website
>>>>>> during the middle of this past summer.  I can update filius if that would
>>>>>> be helpful.
>>>>>>  - The systems in question have 4 processors (I believe technically 2 Core
>>>>>> Duos each).
>>>>>>  - Kernel is a vanilla 2.6.16.13 from kernel.org with the click patch applied.
>>>>>>  - This appears to be related to an earlier problem I reported here, which
>>>>>> at first appeared to be resolved:
>>>>>> https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
>>>>>>
>>>>>> Any input is appreciated.  I will try to get crash dumps and such when I
>>>>>> regain access to the machine; as it is not local, I must wait for someone
>>>>>> else to reboot it for me.
>>>>>> Thanks,
>>>>>>  - David
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> click mailing list
>>>>>> click at amsterdam.lcs.mit.edu
>>>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>

