[Click] Causing Kernel Panics Via Click

dmoore7@nd.edu dmoore7 at nd.edu
Fri Sep 14 12:04:47 EDT 2007


I'll check out the newest CVS again and use it when I recompile and reload the
kernel.  I should have results in a couple of hours.
 - David Moore

Quoting Eddie Kohler <kohler at cs.ucla.edu>:

> Never mind, I think I found it.  It is not due to the recent changes in Task;
> rather, some older changes to the Linux AnyDevice helper were incompatible
> with FromHost.  It looks like when FromHost creates its device, the associated
> ToDevice gets notified of the device's existence and tries to schedule its
> Task before the Task has been initialized.  Bad, bad.
>
> I have tried to fix this by making most Task methods no-ops until the
> Task is initialized.  Does the current checkin help?
>
> Eddie
>
>
> Eddie Kohler wrote:
> > There was probably a line just before the "Oops:" line... do you know what
> > it was?
> >
> >
> > dmoore7 at nd.edu wrote:
> >> I grabbed the latest CVS and simply recompiled/installed it without any kernel
> >> modifications, but click crashed (without panicking the kernel).  I think this
> >> means I need to recompile the kernel, although I am unsure whether I need to
> >> download vanilla sources and patch from there, or do it some other way.
> >>
> >> I'll probably start fresh from a vanilla kernel after lunch, unless someone
> >> offers advice/admonition otherwise.
> >> Thanks
> >>  - David Moore
> >>
> >> The crash:
> >> [root at filius ~]# click-install ./Filius-1-50.click
> >> Segmentation fault
> >> Message from syslogd at filius at Fri Sep 14 10:39:03 2007 ... <This line preceded
> >> all further lines, edited out for clarity>
> >> filius kernel: Oops: 0002 [#1]
> >> filius kernel: SMP
> >> filius kernel: CPU:    3
> >> filius kernel: EIP is at _ZN4Task17strong_rescheduleEv+0x12/0x1a0 [click]
> >> filius kernel: eax: f5cced14   ebx: 00000001   ecx: 00000000   edx: f7d36000
> >> filius kernel: esi: f5cced14   edi: f6333000   ebp: 00000000   esp: f6333cb8
> >> filius kernel: ds: 007b   es: 007b   ss: 0068
> >> filius kernel: Process click-install (pid: 3957, threadinfo=f6333000 task=f7d36000)
> >> filius kernel: Stack: <0>f8affbaa 00000001 f5ccc800 f6333cd4 f8b005c7 f6333cd4 00ffffff f5c21a20
> >> filius kernel:        00000001 00000004 f5ccc800 f8c79ac8 f5ccc800 00000001 c02e2946 f5ccc800
> >> filius kernel:        00000000 00001002 c028a70d f5ccc800 00001043 c028b96a f6333d2c 00000001
> >> filius kernel: Call Trace:
> >> filius kernel:  [<f8affbaa>] _ZN8ToDevice13change_deviceEP10net_device+0x2a/0x60 [click]
> >> filius kernel:  [<f8b005c7>] device_notifier_hook+0x77/0x90 [click]
> >> filius kernel:  [<c02e2946>] notifier_call_chain+0x17/0x2e
> >> filius kernel:  [<c028a70d>] dev_open+0x66/0x6d
> >> filius kernel:  [<c028b96a>] dev_change_flags+0x48/0xed
> >> filius kernel:  [<c028c19f>] dev_ioctl+0x309/0x3e2
> >> filius kernel:  [<f8afc8a2>] _Z10dev_updownP10net_deviceiP12ErrorHandler+0x72/0x110 [click]
> >> filius kernel:  [<f8afc78c>] _ZN8FromHost20set_device_addressesEP12ErrorHandler+0xac/0x150 [click]
> >> filius kernel:  [<f8afca11>] _ZN8FromHost10initializeEP12ErrorHandler+0xd1/0xf0 [click]
> >> filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
> >> filius kernel:  [<f8aca942>] _ZN6Router10initializeEP12ErrorHandler+0x682/0x780 [click]
> >> filius kernel:  [<c013ec21>] __alloc_pages+0x59/0x273
> >> filius kernel:  [<f8b20111>] _Z12write_configRK6StringP7ElementPvP12ErrorHandler+0x121/0x1e0 [click]
> >> filius kernel:  [<f8ac456f>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x13f/0x210 [click]
> >> filius kernel:  [<c01533b6>] cache_grow+0x128/0x14a
> >> filius kernel:  [<f8b23789>] handler_flush+0x499/0x590 [click]
> >> filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
> >> filius kernel:  [<c0155724>] filp_close+0x31/0x52
> >> filius kernel:  [<c01031ab>] sysenter_past_esp+0x54/0x75
> >> filius kernel: Code: ff eb d7 c7 04 24 b5 26 b3 f8 e8 0a f4 ff ff e9 60 ff ff ff 90 8d 74 26 00 57 bf 00 f0 ff ff 56 89 c6 53 83 ec 04 21 e7 8b 4e 1c <f0> ff 41 4c 8d 59 44 8b 47 10 ba 01 00 00 00 39 43 04 74 2a 8d
> >>
> >> Quoting Joonwoo Park <joonwpark81 at gmail.com>:
> >>
> >>> Hi David,
> >>>
> >>> If you are using Linux 2.6.16.13, just updating your click source to the
> >>> latest revision from CVS or git may solve the problem (without updating
> >>> or recompiling the kernel).
> >>>
> >>> Joonwoo Park
> >>>
> >>> 2007/9/14, dmoore7 at nd.edu <dmoore7 at nd.edu>:
> >>>> Thanks Eddie,
> >>>> I will do that.  I assume it will be necessary to repatch and recompile the
> >>>> kernel?
> >>>> Should I re-download the vanilla sources from kernel.org and patch from those,
> >>>> or is it possible to patch the already patched ones I have now?
> >>>> Thanks,
> >>>>  - David
> >>>>
> >>>> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
> >>>>
> >>>>> It would certainly be helpful if you could upgrade filius to current CVS.
> >>>>> Much spinlock stuff has happened there recently.
> >>>>>
> >>>>> E
> >>>>>
> >>>>>
> >>>>> dmoore7 at nd.edu wrote:
> >>>>>> Hello Everyone,
> >>>>>> I have encountered a slightly unreliable method of bringing down one of my
> >>>>>> click machines.  I shall present an example below of how it happens.
> >>>>>>
> >>>>>>> Log into a freshly booted machine
> >>>>>>> Successfully load a locally stored click file
> >>>>>>> Ping another machine via the click router
> >>>>>>> Use w3c's webbot to successfully download a webpage from another host
> >>>>>>> Do so again, again with proper results
> >>>>>>> Execute click-uninstall
> >>>>>>> Reload the same kernel config I had loaded earlier
> >>>>>>> Meanwhile, in another ssh window to the webserver, I have tcpdump running
> >>>>>> and have been watching the traffic coming in and out.
> >>>>>>
> >>>>>>> I execute the same webbot command as before, with the following output
> >>>>>> (copy-pasted from the uninstall onward):
> >>>>>>
> >>>>>> [root at filius ~]# click-uninstall
> >>>>>> [root at filius ~]# click-install ./Filius-1-50.click
> >>>>>> [root at filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg http://192.168.10.4/index3.html
> >>>>>> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
> >>>>>> filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
> >>>>>>
> >>>>>> Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
> >>>>>> filius kernel:  lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890, .owner_cpu: 3
> >>>>>>
> >>>>>> <after about 5 minutes>
> >>>>>> Read from remote host filius.cse.nd.edu: Connection timed out
> >>>>>> Connection to filius.cse.nd.edu closed.
> >>>>>>
> >>>>>>> During this I was watching tcpdump on the http server; the traffic
> >>>>>> proceeds normally until the server suddenly stops receiving ACKs.
> >>>>>>> Thus it appears the click module began malfunctioning mid-use, and not
> >>>>>> simply upon being loaded.
> >>>>>>
> >>>>>> Background information:
> >>>>>>  - There are 3 machines with click loaded on them; the config files I used
> >>>>>> can be found here:
> >>>>>> Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
> >>>>>> Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
> >>>>>> Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
> >>>>>>  - For a general idea of what these router configs do, see this diagram:
> >>>>>> http://cse.nd.edu/~dmoore7/myrouter.jpg
> >>>>>>  - The webbot I used is a slightly hacked version of w3c's webbot (modified
> >>>>>> to allow forcing a particular device).  It exhibits no failures without
> >>>>>> click.
> >>>>>>  - Filius is running a version of click downloaded from CVS about 3 weeks
> >>>>>> ago.  The others are running 1.5.0, downloaded from the click website
> >>>>>> during the middle of this past summer.  I can update filius if that would
> >>>>>> be helpful.
> >>>>>>  - The systems in question have 4 processors; I believe they are
> >>>>>> technically 2 Core Duos each.
> >>>>>>  - Kernel is a vanilla 2.6.16.13 from kernel.org with the click patch applied.
> >>>>>>  - This appears to be related to an earlier problem I reported here, which
> >>>>>> at first appeared to be resolved:
> >>>>>> https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
> >>>>>>
> >>>>>> Any input is appreciated.  I will try to get crash dumps and such when I
> >>>>>> regain access to the machine; as it is not local, I must wait for someone
> >>>>>> else to reboot it for me.
> >>>>>> Thanks,
> >>>>>>  - David
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> click mailing list
> >>>>>> click at amsterdam.lcs.mit.edu
> >>>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
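As a minimal illustration of Eddie's diagnosis, here is a hypothetical C++ sketch. The call trace above runs from FromHost::initialize through dev_updown, dev_open, and device_notifier_hook into ToDevice::change_device and then Task::strong_reschedule; the Oops code 0002 is a write fault, and ecx is 00000000 at the faulting instruction, which suggests a pointer inside the not-yet-initialized Task was still null. The fix Eddie describes is to make most Task methods no-ops until the Task is initialized. The names SketchThread, SketchTask, and on_device_up are invented for this example; they only model the idea and are not Click's real Task, RouterThread, or ToDevice code, whose actual guard may look different.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a router thread's run queue (not Click's API).
struct SketchThread {
    std::vector<int> runqueue;  // ids of tasks waiting to run
};

// Hypothetical task that treats scheduling requests as no-ops until it has
// been initialized, i.e. attached to a thread.
class SketchTask {
  public:
    explicit SketchTask(int id) : id_(id) {}

    // Attach the task to a thread; scheduling becomes possible after this.
    void initialize(SketchThread *thread) {
        assert(thread != nullptr);
        thread_ = thread;
    }

    bool initialized() const { return thread_ != nullptr; }
    bool scheduled() const { return scheduled_; }

    // Ask for the task to run. Without the initialized() check, an early call
    // would dereference the null thread_ pointer, much like the crash above.
    // Returns false if the request was ignored because the task is not ready.
    bool reschedule() {
        if (!initialized())
            return false;               // no-op before initialization
        if (!scheduled_) {              // avoid queueing the task twice
            thread_->runqueue.push_back(id_);
            scheduled_ = true;
        }
        return true;
    }

  private:
    int id_;
    SketchThread *thread_ = nullptr;
    bool scheduled_ = false;
};

// Hypothetical device-notifier callback loosely modeled on
// ToDevice::change_device: when the device comes up, it asks its task to run.
bool on_device_up(SketchTask &task) {
    return task.reschedule();
}
```

In this model, a device notification that arrives while FromHost is still initializing is simply dropped instead of dereferencing a null pointer. One consequence of this design is that the early request is not remembered, so whatever initializes the task must schedule it afterward if it already has pending work.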






