[Click] Causing Kernel Panics Via Click

dmoore7 at nd.edu
Fri Sep 14 10:43:47 EDT 2007


I grabbed the latest CVS and simply recompiled and installed it without any
kernel modifications, but click crashed (without panicking the kernel).  I think
this means I need to recompile the kernel, although I am unsure whether I need
to download vanilla sources and patch from there, or do it some other way.

I'll probably start fresh from a vanilla kernel after lunch, unless someone
offers advice/admonition otherwise.
Thanks
 - David Moore

The crash:
[root at filius ~]# click-install ./Filius-1-50.click
Segmentation fault
Message from syslogd at filius at Fri Sep 14 10:39:03 2007 ... <This line preceded
all further lines, edited out for clarity>
filius kernel: Oops: 0002 [#1]
filius kernel: SMP
filius kernel: CPU:    3
filius kernel: EIP is at _ZN4Task17strong_rescheduleEv+0x12/0x1a0 [click]
filius kernel: eax: f5cced14   ebx: 00000001   ecx: 00000000   edx: f7d36000
filius kernel: esi: f5cced14   edi: f6333000   ebp: 00000000   esp: f6333cb8
filius kernel: ds: 007b   es: 007b   ss: 0068
filius kernel: Process click-install (pid: 3957, threadinfo=f6333000 task=f7d36000)
filius kernel: Stack: <0>f8affbaa 00000001 f5ccc800 f6333cd4 f8b005c7 f6333cd4 00ffffff f5c21a20
filius kernel:        00000001 00000004 f5ccc800 f8c79ac8 f5ccc800 00000001 c02e2946 f5ccc800
filius kernel:        00000000 00001002 c028a70d f5ccc800 00001043 c028b96a f6333d2c 00000001
filius kernel: Call Trace:
filius kernel:  [<f8affbaa>] _ZN8ToDevice13change_deviceEP10net_device+0x2a/0x60 [click]
filius kernel:  [<f8b005c7>] device_notifier_hook+0x77/0x90 [click]
filius kernel:  [<c02e2946>] notifier_call_chain+0x17/0x2e
filius kernel:  [<c028a70d>] dev_open+0x66/0x6d
filius kernel:  [<c028b96a>] dev_change_flags+0x48/0xed
filius kernel:  [<c028c19f>] dev_ioctl+0x309/0x3e2
filius kernel:  [<f8afc8a2>] _Z10dev_updownP10net_deviceiP12ErrorHandler+0x72/0x110 [click]
filius kernel:  [<f8afc78c>] _ZN8FromHost20set_device_addressesEP12ErrorHandler+0xac/0x150 [click]
filius kernel:  [<f8afca11>] _ZN8FromHost10initializeEP12ErrorHandler+0xd1/0xf0 [click]
filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
filius kernel:  [<f8aca942>] _ZN6Router10initializeEP12ErrorHandler+0x682/0x780 [click]
filius kernel:  [<c013ec21>] __alloc_pages+0x59/0x273
filius kernel:  [<f8b20111>] _Z12write_configRK6StringP7ElementPvP12ErrorHandler+0x121/0x1e0 [click]
filius kernel:  [<f8ac456f>] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x13f/0x210 [click]
filius kernel:  [<c01533b6>] cache_grow+0x128/0x14a
filius kernel:  [<f8b23789>] handler_flush+0x499/0x590 [click]
filius kernel:  [<f8aaa5e6>] _Znaj+0x16/0x20 [click]
filius kernel:  [<c0155724>] filp_close+0x31/0x52
filius kernel:  [<c01031ab>] sysenter_past_esp+0x54/0x75
filius kernel: Code: ff eb d7 c7 04 24 b5 26 b3 f8 e8 0a f4 ff ff e9 60 ff ff ff 90 8d 74 26 00 57 bf 00 f0 ff ff 56 89 c6 53 83 ec 04 21 e7 8b 4e 1c <f0> ff 41 4c 8d 59 44 8b 47 10 ba 01 00 00 00 39 43 04 74 2a 8d
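
For anyone who wants to work with the trace above programmatically, here is a minimal Python sketch that splits a call-trace entry into address, symbol, offset, size, and module, and recovers the qualified name from the simple length-prefixed C++ manglings that appear in it. The helpers `parse_frame` and `qualified_name` are hypothetical names, not part of Click or the kernel. The name decoder only handles plain and nested names, ignores argument types, and returns the symbol unchanged for anything else (for example `_Znaj`); a real demangler such as `c++filt` is the better tool when it is available.

```python
import re

# One call-trace entry, for example:
#   [<f8b005c7>] device_notifier_hook+0x77/0x90 [click]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\S+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)
LENGTH_RE = re.compile(r"\d+")


def parse_frame(line):
    """Return the fields of one trace entry as a dict, or None if no match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None for core-kernel symbols
    }


def _read_identifier(s, pos):
    """Read one <length><name> component at pos; return (name, new_pos)."""
    m = LENGTH_RE.match(s, pos)
    if m is None:
        return None, pos
    end = m.end() + int(m.group())
    if end > len(s):
        return None, pos
    return s[m.end():end], end


def qualified_name(symbol):
    """Best-effort qualified name for simple _Z / _ZN manglings only."""
    if not symbol.startswith("_Z"):
        return symbol  # plain C symbol, nothing to decode
    pos = 2
    if symbol.startswith("N", pos):
        pos += 1
        # Skip cv-qualifiers on the member function (e.g. K for const).
        while pos < len(symbol) and symbol[pos] in "rVK":
            pos += 1
        parts = []
        while pos < len(symbol) and symbol[pos] != "E":
            name, pos = _read_identifier(symbol, pos)
            if name is None:
                return symbol  # something this sketch does not handle
            parts.append(name)
        return "::".join(parts) if parts else symbol
    name, _ = _read_identifier(symbol, pos)
    return name if name is not None else symbol


frame = parse_frame(
    "filius kernel:  [<f8affbaa>] "
    "_ZN8ToDevice13change_deviceEP10net_device+0x2a/0x60 [click]"
)
print(qualified_name(frame["symbol"]), hex(frame["offset"]), frame["module"])
```

The decoder deliberately stops at the end of the name, so argument lists such as `EP10net_device` are dropped rather than guessed at.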

Quoting Joonwoo Park <joonwpark81 at gmail.com>:

> Hi David,
>
> If you are using Linux 2.6.16.13, just updating your click source to the
> latest revision from cvs or git may solve the problem (without updating
> and recompiling the kernel).
>
> Joonwoo Park
>
> 2007/9/14, dmoore7 at nd.edu <dmoore7 at nd.edu>:
> > Thanks Eddie,
> > I will do that.  I assume it will be necessary to repatch and recompile the
> > kernel?
> > Should I re-download the vanilla sources from kernel.org and patch from
> > those, or is it possible to patch the already patched ones I have now?
> > Thanks,
> >  - David
> >
> > Quoting Eddie Kohler <kohler at cs.ucla.edu>:
> >
> > > It would certainly be helpful if you could upgrade filius to current CVS.
> > > Much spinlock stuff has happened there recently.
> > >
> > > E
> > >
> > >
> > > dmoore7 at nd.edu wrote:
> > > > Hello Everyone,
> > > > I have encountered a slightly unreliable method of bringing down one of
> > > > my click machines.  I shall present an example below of how it happens.
> > > >
> > > >> Log in to freshly booted machine
> > > >> Successfully load locally stored click file
> > > >> Ping another machine via the click router
> > > >> Use w3c's webbot to successfully download a webpage from another host
> > > >> Do so again, again with proper results
> > > >> Execute click-uninstall
> > > >> Reload the same kernel config I had loaded earlier
> > > >
> > > >> Meanwhile, in another ssh window to the webserver I have tcpdump
> > > > running, and have been watching the traffic coming in and out.
> > > >
> > > >> I execute the same webbot command as previously, with the following
> > > > output (copy-pasted from the uninstall onward)
> > > >
> > > > [root at filius ~]# click-uninstall
> > > > [root at filius ~]# click-install ./Filius-1-50.click
> > > > [root at filius ~]# webbot -I spoof_eth2_0 -q -n -saveimg http://192.168.10.4/index3.html
> > > > Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
> > > > filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890
> > > >
> > > > Message from syslogd at filius at Thu Sep 13 12:59:52 2007 ...
> > > > filius kernel:  lock: f8c79c44, .magic: dead4ead, .owner: click-uninstall/4890, .owner_cpu: 3
> > > >
> > > > <after about 5 minutes>
> > > > Read from remote host filius.cse.nd.edu: Connection timed out
> > > > Connection to filius.cse.nd.edu closed.
> > > >
> > > >> During this I have been watching tcpdump on the http server, and it is
> > > > proceeding until suddenly it stops receiving ACKs.
> > > >> Thus it appears the click module began malfunctioning mid-use, and not
> > > > simply upon its loading.
> > > >
> > > > Background information:
> > > >  - There are 3 machines with click loaded on them; the config files I
> > > > used can be found here:
> > > > Filius (the one that crashed): http://cse.nd.edu/~dmoore7/Filius-1-50.click
> > > > Sybill (the forwarder): http://cse.nd.edu/~dmoore7/Sybill-1-50.click
> > > > Hagrid (the http host): http://cse.nd.edu/~dmoore7/Hagrid-1-50.click
> > > >  - For a general idea of what these router configs do, see this diagram:
> > > > http://cse.nd.edu/~dmoore7/myrouter.jpg
> > > >  - The webbot I used is a slightly hacked version of w3c's webbot
> > > > (modified to allow forcing of a particular device).  It exhibits no
> > > > failures without click.
> > > >  - Filius is running a version of click downloaded about 3 weeks ago from
> > > > the cvs.  The others are running 1.5.0 downloaded from the click website
> > > > during the middle of this past summer.  I can update filius if that may
> > > > be helpful.
> > > >  - The systems in question have 4 processors, I believe technically 2
> > > > core duos each.
> > > >  - Kernel is a vanilla 2.6.16.13 from kernel.org w/ click patch applied
> > > >  - This appears to be related to an earlier problem I reported here, which
> > > > at first appeared to be resolved:
> > > > https://pdos.csail.mit.edu/pipermail/click/2007-August/006206.html
> > > >
> > > > Any input is appreciated.  I will try to get crash dumps and such when I
> > > > regain access to the machine; as it is not local, I must wait for someone
> > > > else to reboot it for me.
> > > > Thanks,
> > > >  - David
> > > >
> > > >
> > > > _______________________________________________
> > > > click mailing list
> > > > click at amsterdam.lcs.mit.edu
> > > > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> > >
> >
>
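
The spinlock report quoted above names two CPUs: the one that printed the message (CPU#2) and the one recorded as the lock's owner (.owner_cpu: 3). The "wrong CPU" wording suggests the lock was being handled on a different CPU than the one recorded when it was taken. Below is a minimal Python sketch for pulling those fields out of such a report; `parse_spinlock_bug` is a hypothetical helper, and it assumes the wrapped syslog lines have already been joined into one string.

```python
import re

# First line of the report: reason, reporting CPU, current task/pid.
BUG_RE = re.compile(
    r"BUG: spinlock (?P<reason>.+?) on CPU#(?P<cpu>\d+), "
    r"(?P<comm>[^/\s]+)/(?P<pid>\d+)"
)
# Second line: lock address, magic value, recorded owner and owner CPU.
LOCK_RE = re.compile(
    r"lock: (?P<lock>[0-9a-f]+), \.magic: (?P<magic>[0-9a-f]+), "
    r"\.owner: (?P<owner>[^/\s]+)/(?P<owner_pid>-?\d+), "
    r"\.owner_cpu: (?P<owner_cpu>-?\d+)"
)


def parse_spinlock_bug(text):
    """Return the fields of a spinlock debug report, or None if not found."""
    bug = BUG_RE.search(text)
    lock = LOCK_RE.search(text)
    if bug is None or lock is None:
        return None
    cpu = int(bug.group("cpu"))
    owner_cpu = int(lock.group("owner_cpu"))
    return {
        "reason": bug.group("reason"),
        "reporting_cpu": cpu,
        "task": bug.group("comm"),
        "pid": int(bug.group("pid")),
        "lock": lock.group("lock"),
        "owner": lock.group("owner"),
        "owner_cpu": owner_cpu,
        "cpu_mismatch": cpu != owner_cpu,
    }


report = (
    "filius kernel: BUG: spinlock wrong CPU on CPU#2, click-uninstall/4890 "
    "filius kernel:  lock: f8c79c44, .magic: dead4ead, "
    ".owner: click-uninstall/4890, .owner_cpu: 3"
)
info = parse_spinlock_bug(report)
print(info["reason"], info["reporting_cpu"], info["owner_cpu"], info["cpu_mismatch"])
```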






