[Click] Help with kernel OOPS

Vivek raghunathan vivek.raghunathan at gmail.com
Tue Sep 26 22:59:57 EDT 2006


Hi there.

My diagnosis of the cause of the race condition I have been
observing was incorrect. While it is true that ToDevice::run_task
is often entered with _dev = 0 during the click-uninstall process, the
panics are not specific to this. The InfiniteSource script I attached
earlier seems to generate these panics quite regularly
(about once every 3 iterations of click-install, click-uninstall). I did a
fresh checkout from CVS two days ago, and could reproduce all the
panics there too. The oops messages are the ones I attached to my
earlier emails. I am running Linux 2.6.16.13.
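The question quoted below is whether an element's task is always unscheduled before its cleanup method runs. Here is a minimal, single-threaded C++ sketch of the ordering in question. `Device`, `Task`, `SketchToDevice`, and `scheduler_step` are hypothetical stand-ins, not Click's real classes. The sketch also ignores the softirq concurrency that turns the real bug into a race. It unschedules the task before clearing the device pointer, and it keeps a null guard in `run_task` in the spirit of the patch. Unlike the patch, though, the guard returns without rescheduling.

```cpp
#include <cassert>

// Hypothetical stand-in for the kernel's net_device; not a Click type.
struct Device {};

// Hypothetical task flag; Click's real Task class is far more involved.
class Task {
public:
    bool scheduled() const { return _scheduled; }
    void schedule() { _scheduled = true; }
    void unschedule() { _scheduled = false; }
private:
    bool _scheduled = false;
};

// Hypothetical element loosely modeled on ToDevice's _dev/_task fields.
class SketchToDevice {
public:
    explicit SketchToDevice(Device* dev) : _dev(dev) {
        assert(dev != nullptr);  // precondition: configured with a device
        _task.schedule();
    }

    // Invoked by the (hypothetical) scheduler while the task is scheduled.
    bool run_task() {
        if (!_dev)          // defensive null guard, in the spirit of the patch
            return false;   // unlike the patch, does not reschedule itself
        ++_runs;            // stand-in for pulling and sending packets
        return true;
    }

    // Cleanup ordering: stop the task first, then release the device.
    void cleanup() {
        _task.unschedule();
        _dev = nullptr;     // stand-in for clear_device() -> dev_put()
    }

    const Task& task() const { return _task; }
    int runs() const { return _runs; }

private:
    Device* _dev;
    Task _task;
    int _runs = 0;
};

// Hypothetical scheduler step: skips tasks that have been unscheduled.
bool scheduler_step(SketchToDevice& e) {
    if (!e.task().scheduled())
        return false;
    return e.run_task();
}
```

In the kernel, run_task executes in softirq context, possibly on another CPU. Unscheduling first is therefore only safe if the scheduler also guarantees that the task is not mid-run when cleanup proceeds. That guarantee is what the question below asks about.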

Vivek


On 9/20/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
> Eddie and All,
>
> It seems like there is a race in ToDevice::run_task that has to do
> with run_task being scheduled after ToDevice::cleanup -> clear_device
> -> dev_put. I am attaching a patch which fixes the resulting null
> pointer dereference in ToDevice::run_task.
>
> It seems that the run_task methods of other elements (e.g., InfiniteSource)
> also sometimes trigger null pointer dereferences during
> click-uninstall. (The Linux kernel is able to recover from some of
> these; I suspect this is because CONFIG_DETECT_SOFTLOCKUP is enabled,
> unlike the ToDevice panic, which always crashed the kernel.) I'm not
> too familiar with the Click module cleanup code: is it true that an
> Element's task handlers are always unscheduled automatically on
> uninstall, before that Element's cleanup method is called?
>
> Vivek
>
>
> --- todevice.cc.current 2006-09-20 16:32:58.000000000 -0500
> +++ todevice.cc 2006-09-20 16:46:01.000000000 -0500
> @@ -233,6 +233,12 @@
>      int sent = 0;
>
>      _runs++;
> +
> +    // prevent race condition during cleanup when clear_device sets _dev = 0
> +    if(!_dev) {
> +       _task.fast_reschedule();
> +       return false;
> +    }
>
> #if LINUX_VERSION_CODE >= 0x020400
>      local_bh_disable();
>
>
> On 9/19/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
> > All,
> >
> > The bug I reported is not specific to my code, and is probably a
> > ToDevice race condition that I was inadvertently triggering. Using the
> > following configuration generates the same kernel oops with EIP at
> > ToDevice::run_task() in interrupt context. (My Ethernet NIC is an Intel
> > Pro/100 using the e100 driver.)
> >
> >   AddressInfo(MyEther 00:11:25:2D:7D:33, RemoteEther 00:11:25:47:EA:7B,
> >             MyIP 10.1.1.2/8, RemoteIP 10.1.1.1/8, BroadcastAddr 10.255.255.255);
> >
> >   FromHost(fak0, MyIP, ETHER MyEther) -> q1::Queue
> >   q1 -> [0]prio::PrioSched -> Print(test_tx, 100) -> ToDevice(eth0);
> >   FromDevice(eth0) -> Print(test_rx, 100) -> SetPacketType(HOST) -> ToHost(fak0);
> >
> >   splsrc::InfiniteSource(
> >   DATA \<aa bb cc dd ee ff>, LIMIT -1, STOP true);
> >   splsrc -> ipenc::IPEncap(222, MyIP, BroadcastAddr);
> >   ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, MyEther);
> >   ethenc -> q2::Queue;
> >   q2 -> [1]prio;
> >
> > -Vivek
> >
> >
> >
> > On 9/19/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
> > > Hi all.
> > >
> > > I am currently implementing a Click-based opportunistic packet
> > > combination engine for use on top of IEEE 802.11. I've unit tested my
> > > implementation fairly extensively in user-space, and partly
> > > unit-tested in kernelspace, and haven't had any issues so far. I
> > > recently moved to doing integration testing, and the code seems to run
> > > okay in-kernel without any problems, except that every so often (maybe
> > > 6 out of 10 times), click-uninstall causes a kernel panic in interrupt
> > > context on cleanup.
> > >
> > > The panic seems to be related to my code on the tx output path, since
> > > it only appears for a few particular configurations, and only when
> > > some of my elements are introduced. The configuration I am using that
> > > triggers the panic is attached. I've also manually copied the
> > > oops-trace from the screen, and attached it with this email. A
> > > register dump using sysrq does not produce any additional useful info,
> > > so I have excluded it. It seems like the panic is triggered somewhere
> > > in ToDevice::run_task. I realize that some brain-dead bug in my code
> > > is probably at fault, and am currently double-checking everything I've
> > > written. I am posting here mainly because I am not sure if this is a
> > > ToDevice bug that I am inadvertently triggering.
> > >
> > > Additionally, I'm having trouble getting ksymoops to run with click.
> > > Any ideas on how I go about it? (I've also tried using kexec/kdump,
> > > but it seems like these are very twitchy about what kernel config is
> > > used, and have issues with the one I am using).
> > >
> > > Vivek
> > >
> > >
> > > --
> > >
> > > ---
> > >
> > > *************************************
> > > Vivek Raghunathan,
> > > PhD student,
> > > University of Illinois, Urbana-Champaign
> > >
> > > Contact Details:
> > > 1012 W. Clark St #31,
> > > Urbana IL 61801
> > >
> > > ph: 217-766-1868 (cell)
> > >     217-333-7541 (off)
> > >
> > >
> > >
> >
> >
>
>
