[Click] Help with kernel OOPS

Eddie Kohler kohler at cs.ucla.edu
Wed Sep 27 13:52:22 EDT 2006


Hi Vivek,

Let me see if I understand this properly.

(1) Your driver is not polling.
(2) The panic only happens on click-uninstall.

That may be enough to help diagnose.

Eddie

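For readers following the quoted thread below, here is a minimal, self-contained sketch of the ordering question it raises: whether an element's task is unscheduled before cleanup clears its device pointer. It is only an illustration under stated assumptions, not Click code. FakeDevice, FakeTask, FakeToDevice and scheduler_pass are hypothetical stand-ins, and the model is single-threaded, so it shows the intended ordering rather than reproducing the race. Unlike the quoted patch, the guard here does not reschedule the task when the device is gone; that is a choice made for this sketch only.

```cpp
#include <cassert>

// Hypothetical stand-ins: none of these are Click or Linux kernel types.
struct FakeDevice {
    int packets_sent = 0;
};

struct FakeTask {
    bool scheduled = false;
    void reschedule() { scheduled = true; }
    void unschedule() { scheduled = false; }
};

struct FakeToDevice {
    FakeDevice *dev = nullptr;
    FakeTask task;
    int runs = 0;

    void initialize(FakeDevice *d) {
        dev = d;
        task.reschedule();
    }

    // Same shape as the guard in the quoted patch: return early when the
    // device pointer has already been cleared, instead of dereferencing it.
    bool run_task() {
        ++runs;
        if (!dev)
            return false;       // assumption: do not reschedule here
        ++dev->packets_sent;    // stands in for handing a packet to the driver
        task.reschedule();
        return true;
    }

    // Ordering assumed by this sketch: stop the task first, then drop the
    // device, so a later scheduler pass can never see dev == nullptr.
    void cleanup() {
        task.unschedule();
        dev = nullptr;
    }
};

// One pass of a toy scheduler: runs the task only if it is scheduled.
bool scheduler_pass(FakeToDevice &e) {
    if (!e.task.scheduled)
        return false;
    e.task.scheduled = false;
    return e.run_task();
}
```

Calling initialize, then scheduler_pass, then cleanup, then scheduler_pass again sends one packet on the first pass and does nothing on the second. Whether Click's uninstall path provides the same guarantee is exactly the open question in the thread.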

Vivek raghunathan wrote:
> Hi there.
> 
> My diagnosis of the reason for the race condition I have been
> observing was incorrect.  While it is true that ToDevice::run_task
> is often entered with _dev = 0 during the click-uninstall process, the
> panics are not specific to this. The InfiniteSource script I attached
> earlier seems to be able to generate these panics quite regularly
> (once every 3 iterations of a click-install, click-uninstall). I did a
> fresh checkout from CVS two days back, and could reproduce all the
> panics there too. The oops messages are the ones I attached in my
> earlier emails. I am running 2.6.16.13.
> 
> Vivek
> 
> 
> On 9/20/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
>> Eddie and All,
>>
>> It seems like there is a race in ToDevice::run_task that has to do
>> with run_task being scheduled after ToDevice::cleanup -> clear_device
>> -> dev_put. I am attaching a patch which fixes the resulting null
>> pointer dereference in ToDevice::run_task.
>>
>> It seems that other elements (e.g. InfiniteSource)'s run_task methods
>> also sometimes trigger null pointer dereferences during
>> click-uninstall. (The Linux kernel is able to recover from some of
>> these; I suspect this is because CONFIG_DETECT_SOFTLOCKUP is enabled,
>> unlike the ToDevice panic which always crashed the kernel.) I'm not
>> too familiar with the Click module cleanup code - is it true that
>> Element's task handlers are always automatically unscheduled on
>> uninstall before the corresponding Element's cleanup method is called?
>>
>> Vivek
>>
>>
>> --- todevice.cc.current 2006-09-20 16:32:58.000000000 -0500
>> +++ todevice.cc 2006-09-20 16:46:01.000000000 -0500
>> @@ -233,6 +233,12 @@
>>      int sent = 0;
>>
>>      _runs++;
>> +
>> +    // prevent race condition during cleanup when clear_device sets _dev = 0
>> +    if(!_dev) {
>> +       _task.fast_reschedule();
>> +       return false;
>> +    }
>>
>> #if LINUX_VERSION_CODE >= 0x020400
>>      local_bh_disable();
>>
>>
>> On 9/19/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
>> > All,
>> >
>> > The bug I reported is not specific to my code, and is probably a
>> > ToDevice race condition that I was inadvertently triggering. Using the
>> > following configuration generates the same kernel oops with EIP at
>> > ToDevice::run_task() in interrupt context. (My Ethernet NIC is an Intel
>> > Pro/100 using the e100 driver.)
>> >
>> >   AddressInfo(MyEther 00:11:25:2D:7D:33, RemoteEther 00:11:25:47:EA:7B,
>> >             MyIP 10.1.1.2/8, RemoteIP 10.1.1.1/8, BroadcastAddr 10.255.255.255);
>> >
>> >   FromHost(fak0, MyIP, ETHER MyEther) -> q1::Queue
>> >   q1 -> [0]prio::PrioSched -> Print(test_tx, 100) -> ToDevice(eth0);
>> >   FromDevice(eth0) -> Print(test_rx, 100) -> SetPacketType(HOST) ->
>> >     ToHost(fak0);
>> >
>> >   splsrc::InfiniteSource(
>> >   DATA \<aa bb cc dd ee ff>, LIMIT -1, STOP true);
>> >   splsrc -> ipenc::IPEncap(222, MyIP, BroadcastAddr);
>> >   ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, MyEther);
>> >   ethenc -> q2::Queue;
>> >   q2 -> [1]prio;
>> >
>> > -Vivek
>> >
>> >
>> >
>> > On 9/19/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
>> > > Hi all.
>> > >
>> > > I am currently implementing a Click-based opportunistic packet
>> > > combination engine for use on top of IEEE 802.11. I've unit tested my
>> > > implementation fairly extensively in user-space, and partly
>> > > unit-tested in kernelspace, and haven't had any issues so far. I
>> > > recently moved to doing integration testing, and the code seems to run
>> > > okay in-kernel without any problems, except that every so often (maybe
>> > > 6 out of 10 times), click-uninstall causes a kernel panic in interrupt
>> > > context on cleanup.
>> > >
>> > > The panic seems to be related to my code on the tx output path, since
>> > > it only appears for a few particular configurations, and only when
>> > > some of my elements are introduced. The configuration I am using that
>> > > triggers the panic is attached.  I've also manually copied the
>> > > oops-trace from the screen, and attached it to this email. A
>> > > register dump using sysrq does not produce any additional useful info,
>> > > so I have excluded it. It seems like the panic is triggered somewhere
>> > > in ToDevice::run_task. I realize that some brain-dead bug in my code
>> > > is probably at fault, and am currently double-checking everything I've
>> > > written. I am posting here mainly because I am not sure if this is a
>> > > ToDevice bug that I am inadvertently triggering.
>> > >
>> > > Additionally, I'm having trouble getting ksymoops to run with click.
>> > > Any ideas on how I go about it? (I've also tried using kexec/kdump,
>> > > but it seems like these are very twitchy about what kernel config is
>> > > used, and have issues with the one I am using).
>> > >
>> > > Vivek
>> > >
>> > >
>> > > --
>> > >
>> > > ---
>> > >
>> > > *************************************
>> > > Vivek Raghunathan,
>> > > PhD student,
>> > > University of Illinois, Urbana-Champaign
>> > >
>> > > Contact Details:
>> > > 1012 W. Clark St #31,
>> > > Urbana IL 61801
>> > >
>> > > ph: 217-766-1868 (cell)
>> > >     217-333-7541 (off)
>> > >
>> > >
>> > >
>> >
>> >
>> >
>>
>>
>>
> 
> 