[Click] Help with kernel OOPS

Eddie Kohler kohler at cs.ucla.edu
Wed Sep 27 15:07:36 EDT 2006


Weird, weird.

Can I ask you for some help debugging?

In particular, can you determine whether 
RouterThread::unschedule_router_tasks() is being called by Master::kill_router()?

Eddie


Vivek raghunathan wrote:
> Eddie,
> 
> Here's a script without FromHost that generates an oops. This oops
> doesn't hang the machine though ...
> 
> AddressInfo(MyEther 00:11:25:2D:7D:33, RemoteEther 00:11:25:47:EA:7B,
>          MyIP 10.1.1.2/8, RemoteIP 10.1.1.1/8,
>          BroadcastAddr 10.255.255.255);
> 
> FromDevice(eth0) -> SetPacketType(HOST) -> ToHost(eth0);
> 
> splsrc::InfiniteSource(
> DATA \<aa bb cc dd ee ff>, LIMIT -1, STOP true);
> splsrc -> ipenc::IPEncap(222, MyIP, BroadcastAddr);
> ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, MyEther);
> ethenc -> q2::Queue;
> q2 -> ToDevice(eth0);
> 
> -Vivek
> 
> 
> 
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Unable to handle
> kernel NULL pointer dereference at virtual address 00000000
> Sep 27 13:46:28 localhost kernel: [4294861.518000]  printing eip:
> Sep 27 13:46:28 localhost kernel: [4294861.518000] d124f881
> Sep 27 13:46:28 localhost kernel: [4294861.518000] *pde = 00000000
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Oops: 0000 [#1]
> Sep 27 13:46:28 localhost kernel: [4294861.518000] PREEMPT
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Modules linked in:
> click proclikefs rfcomm l2cap bluetooth nvram uinput ppdev radeon drm
> speedstep_centrino cpufreq_userspace cpufreq_stats freq_table
> cpufreq_powersave cpufreq_ondemand cpufreq_conservative video ibm_acpi
> container button battery ac ipv6 dm_mod md_mod lp af_packet airo_cs
> airo pcmcia joydev tsdev e100 ipw2200 mii ide_cd cdrom ieee80211
> ieee80211_crypt yenta_socket rsrc_nonstatic pcmcia_core snd_intel8x0
> snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> snd_timer hw_random psmouse snd soundcore parport_pc parport ehci_hcd
> uhci_hcd shpchp pci_hotplug usbcore serio_raw pcspkr floppy
> snd_page_alloc rtc intel_agp agpgart evdev ext3 jbd mbcache ide_disk
> ide_generic via82cxxx trm290 triflex slc90e66 sis5513 siimage
> serverworks sc1200 rz1000 piix pdc202xx_old pdc202xx_new opti621
> ns87415 it821x hpt366 hpt34x generic cy82c693 cs5535 cs5530 cs5520
> cmd64x atiixp amd74xx alim15x3 aec62xx thermal processor fan
> Sep 27 13:46:28 localhost kernel: [4294861.518000] CPU:    0
> Sep 27 13:46:28 localhost kernel: [4294861.518000] EIP:
> 0060:[pg0+267880577/1053979648]    Not tainted VLI
> Sep 27 13:46:28 localhost kernel: [4294861.518000] EFLAGS: 00010282
> (2.6.16.13 #6)
> Sep 27 13:46:28 localhost kernel: [4294861.518000] EIP is at
> _ZN7Element4pushEiP6Packet+0x1d/0x3c [click]
> Sep 27 13:46:28 localhost kernel: [4294861.518000] eax: c6c21a94
> ebx: c6c21a80   ecx: d124f864   edx: 00000000
> Sep 27 13:46:28 localhost kernel: [4294861.518000] esi: cb93ed40
> edi: 00000000   ebp: 00000001   esp: cfd63f70
> Sep 27 13:46:28 localhost kernel: [4294861.518000] ds: 007b   es: 007b
>  ss: 0068
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Process kclick
> (pid: 4057, threadinfo=cfd62000 task=c596f560)
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Stack: <0>cb93ed40
> c6c21980 d12a4f82 c6c21a80 00000000 cb93ed40 cb93ed4c cc20ed80
> Sep 27 13:46:28 localhost kernel: [4294861.518000]        00000001
> 00000080 cf6704c0 0003e504 d12646c9 c6c21980 c6c219ec cf670c5c
> Sep 27 13:46:28 localhost kernel: [4294861.518000]        00000010
> 00000020 d12c3061 00000010 ccbd7e00 c596f560 cf6704c0 cfd62000
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Call Trace:
> Sep 27 13:46:28 localhost kernel: [4294861.518000]
> [pg0+268230530/1053979648]
> _ZN14InfiniteSource8run_taskEP4Task+0xb6/0x12c [click]
> Sep 27 13:46:28 localhost kernel: [4294861.518000]
> [pg0+267966153/1053979648] _ZN12RouterThread6driverEv+0x12d/0x2a0
> [click]
> Sep 27 13:46:28 localhost kernel: [4294861.518000]
> [pg0+268353633/1053979648] _ZN6VectorIiE7reserveEi+0x2d/0x8c [click]
> Sep 27 13:46:28 localhost kernel: [4294861.518000]
> [pg0+268314274/1053979648] _Z11click_schedPv+0x8e/0x164 [click]
> Sep 27 13:46:28 localhost kernel: [4294861.518000]
> [pg0+268314132/1053979648] _Z11click_schedPv+0x0/0x164 [click]
> Sep 27 13:46:28 localhost kernel: [4294861.518000]
> [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
> Sep 27 13:46:28 localhost kernel: [4294861.518000] Code: c0 5b c3 8d
> 76 00 b8 ff ff ff ff 5b c3 90 56 53 8b 5c 24 0c 8b 03 ff 74 24 14 53
> ff 50 10 89 c6 58 5a 85 f6 74 20 8b 43 08 8b 10 <8b> 0a 89 74 24 14 8b
> 40 04 89 44 24 10 89 54 24 0c 8b 49 08 5b
> Sep 27 13:46:33 localhost kernel: [4294861.518000]  <1>click: current
> router threads refuse to die!
> Sep 27 13:46:33 localhost kernel: [4294866.502000] click: Following
> threads still active, expect a crash:
> 
> 
> 
> On 9/27/06, Eddie Kohler <kohler at cs.ucla.edu> wrote:
>> Another question: Can you make the oops happen in a configuration without
>> FromHost?
>>
>> FromHost installs a new networking device in the kernel.  When FromHost
>> is cleaned up, this networking device is unregistered.  It looks like
>> Linux wants to schedule() during the process of unregistering the
>> network device.  Click does not want Linux to schedule().  This is the
>> "scheduling while atomic" message.
>>
>> The thing that's weird is that ToDevice should already have been removed
>> from the scheduling list, even before the "scheduling while atomic"
>> message.
>>
>> Eddie
>>
>>
>> Vivek raghunathan wrote:
>> > All,
>> >
>> > The bug I reported is not specific to my code, and is probably a
>> > ToDevice race condition that I was inadvertently triggering. Using the
>> > following configuration generates the same kernel oops with EIP at
>> > ToDevice::run_task() in interrupt context. (My Ethernet NIC is an Intel
>> > Pro/100 using the e100 driver.)
>> >
>> >   AddressInfo(MyEther 00:11:25:2D:7D:33, RemoteEther 00:11:25:47:EA:7B,
>> >             MyIP 10.1.1.2/8, RemoteIP 10.1.1.1/8,
>> >             BroadcastAddr 10.255.255.255);
>> >
>> >   FromHost(fak0, MyIP, ETHER MyEther) -> q1::Queue;
>> >   q1 -> [0]prio::PrioSched -> Print(test_tx, 100) -> ToDevice(eth0);
>> >   FromDevice(eth0) -> Print(test_rx, 100) -> SetPacketType(HOST) ->
>> > ToHost(fak0);
>> >
>> >   splsrc::InfiniteSource(
>> >   DATA \<aa bb cc dd ee ff>, LIMIT -1, STOP true);
>> >   splsrc -> ipenc::IPEncap(222, MyIP, BroadcastAddr);
>> >   ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, MyEther);
>> >   ethenc -> q2::Queue;
>> >   q2 -> [1]prio;
>> >
>> > -Vivek
>> >
>> >
>> >
>> > On 9/19/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
>> >> Hi all.
>> >>
>> >> I am currently implementing a Click-based opportunistic packet
>> >> combination engine for use on top of IEEE 802.11. I've unit tested my
>> >> implementation fairly extensively in user-space, and partly
>> >> unit-tested in kernelspace, and haven't had any issues so far. I
>> >> recently moved to doing integration testing, and the code seems to run
>> >> okay in-kernel without any problems, except that every so often (maybe
>> >> 6 out of 10 times), click-uninstall causes a kernel panic in interrupt
>> >> context on cleanup.
>> >>
>> >> The panic seems to be related to my code on the tx output path, since
>> >> it only appears for a few particular configurations, and only when
>> >> some of my elements are introduced. The configuration I am using that
>> >> triggers the panic is attached.  I've also manually copied the
>> >> oops-trace from the screen, and attached it with this email. A
>> >> register dump using sysrq does not produce any additional useful info,
>> >> so I have excluded it. It seems like the panic is triggered somewhere
>> >> in ToDevice::run_task. I realize that some brain-dead bug in my code
>> >> is probably at fault, and am currently double-checking everything I've
>> >> written. I am posting here mainly because I am not sure if this is a
>> >> ToDevice bug that I am inadvertently triggering.
>> >>
>> >> Additionally, I'm having trouble getting ksymoops to run with click.
>> >> Any ideas on how I go about it? (I've also tried using kexec/kdump,
>> >> but it seems like these are very twitchy about what kernel config is
>> >> used, and have issues with the one I am using).
>> >>
>> >> Vivek
>> >>
>> >>
>> >> --
>> >>
>> >> ---
>> >>
>> >> *************************************
>> >> Vivek Raghunathan,
>> >> PhD student,
>> >> University of Illinois, Urbana-Champaign
>> >>
>> >> Contact Details:
>> >> 1012 W. Clark St #31,
>> >> Urbana IL 61801
>> >>
>> >> ph: 217-766-1868 (cell)
>> >>     217-333-7541 (off)
>> >>
>> >>
>> >>
>> >
>> >
>>
> 
> 

