[Click] Help with kernel OOPS

Vivek raghunathan vivek.raghunathan at gmail.com
Wed Sep 27 15:45:06 EDT 2006


Eddie,

To answer your previous questions: Master::kill_router does call
RouterThread::unschedule_router_tasks.  I'll recompile without
CONFIG_PREEMPT and see whether the oopses disappear.

Vivek
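
It helps to confirm which preemption model a kernel was built with, both before and after recompiling. Eddie inferred CONFIG_PREEMPT from the PREEMPT tag in the oops banner quoted below. One way to check this directly is to inspect the kernel's build configuration. The following is a minimal sketch under these assumptions: the configuration is installed as a plain-text file at /boot/config-$(uname -r), which is a common distribution convention and not something Click provides, and the function name preempt_model is hypothetical.

```bash
#!/bin/bash
# Report the preemption model recorded in a kernel build configuration.
# Assumption: the config is a plain-text file; the default path
# /boot/config-$(uname -r) is a common distribution convention, not a given.
preempt_model() {
    config=${1:-/boot/config-$(uname -r)}
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    # Anchored matches, so CONFIG_PREEMPT_BKL=y does not count as CONFIG_PREEMPT.
    if grep -q '^CONFIG_PREEMPT=y' "$config"; then
        echo "CONFIG_PREEMPT"            # fully preemptible kernel
    elif grep -q '^CONFIG_PREEMPT_VOLUNTARY=y' "$config"; then
        echo "CONFIG_PREEMPT_VOLUNTARY"  # voluntary preemption points only
    else
        echo "none"
    fi
}
```

Running preempt_model with no arguments checks the running kernel. Pass a different config path as the first argument if yours is stored elsewhere.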


On 9/27/06, Eddie Kohler <kohler at cs.ucla.edu> wrote:
> Your oops reports PREEMPT.  I assume this means your kernel has
> CONFIG_PREEMPT.  Can you try recompiling your kernel without CONFIG_PREEMPT?
> My kernel has CONFIG_PREEMPT_VOLUNTARY, but not CONFIG_PREEMPT.  I have run
> the following bash script, which is like your config minus ToHost, with no crash.
>
>
> x="splsrc::InfiniteSource(DATA This_is_a_test_by_Eddie_Kohler, LIMIT -1, STOP true);
> splsrc -> ipenc::IPEncap(222, 131.179.33.137, 131.179.232.51);
> ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, ath0);
> ethenc -> q2::Queue;
> q2 -> ToDevice(ath0);"
>
> times=0
> while (($times < 100)); do
>         click-install -e "$x"
>         click-uninstall
>         times=$(($times + 1))
> done
>
>
> Eddie
>
>
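
Eddie's loop above only shows whether the machine survives 100 install/uninstall cycles. Vivek's original report put the failure rate at roughly 6 in 10 uninstalls. Given that rate, a variant that stops at the first new oops in the kernel log can show which cycle triggers it. The following is a minimal bash sketch, assuming the kernel's "Oops" line shows up in dmesg output. The function name stress_click and the INSTALL, UNINSTALL and LOGCMD overrides are hypothetical. They exist only so the loop can be dry-run without Click loaded.

```bash
#!/bin/bash
# Repeatedly install and uninstall a Click configuration, stopping at the
# first new "Oops" line in the kernel log.
# INSTALL, UNINSTALL and LOGCMD are hypothetical overrides (defaults:
# click-install, click-uninstall, dmesg) so the loop can be dry-run.
stress_click() {
    local config=$1 iterations=${2:-100}
    local install=${INSTALL:-click-install}
    local uninstall=${UNINSTALL:-click-uninstall}
    local logcmd=${LOGCMD:-dmesg}
    local before now i
    # Count oops reports already in the log so old ones are not re-reported.
    # Unquoted $logcmd/$install/$uninstall are intentional: overrides may
    # carry their own arguments.
    before=$($logcmd | grep -c 'Oops')
    for ((i = 1; i <= iterations; i++)); do
        $install -e "$config" || return 2
        $uninstall || return 2
        now=$($logcmd | grep -c 'Oops')
        if ((now > before)); then
            echo "oops after iteration $i"
            return 1
        fi
    done
    echo "no oops in $iterations iterations"
}
```

With Eddie's configuration in $x, this would be called as stress_click "$x" 100. An oops that hangs the machine stops the loop regardless. The loop is therefore most useful for non-fatal oopses such as the one quoted below.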
> Vivek raghunathan wrote:
> > Eddie,
> >
> > Here's a script without FromHost that generates an oops. This oops
> > doesn't hang the machine though ...
> >
> > AddressInfo(MyEther 00:11:25:2D:7D:33, RemoteEther 00:11:25:47:EA:7B,
> >          MyIP 10.1.1.2/8, RemoteIP 10.1.1.1/8, BroadcastAddr 10.255.255.255);
> >
> > FromDevice(eth0) -> SetPacketType(HOST) -> ToHost(eth0);
> >
> > splsrc::InfiniteSource(
> > DATA \<aa bb cc dd ee ff>, LIMIT -1, STOP true);
> > splsrc -> ipenc::IPEncap(222, MyIP, BroadcastAddr);
> > ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, MyEther);
> > ethenc -> q2::Queue;
> > q2 -> ToDevice(eth0);
> >
> > -Vivek
> >
> >
> >
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Unable to handle
> > kernel NULL pointer dereference at virtual address 00000000
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]  printing eip:
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] d124f881
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] *pde = 00000000
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Oops: 0000 [#1]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] PREEMPT
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Modules linked in:
> > click proclikefs rfcomm l2cap bluetooth nvram uinput ppdev radeon drm
> > speedstep_centrino cpufreq_userspace cpufreq_stats freq_table
> > cpufreq_powersave cpufreq_ondemand cpufreq_conservative video ibm_acpi
> > container button battery ac ipv6 dm_mod md_mod lp af_packet airo_cs
> > airo pcmcia joydev tsdev e100 ipw2200 mii ide_cd cdrom ieee80211
> > ieee80211_crypt yenta_socket rsrc_nonstatic pcmcia_core snd_intel8x0
> > snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm
> > snd_timer hw_random psmouse snd soundcore parport_pc parport ehci_hcd
> > uhci_hcd shpchp pci_hotplug usbcore serio_raw pcspkr floppy
> > snd_page_alloc rtc intel_agp agpgart evdev ext3 jbd mbcache ide_disk
> > ide_generic via82cxxx trm290 triflex slc90e66 sis5513 siimage
> > serverworks sc1200 rz1000 piix pdc202xx_old pdc202xx_new opti621
> > ns87415 it821x hpt366 hpt34x generic cy82c693 cs5535 cs5530 cs5520
> > cmd64x atiixp amd74xx alim15x3 aec62xx thermal processor fan
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] CPU:    0
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] EIP:
> > 0060:[pg0+267880577/1053979648]    Not tainted VLI
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] EFLAGS: 00010282
> > (2.6.16.13 #6)
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] EIP is at
> > _ZN7Element4pushEiP6Packet+0x1d/0x3c [click]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] eax: c6c21a94
> > ebx: c6c21a80   ecx: d124f864   edx: 00000000
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] esi: cb93ed40
> > edi: 00000000   ebp: 00000001   esp: cfd63f70
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] ds: 007b   es: 007b
> >  ss: 0068
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Process kclick
> > (pid: 4057, threadinfo=cfd62000 task=c596f560)
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Stack: <0>cb93ed40
> > c6c21980 d12a4f82 c6c21a80 00000000 cb93ed40 cb93ed4c cc20ed80
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]        00000001
> > 00000080 cf6704c0 0003e504 d12646c9 c6c21980 c6c219ec cf670c5c
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]        00000010
> > 00000020 d12c3061 00000010 ccbd7e00 c596f560 cf6704c0 cfd62000
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Call Trace:
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]
> > [pg0+268230530/1053979648]
> > _ZN14InfiniteSource8run_taskEP4Task+0xb6/0x12c [click]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]
> > [pg0+267966153/1053979648] _ZN12RouterThread6driverEv+0x12d/0x2a0
> > [click]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]
> > [pg0+268353633/1053979648] _ZN6VectorIiE7reserveEi+0x2d/0x8c [click]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]
> > [pg0+268314274/1053979648] _Z11click_schedPv+0x8e/0x164 [click]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]
> > [pg0+268314132/1053979648] _Z11click_schedPv+0x0/0x164 [click]
> > Sep 27 13:46:28 localhost kernel: [4294861.518000]
> > [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
> > Sep 27 13:46:28 localhost kernel: [4294861.518000] Code: c0 5b c3 8d
> > 76 00 b8 ff ff ff ff 5b c3 90 56 53 8b 5c 24 0c 8b 03 ff 74 24 14 53
> > ff 50 10 89 c6 58 5a 85 f6 74 20 8b 43 08 8b 10 <8b> 0a 89 74 24 14 8b
> > 40 04 89 44 24 10 89 54 24 0c 8b 49 08 5b
> > Sep 27 13:46:33 localhost kernel: [4294861.518000]  <1>click: current
> > router threads refuse to die!
> > Sep 27 13:46:33 localhost kernel: [4294866.502000] click: Following
> > threads still active, expect a crash:
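
The function names in the trace above are Itanium C++ ABI mangled symbols, such as _ZN7Element4pushEiP6Packet. The 2.6 kernel's Documentation/oops-tracing.txt advises using the oops in its original form rather than running it through ksymoops. Because kallsyms has already turned the addresses into names, the remaining step is demangling. The following is a minimal sketch, assuming GNU binutils' c++filt and a grep that supports -o. The helper name demangle_oops is hypothetical.

```bash
#!/bin/bash
# Extract and demangle the C++ symbols in a Click kernel oops.
# Reads oops text on stdin and prints one demangled name per symbol.
# Assumes GNU binutils' c++filt and a grep that supports -o.
demangle_oops() {
    # Itanium-ABI names start with _Z; the match stops at the "+offset/size"
    # suffix the kernel appends, e.g. _ZN7Element4pushEiP6Packet+0x1d/0x3c.
    grep -oE '_Z[A-Za-z0-9_]+' | c++filt
}
```

For example, dmesg | demangle_oops run on the log above would list Element::push(int, Packet*), InfiniteSource::run_task(Task*), RouterThread::driver() and click_sched(void*). In other words, the fault was raised inside Element::push, reached from InfiniteSource::run_task on the kclick thread.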
> >
> >
> >
> > On 9/27/06, Eddie Kohler <kohler at cs.ucla.edu> wrote:
> >> Another question: Can you make the oops happen in a configuration without
> >> FromHost?
> >>
> >> FromHost installs a new networking device in the kernel.  When FromHost is
> >> cleaned up, this networking device is unregistered.  It looks like Linux
> >> wants to schedule() during the process of unregistering the network device.
> >> Click does not want Linux to schedule().  This is the "scheduling while
> >> atomic" message.
> >>
> >> The thing that's weird is that ToDevice should already have been removed
> >> from the scheduling list, even before the "scheduling while atomic" message.
> >>
> >> Eddie
> >>
> >>
> >> Vivek raghunathan wrote:
> >> > All,
> >> >
> >> > The bug I reported is not specific to my code, and is probably a
> >> > ToDevice race condition that I was inadvertently triggering. Using the
> >> > following configuration generates the same kernel oops with EIP at
> >> > ToDevice::run_task() in interrupt context. (My Ethernet NIC is an Intel
> >> > Pro/100 using the e100 driver.)
> >> >
> >> >   AddressInfo(MyEther 00:11:25:2D:7D:33, RemoteEther 00:11:25:47:EA:7B,
> >> >             MyIP 10.1.1.2/8, RemoteIP 10.1.1.1/8, BroadcastAddr 10.255.255.255);
> >> >
> >> >   FromHost(fak0, MyIP, ETHER MyEther) -> q1::Queue
> >> >   q1 -> [0]prio::PrioSched -> Print(test_tx, 100) -> ToDevice(eth0);
> >> >   FromDevice(eth0) -> Print(test_rx, 100) -> SetPacketType(HOST) ->
> >> > ToHost(fak0);
> >> >
> >> >   splsrc::InfiniteSource(
> >> >   DATA \<aa bb cc dd ee ff>, LIMIT -1, STOP true);
> >> >   splsrc -> ipenc::IPEncap(222, MyIP, BroadcastAddr);
> >> >   ipenc -> ethenc::EtherEncap(0x0800, ff:ff:ff:ff:ff:ff, MyEther);
> >> >   ethenc -> q2::Queue;
> >> >   q2 -> [1]prio;
> >> >
> >> > -Vivek
> >> >
> >> >
> >> >
> >> > On 9/19/06, Vivek raghunathan <vivek.raghunathan at gmail.com> wrote:
> >> >> Hi all.
> >> >>
> >> >> I am currently implementing a Click-based opportunistic packet
> >> >> combination engine for use on top of IEEE 802.11. I've unit tested my
> >> >> implementation fairly extensively in user-space, and partly
> >> >> unit-tested in kernelspace, and haven't had any issues so far. I
> >> >> recently moved to doing integration testing, and the code seems to run
> >> >> okay in-kernel without any problems, except that every so often (maybe
> >> >> 6 out of 10 times), click-uninstall causes a kernel panic in interrupt
> >> >> context on cleanup.
> >> >>
> >> >> The panic seems to be related to my code on the tx output path, since
> >> >> it only appears for a few particular configurations, and only when
> >> >> some of my elements are introduced. The configuration I am using that
> >> >> triggers the panic is attached.  I've also manually copied the
> >> >> oops-trace from the screen, and attached it with this email. A
> >> >> register dump using sysrq does not produce any additional useful info,
> >> >> so I have excluded it. It seems like the panic is triggered somewhere
> >> >> in ToDevice::run_task. I realize that some brain-dead bug in my code
> >> >> is probably at fault, and am currently double-checking everything I've
> >> >> written. I am posting here mainly because I am not sure if this is a
> >> >> ToDevice bug that I am inadvertently triggering.
> >> >>
> >> >> Additionally, I'm having trouble getting ksymoops to run with click.
> >> >> Any ideas on how I go about it? (I've also tried using kexec/kdump,
> >> >> but it seems like these are very twitchy about what kernel config is
> >> >> used, and have issues with the one I am using).
> >> >>
> >> >> Vivek
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> ---
> >> >>
> >> >> *************************************
> >> >> Vivek Raghunathan,
> >> >> PhD student,
> >> >> University of Illinois, Urbana-Champaign
> >> >>
> >> >> Contact Details:
> >> >> 1012 W. Clark St #31,
> >> >> Urbana IL 61801
> >> >>
> >> >> ph: 217-766-1868 (cell)
> >> >>     217-333-7541 (off)
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >>
> >
> >
>



