[Click] Help with kernel OOPS

Vivek raghunathan vivek.raghunathan at gmail.com
Tue Sep 19 14:46:55 EDT 2006


Hi all.

I am currently implementing a Click-based opportunistic packet
combination engine for use on top of IEEE 802.11. I've unit-tested my
implementation fairly extensively in user space, and partly in kernel
space, and haven't had any issues so far. I recently moved on to
integration testing, and the code seems to run fine in-kernel, except
that roughly 6 times out of 10, click-uninstall causes a kernel panic
in interrupt context on cleanup.

The panic seems to be related to my code on the tx output path, since
it only appears for a few particular configurations, and only when
some of my elements are introduced. The configuration I am using that
triggers the panic is attached. I've also manually copied the oops
trace from the screen and attached it to this email. A register dump
using sysrq does not produce any additional useful info, so I have
excluded it. The panic seems to be triggered somewhere in
ToDevice::run_task. I realize that some brain-dead bug in my code is
probably at fault, and I am currently double-checking everything I've
written. I am posting here mainly because I am not sure whether this
is a ToDevice bug that I am inadvertently triggering.

Additionally, I'm having trouble getting ksymoops to run with click.
Any ideas on how I go about it? (I've also tried using kexec/kdump,
but it seems like these are very twitchy about what kernel config is
used, and have issues with the one I am using).
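
As a stopgap for reading the attached trace without ksymoops, the
mangled C++ names in the [click] frames can be decoded by hand. Below
is a minimal sketch in Python, assuming the Itanium C++ ABI mangling
that g++ uses. The names qualified_name and decode_frame are made up
for illustration, and the sketch only handles plain nested names: it
ignores argument types and gives up on templates.

```python
import re

# A mangled symbol as it appears in an oops frame, e.g.
# "_ZN8ToDevice8run_taskEP4Task+0x356/0x3d0 [click]".
MANGLED = re.compile(r"_Z\w+")

# Itanium ABI codes for constructors (C1-C3) and destructors (D0-D2).
CTOR_DTOR = {"C1", "C2", "C3", "D0", "D1", "D2"}


def qualified_name(symbol):
    """Return the qualified name encoded in a mangled symbol.

    Handles length-prefixed identifiers, the nested-name wrapper N...E,
    the const qualifier K, and constructor/destructor codes. Argument
    types are ignored. Returns None for anything else (e.g. templates).
    """
    if not symbol.startswith("_Z"):
        return None
    pos = 2
    nested = symbol.startswith("N", pos)
    if nested:
        pos += 1
        if symbol.startswith("K", pos):  # const member function
            pos += 1
    parts = []
    while pos < len(symbol):
        ch = symbol[pos]
        if ch.isdigit():
            # <length><identifier>
            digits = re.match(r"\d+", symbol[pos:]).group()
            pos += len(digits)
            length = int(digits)
            parts.append(symbol[pos:pos + length])
            pos += length
            if not nested:
                break  # a plain function has a single identifier
        elif nested and ch == "E":
            break  # end of the nested name; argument types follow
        elif nested and parts and symbol[pos:pos + 2] in CTOR_DTOR:
            prefix = "~" if ch == "D" else ""
            parts.append(prefix + parts[-1])
            pos += 2
        else:
            return None  # unsupported construct, such as template args
    return "::".join(parts) if parts else None


def decode_frame(line):
    """Decode the first mangled symbol found in one line of the trace."""
    match = MANGLED.search(line)
    return qualified_name(match.group()) if match else None


if __name__ == "__main__":
    print(decode_frame(
        "EIP is at _ZN8ToDevice8run_taskEP4Task+0x356/0x3d0 [click]"))
```

Where binutils is installed, piping the trace through c++filt should
give the full signatures, including argument types, which this sketch
deliberately skips.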

Vivek


-- 

---

*************************************
Vivek Raghunathan,
PhD student,
University of Illinois, Urbana-Champaign

Contact Details:
1012 W. Clark St #31,
Urbana IL 61801

ph: 217-766-1868 (cell)
    217-333-7541 (off)
-------------- next part --------------
Sep 18 23:51:44 localhost kernel: [4309934.322000] scheduling while atomic: click-uninstall/0x00000001/4756
Sep 18 23:51:44 localhost kernel: [4309934.322000]  [schedule+1468/1680] schedule+0x5bc/0x690
Sep 18 23:51:44 localhost kernel: [4309934.322000]  [dev_ioctl+622/992] dev_ioctl+0x26e/0x3e0
Sep 18 23:51:44 localhost kernel: [4309934.322000]  [pg0+257585981/1053733888] journal_stop+0x17d/0x2a0 [jbd]
Sep 18 23:51:44 localhost kernel: [4309934.322000]  [wait_for_completion+136/224] wait_for_completion+0x88/0xe0
Sep 18 23:51:44 localhost kernel: [4309934.322000]  [default_wake_function+0/32] default_wake_function+0x0/0x20
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [synchronize_rcu+52/64] synchronize_rcu+0x34/0x40
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [wakeme_after_rcu+0/16] wakeme_after_rcu+0x0/0x10
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [unregister_netdevice+247/576] unregister_netdevice+0xf7/0x240
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [unregister_netdev+16/32] unregister_netdev+0x10/0x20
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [pg0+268235567/1053733888] _ZN8FromHost7cleanupEN7Element12CleanupStageE+0x9f/0xd0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [pg0+268023399/1053733888] _ZN6RouterD1Ev+0x427/0x460 [click]
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [pg0+268023481/1053733888] _ZN6Router5unuseEv+0x19/0x40 [click]
Sep 18 23:51:44 localhost kernel: [4309934.323000]  [pg0+268413973/1053733888] _Z11kill_routerv+0x15/0x30 [click]
Sep 18 23:51:44 localhost kernel: [4309934.324000]  [pg0+268415537/1053733888] _Z12write_configRK6StringP7ElementPvP12ErrorHandler+0x21/0x1d0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.324000]  [pg0+267998862/1053733888] _ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x16e/0x1e0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.324000]  [pg0+268429044/1053733888] handler_flush+0x4f4/0x500 [click]
Sep 18 23:51:44 localhost kernel: [4309934.325000]  [filp_close+35/128] filp_close+0x23/0x80
Sep 18 23:51:44 localhost kernel: [4309934.325000]  [sys_close+92/160] sys_close+0x5c/0xa0
Sep 18 23:51:44 localhost kernel: [4309934.325000]  [sysenter_past_esp+84/121] sysenter_past_esp+0x54/0x79
Sep 18 23:51:44 localhost kernel: [4309934.327000] scheduling while atomic: click-uninstall/0x00000001/4756
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [schedule+1468/1680] schedule+0x5bc/0x690
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [extract_entropy+124/176] extract_entropy+0x7c/0xb0
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [pneigh_queue_purge+47/80] pneigh_queue_purge+0x2f/0x50
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [neigh_ifdown+139/208] neigh_ifdown+0x8b/0xd0
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [pneigh_queue_purge+47/80] pneigh_queue_purge+0x2f/0x50
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [wait_for_completion+136/224] wait_for_completion+0x88/0xe0
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [default_wake_function+0/32] default_wake_function+0x0/0x20
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [synchronize_rcu+52/64] synchronize_rcu+0x34/0x40
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [wakeme_after_rcu+0/16] wakeme_after_rcu+0x0/0x10
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [unregister_netdevice+362/576] unregister_netdevice+0x16a/0x240
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [unregister_netdev+16/32] unregister_netdev+0x10/0x20
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [pg0+268235567/1053733888] _ZN8FromHost7cleanupEN7Element12CleanupStageE+0x9f/0xd0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.327000]  [pg0+268023399/1053733888] _ZN6RouterD1Ev+0x427/0x460 [click]
Sep 18 23:51:44 localhost kernel: [4309934.328000]  [pg0+268023481/1053733888] _ZN6Router5unuseEv+0x19/0x40 [click]
Sep 18 23:51:44 localhost kernel: [4309934.328000]  [pg0+268413973/1053733888] _Z11kill_routerv+0x15/0x30 [click]
Sep 18 23:51:44 localhost kernel: [4309934.328000]  [pg0+268415537/1053733888] _Z12write_configRK6StringP7ElementPvP12ErrorHandler+0x21/0x1d0 [click]
ZNK7Handler10call_writeERK6StringP7ElementbP12ErrorHandler+0x16e/0x1e0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.329000]  [pg0+268429044/1053733888] handler_flush+0x4f4/0x500 [click]
Sep 18 23:51:44 localhost kernel: [4309934.329000]  [filp_close+35/128] filp_close+0x23/0x80
Sep 18 23:51:44 localhost kernel: [4309934.329000]  [sys_close+92/160] sys_close+0x5c/0xa0
Sep 18 23:51:44 localhost kernel: [4309934.329000]  [sysenter_past_esp+84/121] sysenter_past_esp+0x54/0x79
Sep 18 23:51:44 localhost kernel: [4309934.344000] Unable to handle kernel NULL pointer dereference at virtual address 00000020
Sep 18 23:51:44 localhost kernel: [4309934.344000]  printing eip:
Sep 18 23:51:44 localhost kernel: [4309934.344000] d12e5ac6
Sep 18 23:51:44 localhost kernel: [4309934.344000] *pde = 00000000
Sep 18 23:51:44 localhost kernel: [4309934.344000] Oops: 0000 [#1]
Sep 18 23:51:44 localhost kernel: [4309934.344000] PREEMPT
Sep 18 23:51:44 localhost kernel: [4309934.344000] Modules linked in: click proclikefs rfcomm l2cap bluetooth nvram uinput ppdev radeon drm speedstep_centrino cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave cpufreq_ondemand cpufreq_conservative video ibm_acpi container button battery ac ipv6 dm_mod af_packet md_mod lp pcmcia e100 joydev ipw2200 snd_intel8x0 tsdev mii ieee80211 ieee80211_crypt yenta_socket rsrc_nonstatic pcmcia_core snd_ac97_codec snd_ac97_bus ide_cd cdrom snd_pcm_oss snd_mixer_oss parport_pc parport floppy psmouse serio_raw snd_pcm rtc ehci_hcd hw_random uhci_hcd intel_agp agpgart pcspkr usbcore snd_timer shpchp pci_hotplug snd soundcore snd_page_alloc evdev ext3 jbd mbcache ide_disk ide_generic via82cxxx trm290 triflex slc90e66 sis5513 siimage serverworks sc1200 rz1000 piix pdc202xx_old pdc202xx_new opti621 ns87415 it821x hpt366 hpt34x generic cy82c693 cs5535 cs5530 cs5520 cmd64x atiixp amd74xx alim15x3 aec62xx thermal processor fan
Sep 18 23:51:44 localhost kernel: [4309934.344000] CPU:    0
Sep 18 23:51:44 localhost kernel: [4309934.344000] EIP:    0060:[pg0+268249798/1053733888]    Not tainted VLI
Sep 18 23:51:44 localhost kernel: [4309934.344000] EFLAGS: 00210246   (2.6.16.13 #2)
Sep 18 23:51:44 localhost kernel: [4309934.344000] EIP is at _ZN8ToDevice8run_taskEP4Task+0x356/0x3d0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.344000] eax: 00000020   ebx: c87ce8d4   ecx: 000059ea   edx: 00004c2c
Sep 18 23:51:44 localhost kernel: [4309934.344000] esi: c87ce880   edi: 00000000   ebp: 00000000   esp: c207ff8c
Sep 18 23:51:44 localhost kernel: [4309934.344000] ds: 007b   es: 007b   ss: 0068
Sep 18 23:51:44 localhost kernel: [4309934.344000] Process kclick (pid: 4377, threadinfo=c207e000 task=c2fc2030)
Sep 18 23:51:44 localhost kernel: [4309934.344000] Stack: <0>007ce8d4 00000014 00000080 c272aec0 000d1ad5 d12a7f16 c87ce880 c87ce8d4
Sep 18 23:51:44 localhost kernel: [4309934.344000]        c272af5c 00000010 00000020 d13190dd 00000010 c207e000 c2fc2030 c272aec0
Sep 18 23:51:44 localhost kernel: [4309934.344000]        00000000 d130e4e8 c272aec0 d130e460 00000000 00000000 00000000 c1001005
Sep 18 23:51:44 localhost kernel: [4309934.344000] Call Trace:
Sep 18 23:51:44 localhost kernel: [4309934.344000]  [pg0+267996950/1053733888] _ZN12RouterThread6driverEv+0x146/0x2e0 [click]
Sep 18 23:51:44 localhost kernel: [4309934.344000]  [pg0+268460253/1053733888] _ZN6VectorIiE7reserveEi+0x2d/0x90 [click]
Sep 18 23:51:44 localhost kernel: [4309934.344000]  [pg0+268416232/1053733888] _Z11click_schedPv+0x88/0x170 [click]
Sep 18 23:51:44 localhost kernel: [4309934.344000]  [pg0+268416096/1053733888] _Z11click_schedPv+0x0/0x170 [click]
Sep 18 23:51:44 localhost kernel: [4309934.344000]  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Sep 18 23:51:44 localhost kernel: [4309934.344000] Code: c8 85 c0 7e f0 eb aa 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 ba 01 00 00 00 b9 01 00 00 00 e9 a2 fd ff ff 90 8b 86 b0 00 00 00 <8b> 00 85 86 b4 00 00 00 0f 85 a8 fd ff ff 31 d2 e9 a6 fd ff ff
-------------- next part --------------

AddressInfo(MyIP 10.1.1.2/8); 
AddressInfo(MyEther 00:11:25:2D:7D:33);
// Remember to change masks like cls_broad whenever BroadcastAddr changes
AddressInfo(BroadcastAddr 10.255.255.255); 
// AddressInfo(RemoteIP 10.1.1.1/8);
// AddressInfo(RemoteEther 00:11:25:47:EA:7B) 

// Combiner for fak0 
// SplayCombiner = 222 = 0xde 
frmhst::FromHost(fak0, MyIP, ETHER MyEther) -> cls_arp::Classifier(12/0806, 12/0800, -); 
cls_arp[0] -> passq::Queue;   
cls_arp[1] -> cls_broad::Classifier(16/0affffff, -); 
cls_arp[2] -> passq; 
cls_broad[0] -> passq; 
cls_broad[1] -> combq::PureQueue;   

passq -> [0]pr_sch::PrioSched; 

ruledb::RuleDB;      
nbr::NbrTable; 
comb::Combiner(RULEDB ruledb, PUREQUEUE combq, NBRTABLE nbr);  

comb->ipenc::IPEncap(0xde, MyIP, BroadcastAddr);  
ipenc->ethenc::SplayEtherEncapDstFix(MyEther);     
ethenc->[1]pr_sch; 

pr_sch -> Print(testtx) -> ToDevice(eth0);  

////////////////////////////////////

FromDevice(eth0) -> Print(test_rx, 100) -> SetPacketType(HOST) -> ToHost(fak0); 

Comments on what the elements are:
1. PureQueue is a 1/0 variant of SimpleQueue (one input, no outputs) that provides a push-based interface for queueing and allows dequeuing from an arbitrary position inside the queue through a public method. It is comprehensively unit-tested.
2. NbrTable and RuleDB are database elements that only store data and provide an external access interface to add/delete entries. They are comprehensively unit-tested.
3. Combiner uses NbrTable and RuleDB to decide which packets to extract from PureQueue when its downstream initiates a pull. The variant used here is a base class that does nothing: it ignores NbrTable and RuleDB and simply extracts packets from the PureQueue in FIFO order.
4. SplayEtherEncapDstFix copies the Ethernet destination address from the annotation field. This annotation is set by Combiner.
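
To make the data flow between PureQueue and the base Combiner
concrete, here is a toy model in Python. It is only an illustration
of the behaviour described in items 1 and 3, not the actual C++
elements. The method names push, remove_at and pull, the default
capacity, and the drop-when-full behaviour are all assumptions made
for the sketch.

```python
class PureQueue:
    """Toy model of a 1/0 queue: packets are pushed in, nothing is pulled out.

    Consumers take packets through remove_at() instead of an output port.
    """

    def __init__(self, capacity=1000):  # capacity value is an assumption
        self.capacity = capacity
        self.packets = []

    def push(self, packet):
        """Enqueue at the tail; return False (drop) when the queue is full."""
        if len(self.packets) >= self.capacity:
            return False
        self.packets.append(packet)
        return True

    def remove_at(self, index):
        """Dequeue from an arbitrary position; None if index is out of range."""
        if 0 <= index < len(self.packets):
            return self.packets.pop(index)
        return None


class Combiner:
    """Toy model of the base Combiner: plain FIFO extraction on pull."""

    def __init__(self, ruledb, purequeue, nbrtable):
        # The base variant keeps these references but never consults them.
        self.ruledb = ruledb
        self.nbrtable = nbrtable
        self.queue = purequeue

    def pull(self):
        """Called by the downstream element; returns the oldest packet or None."""
        return self.queue.remove_at(0)


if __name__ == "__main__":
    combq = PureQueue()
    comb = Combiner(ruledb=None, purequeue=combq, nbrtable=None)
    combq.push("pkt-1")
    combq.push("pkt-2")
    print(comb.pull(), comb.pull(), comb.pull())
```

A smarter variant would differ only in pull(), choosing which index
to pass to remove_at() based on the rule and neighbour tables.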

