[Click] Click: Kernel oops on e1000e polling patch with jumbo frames?

Nuutti Varis nvaris at cc.hut.fi
Mon Jan 18 05:41:36 EST 2010


On Jan 13, 2010, at 6:33 PM, Joonwoo Park wrote:

> Hi Nuutti,
> 
> On Wed, Jan 13, 2010 at 3:34 AM, Nuutti Varis <nvaris at cc.hut.fi> wrote:
>> Hello,
>> 
>> The e1000e polling patch for the Intel NICs (e1000e driver) seems to have issues with an MTU > 1500 (we use an MTU of 1540; the problems appear for any MTU from 1501 upward). From what I could gather with my very brief experience in kernel hacking, the driver in polling mode gets the E1000_RXD_STAT_EOP bit set when MTU > 1500, and e1000_rx_poll promptly goes through all the buffers in the ring with the code on lines 5173-5179 in netdev.c.
> 
> That process looks correct.
> When you say MTU size, do you mean the MTU of the NIC?
> What's the packet size when you have this problem?

The MTU of the NIC. The oops itself happens before any packets are received: the kernel oopses as soon as I run click-install foo.click.

>> The end result is an oops (null pointer dereference) in PollDevice::run_task, because e1000_rx_poll increments "got" before assigning the skb to skb_head.
>> 
>> System specs follow:
>> - Click is the latest version from Git (at the time of writing)
>> - Linux 100g-10-x86-64 2.6.24.7-click #1 SMP Tue Jan 12 15:10:28 EET 2010 x86_64 GNU/Linux
>> - PREEMPT_NONE=y, {PREEMPT_VOLUNTARY, PREEMPT, PREEMPT_BKL}=n
>> - Click configured with --enable-etherswitch --enable-linuxmodule
>> - gcc version 4.3.2 (Debian 4.3.2-1.1)
>> - Network interface card chips, from lspci (we have four onboard ports plus an additional quad-port card):
>>        * Intel Corporation 82574L Gigabit Network Connection
>>        * Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>> 
> 
> Thanks for your very detailed explanation. Could you also send me your
> Click configuration? (A simplified configuration that reproduces the
> problem would be great.)

The simplest configuration with PollDevice that I could come up with still oopses:

PollDevice( eth0 ) -> Discard;
Idle -> ToDevice( eth0 );

>> ==
>> [ 1360.185658] Unable to handle kernel NULL pointer dereference at 00000000000000c0 RIP:
>> [ 1360.185705]  [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>> [ 1360.186038] PGD 0
>> [ 1360.186171] Oops: 0002 [1] SMP
>> [ 1360.186287] CPU 7
>> [ 1360.186416] Modules linked in: dot1q(PF) trill(PF) click proclikefs loop button evdev e1000e pcspkr ext3 jbd mbcache sd_mod ahci libata scsi_mod ehci_hcd uhci_hcd thermal processor fan
>> [ 1360.187224] Pid: 22129, comm: kclick Tainted: PF       2.6.24.7-click #1
>> [ 1360.187282] RIP: 0010:[<ffffffff88195197>]  [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>> [ 1360.187531] RSP: 0018:ffff8101bc015e70  EFLAGS: 00010246
>> [ 1360.187587] RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffff8101bcb6a780
>> [ 1360.187645] RDX: 0000000000000006 RSI: ffff8101bc959040 RDI: 0000000000000000
>> [ 1360.187704] RBP: 0000000000000000 R08: ffff8101bfc026e0 R09: 0000000000000086
>> [ 1360.187763] R10: 00000000000a06aa R11: ffff8101bc015dd0 R12: ffff8101bc197000
>> [ 1360.187822] R13: 0000000000000000 R14: ffff8101bbc366c0 R15: ffff8101bbc36744
>> [ 1360.187880] FS:  00002adc4ad5e6e0(0000) GS:ffff8101bf1b87c0(0000) knlGS:0000000000000000
>> [ 1360.187955] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 1360.188011] CR2: 00000000000000c0 CR3: 0000000000201000 CR4: 00000000000006e0
>> [ 1360.188070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 1360.188128] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [ 1360.188187] Process kclick (pid: 22129, threadinfo ffff8101bc014000, task ffff8101be8172e0)
>> [ 1360.188261] Stack:  0000000000000000 ffffffff88197bc2 0000000000000000 0000000800000008
>> [ 1360.188496]  0000000000000000 ffff8101bc1970b0 0000000000000000 000000000000007d
>> [ 1360.188697]  0000000000000000 ffff8101bbc366c0 ffff8101bbc36744 ffffffff8813caf5
>> [ 1360.188901] Call Trace:
>> [ 1360.189138]  [<ffffffff88197bc2>] :click:_ZN8ToDevice8run_taskEP4Task+0x112/0x580
>> [ 1360.189338]  [<ffffffff8813caf5>] :click:_ZN12RouterThread6driverEv+0x335/0x4c0
>> [ 1360.189555]  [<ffffffff881db8ad>] :click:_ZL11click_schedPv+0xcd/0x1d0
>> [ 1360.189648]  [<ffffffff8020cd88>] child_rip+0xa/0x12
>> [ 1360.189840]  [<ffffffff881db7e0>] :click:_ZL11click_schedPv+0x0/0x1d0
>> [ 1360.189906]  [<ffffffff8020cd7e>] child_rip+0x0/0x12
>> [ 1360.189961]
>> [ 1360.190009]
>> [ 1360.190009] Code: 48 83 ab c0 00 00 00 0e 83 43 68 0e 48 8b 83 b8 00 00 00 48
>> [ 1360.190911] RIP  [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>> [ 1360.191155]  RSP <ffff8101bc015e70>
>> [ 1360.191207] CR2: 00000000000000c0
>> [ 1360.191299] ---[ end trace f8e2fe527d7ef925 ]---
> 
> I suggest finding where in the Click source the oops is happening.
> To do that, you can:
> - recompile and install the kernel with CONFIG_DEBUG_INFO,
> - recompile and reinstall Click,
> - reproduce the issue,
> - run gdb click.ko,
> - type the command 'info line *{eip}' to get the oopsing source file and line,
>   e.g. info line *_ZN10PollDevice8run_taskEP4Task+0x127

(gdb) info line *_ZN10PollDevice8run_taskEP4Task+0x11b
Line 945 of "/lib/modules/2.6.24.7-click/build/include/linux/skbuff.h"
    starts at address 0x8722b <_ZN10PollDevice8run_taskEP4Task+283>
    and ends at 0x87233 <_ZN10PollDevice8run_taskEP4Task+291>

That is inside skb_push(), so presumably it is the skb_push() call at line 269 of polldevice.cc?

Br, Nuutti



