[Click] Click: Kernel oops on e1000e polling patch with jumbo frames?

Joonwoo Park joonwpark81 at gmail.com
Tue Jan 26 11:39:31 EST 2010


Hi,

The kernel oops happens when rx_poll() returns NULL with got > 1 after
it encounters an error.
It's a bug, and I can fix it easily.
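The sketch below is a minimal userspace model of the invariant the fix has to restore. It is not driver code. The packet counter is incremented only after a packet has been linked into the returned chain, so a NULL head can never come back with a non-zero count. struct pkt, rx_poll_model(), the fail_at parameter and consume() are hypothetical stand-ins for the skb chain, e1000_rx_poll() and its caller. They are not taken from the polling patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: just a singly linked node. */
struct pkt {
    int id;
    struct pkt *next;
};

/*
 * Model of a poll routine that returns a chain of received packets.
 * 'ring' holds 'n' packets; 'fail_at' simulates a receive error at that
 * index (-1 means no error). The counter is bumped only AFTER a packet has
 * been linked, so on any early exit *got equals the chain length.
 */
static struct pkt *rx_poll_model(struct pkt *ring, int n, int fail_at, int *got)
{
    struct pkt *head = NULL, *tail = NULL;

    *got = 0;
    for (int i = 0; i < n; i++) {
        if (i == fail_at)
            break;                 /* error: stop, keep what is already linked */

        struct pkt *p = &ring[i];
        p->next = NULL;
        if (tail)
            tail->next = p;
        else
            head = p;
        tail = p;
        (*got)++;                  /* count only after successful linking */
    }

    /* The invariant the buggy ordering broke: no NULL head with got > 0. */
    assert(head != NULL || *got == 0);
    return head;
}

/*
 * A caller that trusts 'got' to walk the chain. Returns the number of
 * packets visited, or -1 where a real caller would dereference NULL.
 */
static int consume(struct pkt *head, int got)
{
    int seen = 0;

    for (int i = 0; i < got; i++) {
        if (head == NULL)
            return -1;
        seen++;
        head = head->next;
    }
    return seen;
}
```

With fail_at set to 0, the model returns NULL with got = 0, so consume() walks nothing. If the counter were incremented before linking, an error exit between the two steps could report got > 0 with a NULL head. That is the pattern behind the NULL dereference in PollDevice::run_task reported below.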

However, I've realized that the 82571 and newer controllers support packet
split, and the e1000e driver configures their registers to use that feature.
As a result, with this e1000e driver, the 82571 and newer will split
packets when the MTU is greater than 1500.
Unfortunately, I didn't implement packet split in the polling patch, so
e1000_rx_poll() looks at the wrong receive descriptors when the MTU is
greater than 1500.
We need an additional function, something like e1000_rx_poll_ps().
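The following is a rough userspace illustration of why the legacy poll loop goes wrong once packet split is enabled. The two structs are modelled on the e1000e legacy receive descriptor (16 bytes, status byte at offset 12) and on the write-back layout of the packet-split descriptor (32 bytes, 32-bit status/error word at offset 8) as they appear in the driver headers. Treat the exact field names as an assumption. read_status_legacy(), read_status_ps() and desc_done() are hypothetical helpers. The ps_enabled flag stands in for however the driver records that packet split is active. Walking a packet-split ring with the legacy layout uses the wrong stride and reads the status from the wrong bytes. Which bits the loop actually sees then depends on whatever the hardware wrote into unrelated fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same values as E1000_RXD_STAT_DD / E1000_RXD_STAT_EOP. */
#define RXD_STAT_DD  0x01 /* descriptor done */
#define RXD_STAT_EOP 0x02 /* end of packet */

/* Modelled on the legacy receive descriptor: 16 bytes, status at byte 12. */
struct legacy_rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    uint16_t csum;
    uint8_t  status;
    uint8_t  errors;
    uint16_t special;
};

/* Modelled on the packet-split write-back format: 32 bytes,
 * 32-bit status/error word at byte 8. */
struct ps_rx_desc_wb {
    uint32_t mrq;
    uint32_t rss;
    uint32_t status_error;
    uint16_t length0;
    uint16_t vlan;
    uint16_t header_status;
    uint16_t length[3];
    uint64_t reserved;
};

_Static_assert(sizeof(struct legacy_rx_desc) == 16, "legacy descriptor is 16 bytes");
_Static_assert(sizeof(struct ps_rx_desc_wb) == 32, "packet-split descriptor is 32 bytes");

/* Status of entry i when the ring is parsed with the legacy layout.
 * memcpy avoids aliasing one struct type through the other. */
static uint8_t read_status_legacy(const unsigned char *ring, size_t i)
{
    struct legacy_rx_desc d;
    memcpy(&d, ring + i * sizeof d, sizeof d);
    return d.status;
}

/* Status of entry i when the ring is parsed with the packet-split layout. */
static uint32_t read_status_ps(const unsigned char *ring, size_t i)
{
    struct ps_rx_desc_wb d;
    memcpy(&d, ring + i * sizeof d, sizeof d);
    return d.status_error;
}

/* Hypothetical dispatch: parse the ring with the format it was set up for,
 * which is the job a separate e1000_rx_poll_ps() would take over. */
static int desc_done(const unsigned char *ring, size_t i, int ps_enabled)
{
    assert(ring != NULL);
    if (ps_enabled)
        return (read_status_ps(ring, i) & RXD_STAT_DD) != 0;
    return (read_status_legacy(ring, i) & RXD_STAT_DD) != 0;
}
```

In this model, a packet-split descriptor written back with DD and EOP set is seen as done only when it is parsed with the packet-split layout. The legacy view of the same bytes reads the status from part of the length0 field instead.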

It shouldn't be hard, but I'm rather busy, so I don't know when I'll
have time.
I'll let you know when it's done so you can test it.

Joonwoo

On Mon, Jan 18, 2010 at 2:41 AM, Nuutti Varis <nvaris at cc.hut.fi> wrote:
> On Jan 13, 2010, at 6:33 PM, Joonwoo Park wrote:
>
>> Hi Nuutti,
>>
>> On Wed, Jan 13, 2010 at 3:34 AM, Nuutti Varis <nvaris at cc.hut.fi> wrote:
>>> Hello,
>>>
> >>> The e1000e polling patch for the Intel NICs (e1000e driver) seems to have issues with MTU > 1500 (we use an MTU of 1540; the issues start at 1501). From what I could gather with my very brief experience in kernel hacking, the driver in polling mode gets the E1000_RXD_STAT_EOP bit set when MTU > 1500, and e1000_rx_poll promptly goes through all the buffers in the ring with the code on lines 5173-5179 in netdev.c.
>>
>> That process looks correct.
> >> When you say MTU size, do you mean the MTU of the NIC?
>> What's the packet size when you have this problem?
>
> MTU size of the NIC. The oops itself happens before any packets are received, i.e. the kernel oopses when I do click-install foo.click.
>
> >>> The end result is an oops (NULL pointer dereference) in PollDevice::run_task, because e1000_rx_poll increments "got" before assigning the skb to skb_head.
>>>
> >>> System specs follow:
>>> - Click is the latest (at the time of writing) from GIT
>>> - Linux 100g-10-x86-64 2.6.24.7-click #1 SMP Tue Jan 12 15:10:28 EET 2010 x86_64 GNU/Linux
>>> - PREEMPT_NONE=y, {PREEMPT_VOLUNTARY, PREEMPT, PREEMPT_BKL}=n
>>> - Click configured with --enable-etherswitch --enable-linuxmodule
>>> - gcc version 4.3.2 (Debian 4.3.2-1.1)
> >>> - Network interface card chips from lspci (we have four onboard ports and another quad-port card)
>>>        * Intel Corporation 82574L Gigabit Network Connection
>>>        * Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
>>>
>>
> >> Thanks for your very detailed explanation. Can you also send me your
> >> Click config? (A simplified config with which I can reproduce this
> >> would be great.)
>
> The simplest PollDevice configuration I could come up with that oopses:
>
> PollDevice( eth0 ) -> Discard;
> Idle -> ToDevice( eth0 );
>
>>> ==
>>> [ 1360.185658] Unable to handle kernel NULL pointer dereference at 00000000000000c0 RIP:
>>> [ 1360.185705]  [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>>> [ 1360.186038] PGD 0
>>> [ 1360.186171] Oops: 0002 [1] SMP
>>> [ 1360.186287] CPU 7
>>> [ 1360.186416] Modules linked in: dot1q(PF) trill(PF) click proclikefs loop button evdev e1000e pcspkr ext3 jbd mbcache sd_mod ahci libata scsi_mod ehci_hcd uhci_hcd thermal processor fan
>>> [ 1360.187224] Pid: 22129, comm: kclick Tainted: PF       2.6.24.7-click #1
>>> [ 1360.187282] RIP: 0010:[<ffffffff88195197>]  [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>>> [ 1360.187531] RSP: 0018:ffff8101bc015e70  EFLAGS: 00010246
>>> [ 1360.187587] RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffff8101bcb6a780
>>> [ 1360.187645] RDX: 0000000000000006 RSI: ffff8101bc959040 RDI: 0000000000000000
>>> [ 1360.187704] RBP: 0000000000000000 R08: ffff8101bfc026e0 R09: 0000000000000086
>>> [ 1360.187763] R10: 00000000000a06aa R11: ffff8101bc015dd0 R12: ffff8101bc197000
>>> [ 1360.187822] R13: 0000000000000000 R14: ffff8101bbc366c0 R15: ffff8101bbc36744
>>> [ 1360.187880] FS:  00002adc4ad5e6e0(0000) GS:ffff8101bf1b87c0(0000) knlGS:0000000000000000
>>> [ 1360.187955] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>> [ 1360.188011] CR2: 00000000000000c0 CR3: 0000000000201000 CR4: 00000000000006e0
>>> [ 1360.188070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [ 1360.188128] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> [ 1360.188187] Process kclick (pid: 22129, threadinfo ffff8101bc014000, task ffff8101be8172e0)
>>> [ 1360.188261] Stack:  0000000000000000 ffffffff88197bc2 0000000000000000 0000000800000008
>>> [ 1360.188496]  0000000000000000 ffff8101bc1970b0 0000000000000000 000000000000007d
>>> [ 1360.188697]  0000000000000000 ffff8101bbc366c0 ffff8101bbc36744 ffffffff8813caf5
>>> [ 1360.188901] Call Trace:
>>> [ 1360.189138]  [<ffffffff88197bc2>] :click:_ZN8ToDevice8run_taskEP4Task+0x112/0x580
>>> [ 1360.189338]  [<ffffffff8813caf5>] :click:_ZN12RouterThread6driverEv+0x335/0x4c0
>>> [ 1360.189555]  [<ffffffff881db8ad>] :click:_ZL11click_schedPv+0xcd/0x1d0
>>> [ 1360.189648]  [<ffffffff8020cd88>] child_rip+0xa/0x12
>>> [ 1360.189840]  [<ffffffff881db7e0>] :click:_ZL11click_schedPv+0x0/0x1d0
>>> [ 1360.189906]  [<ffffffff8020cd7e>] child_rip+0x0/0x12
>>> [ 1360.189961]
>>> [ 1360.190009]
>>> [ 1360.190009] Code: 48 83 ab c0 00 00 00 0e 83 43 68 0e 48 8b 83 b8 00 00 00 48
>>> [ 1360.190911] RIP  [<ffffffff88195197>] :click:_ZN10PollDevice8run_taskEP4Task+0x127/0x360
>>> [ 1360.191155]  RSP <ffff8101bc015e70>
>>> [ 1360.191207] CR2: 00000000000000c0
>>> [ 1360.191299] ---[ end trace f8e2fe527d7ef925 ]---
>>
> >> Well, I suggest finding where in the Click source the oops is happening.
> >> To do that, you can:
> >> - recompile and install the kernel with CONFIG_DEBUG_INFO
> >> - recompile and install Click again
> >> - reproduce the issue
> >> - run gdb click.ko
> >> - type the command 'info line *{eip}' to get the source file and line of the oops,
> >>   e.g. info line *_ZN10PollDevice8run_taskEP4Task+0x127
>
> (gdb) info line *_ZN10PollDevice8run_taskEP4Task+0x11b
> Line 945 of "/lib/modules/2.6.24.7-click/build/include/linux/skbuff.h"
>    starts at address 0x8722b <_ZN10PollDevice8run_taskEP4Task+283>
>    and ends at 0x87233 <_ZN10PollDevice8run_taskEP4Task+291>
>
> That is skb_push(), presumably the skb_push() call at line 269 in polldevice.cc?
>
> Br, Nuutti
>
>


