[Click] driver crash
Eddie Kohler
kohler at CS.UCLA.EDU
Thu Jul 28 18:08:45 EDT 2005
Hi Roman,
?? So, hmm. I don't think anything has changed in the e1000-5.x
driver over the last two weeks. Are you saying that current CVS is
bad? That 1.4.3 is bad?
Or that everything now appears to work?
Eddie
On Jul 28, 2005, at 2:43 PM, rchertov at purdue.edu wrote:
> Quoting rchertov at purdue.edu:
>
> When I reinstalled a fresh 2.4.26 kernel and 1.4.3 Click (with patched
> recycle()) with an e1000-5.x driver from a CVS snapshot from two weeks ago,
> the driver crashes have stopped.
>
> Roman
>
>
>
>> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
>>
>>
>>> I bet this is the problem Qinghua Ye recently reported. When he sends the
>>> patch to the list, please apply it & see if that helps!
>>>
>>
>> I have applied the patch to the 1.4.3 version and installed the latest
>> driver from CVS. I get a variety of errors. Sometimes the network driver
>> crashes and there is no network connectivity. Other times the whole system
>> goes down.
>>
>> Warning: kfree_skb passed an skb still on a list (from f88291dc).
>> kernel BUG at skbuff.c:316!
>> invalid operand: 0000
>> CPU: 1
>> EIP: 0010:[<c0244df4>] Not tainted
>> EFLAGS: 00010282
>> eax: 00000045 ebx: f7049440 ecx: 00000092 edx: f7773f7c
>> esi: 000000df edi: f7049440 ebp: f683de48 esp: f683ddec
>> ds: 0018 es: 0018 ss: 0018
>> Process kclick (pid: 974, stackpage=f683d000)
>> Stack: c02f0540 f88291dc f705e6e0 368400da f88291dc f7049440 00000000 00000014
>> f705e6e0 00000246 00000283 f67f1000 00000009 f700ace0 0000000e f7707250
>> 000000df f88f1158 f7707000 f7707234 f77070c4 f77071a0 00000040 f683de78
>> Call Trace: [<f88291dc>] [<f88291dc>] [<f8828d9f>] [<c024a066>] [<c0125d19>]
>> [<c010af19>] [<c010da78>] [<c010f7c7>] [<f89b2ab0>] [<f89de0a2>] [<f89746f4>]
>> [<f89d2738>] [<c010752e>] [<f89d26b4>]
>>
>> Code: 0f 0b 3c 01 ab f2 2e c0 8b 5c 24 14 e9 ae fe ff ff 8d 74 26
>> <0>Kernel panic: Aiee, killing interrupt handler!
>> In interrupt handler - not syncing
>> Warning: kfree_skb passed an skb still on a list (from f88291dc).
>> kernel BUG at skbuff.c:316!
>> invalid operand: 0000
>> CPU: 1
>> EIP: 0010:[<c0244df4>] Not tainted
>> Using defaults from ksymoops -t elf32-i386 -a i386
>> EFLAGS: 00010282
>> eax: 00000045 ebx: f7049440 ecx: 00000092 edx: f7773f7c
>> esi: 000000df edi: f7049440 ebp: f683de48 esp: f683ddec
>> ds: 0018 es: 0018 ss: 0018
>> Process kclick (pid: 974, stackpage=f683d000)
>> Stack: c02f0540 f88291dc f705e6e0 368400da f88291dc f7049440 00000000 00000014
>> f705e6e0 00000246 00000283 f67f1000 00000009 f700ace0 0000000e f7707250
>> 000000df f88f1158 f7707000 f7707234 f77070c4 f77071a0 00000040 f683de78
>> Call Trace: [<f88291dc>] [<f88291dc>] [<f8828d9f>] [<c024a066>] [<c0125d19>]
>> [<c010af19>] [<c010da78>] [<c010f7c7>] [<f89b2ab0>] [<f89de0a2>] [<f89746f4>]
>> [<f89d2738>] [<c010752e>] [<f89d26b4>]
>> Code: 0f 0b 3c 01 ab f2 2e c0 8b 5c 24 14 e9 ae fe ff ff 8d 74 26
>>
>>
>> >>EIP; c0244df4 <__kfree_skb+154/170> <=====
>>
>> Trace; f88291dc <[e1000]e1000_clean_tx_irq+38c/394>
>> Trace; f88291dc <[e1000]e1000_clean_tx_irq+38c/394>
>> Trace; f8828d9f <[e1000]e1000_clean+33/e4>
>> Trace; c024a066 <net_rx_action+a6/160>
>> Trace; c0125d19 <do_softirq+d9/e0>
>> Trace; c010af19 <do_IRQ+f9/120>
>> Trace; c010da78 <call_do_IRQ+5/d>
>> Trace; c010f7c7 <do_gettimeofday+57/80>
>> Trace; f89b2ab0 <END_OF_CODE+5ef99/????>
>> Trace; f89de0a2 <END_OF_CODE+8a58b/????>
>> Trace; f89746f4 <END_OF_CODE+20bdd/????>
>> Trace; f89d2738 <END_OF_CODE+7ec21/????>
>> Trace; c010752e <arch_kernel_thread+2e/40>
>> Trace; f89d26b4 <END_OF_CODE+7eb9d/????>
>> Code; c0244df4 <__kfree_skb+154/170>
>> 00000000 <_EIP>:
>> Code; c0244df4 <__kfree_skb+154/170> <=====
>> 0: 0f 0b ud2a <=====
>> Code; c0244df6 <__kfree_skb+156/170>
>> 2: 3c 01 cmp $0x1,%al
>> Code; c0244df8 <__kfree_skb+158/170>
>> 4: ab stos %eax,%es:(%edi)
>> Code; c0244df9 <__kfree_skb+159/170>
>> 5: f2 2e c0 8b 5c 24 14 repnz rorb $0xae,%cs:0xe914245c(%ebx)
>> Code; c0244e00 <__kfree_skb+160/170>
>> c: e9 ae
>> Code; c0244e02 <__kfree_skb+162/170>
>> e: fe (bad)
>> Code; c0244e03 <__kfree_skb+163/170>
>> f: ff (bad)
>> Code; c0244e04 <__kfree_skb+164/170>
>> 10: ff 8d 74 26 00 00 decl 0x2674(%ebp)
>>
>> <0>Kernel panic: Aiee, killing interrupt handler!
>>
>>
>> Roman
>>
>>
>>> Eddie
>>>
>>>
>>> rchertov at purdue.edu wrote:
>>>
>>>> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
>>>>
>>>>
>>>>
>>>>> Roman,
>>>>>
>>>>> A ksymoops would be extremely helpful!
>>>>>
>>>>
>>>>
>>>>
>>>> I noticed that sometimes there is no crash but nothing is received on
>>>> the device until the machine is restarted.
>>>>
>>>> ksymoops output
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000080
>>>> f8829281
>>>> *pde = 00000000
>>>> Oops: 0000
>>>> CPU: 0
>>>> EIP: 0010:[<f8829281>] Not tainted
>>>> Using defaults from ksymoops -t elf32-i386 -a i386
>>>> EFLAGS: 00010246
>>>> eax: 00000002 ebx: 00000047 ecx: 00000000 edx: 00000040
>>>> esi: f708d470 edi: 00000000 ebp: f6869e10 esp: f6869dd8
>>>> ds: 0018 es: 0018 ss: 0018
>>>> Process kclick (pid: 975, stackpage=f6869000)
>>>> Stack: c0244a14 00035846 00000282 00000000 c030d21c 00000001 00000040 00000000
>>>> f88ff58c f7731800 f7731aa8 00000000 f77319a0 00000040 f6869e40 f8828daf
>>>> f77319a0 f6869e30 00000040 00000287 00001b30 00000000 00000001 f77318c4
>>>> Call Trace: [<c0244a14>] [<f8828daf>] [<c024a066>] [<c0125d19>] [<c010af19>]
>>>> [<c010da78>] [<f882a3db>] [<f89b45c5>] [<f89b1d89>] [<f89dd862>] [<f89760e0>]
>>>> [<f89d2670>] [<c010752e>] [<f89d25ec>]
>>>> Code: 8b 8f 80 00 00 00 83 ea 04 85 c9 8b 47 10 0f 85 82 02 00 00
>>>>
>>>>
>>>>
>>>> >>EIP; f8829281 <[e1000]e1000_clean_rx_irq+9d/340> <=====
>>>>
>>>>
>>>> Trace; c0244a14 <alloc_skb+c4/1d0>
>>>> Trace; f8828daf <[e1000]e1000_clean+43/e4>
>>>> Trace; c024a066 <net_rx_action+a6/160>
>>>> Trace; c0125d19 <do_softirq+d9/e0>
>>>> Trace; c010af19 <do_IRQ+f9/120>
>>>> Trace; c010da78 <call_do_IRQ+5/d>
>>>> Trace; f882a3db <[e1000]e1000_rx_poll+4b/2ec>
>>>> Trace; f89b45c5 <END_OF_CODE+60aae/????>
>>>> Trace; f89b1d89 <END_OF_CODE+5e272/????>
>>>> Trace; f89dd862 <END_OF_CODE+89d4b/????>
>>>> Trace; f89760e0 <END_OF_CODE+225c9/????>
>>>> Trace; f89d2670 <END_OF_CODE+7eb59/????>
>>>> Trace; c010752e <arch_kernel_thread+2e/40>
>>>> Trace; f89d25ec <END_OF_CODE+7ead5/????>
>>>> Code; f8829281 <[e1000]e1000_clean_rx_irq+9d/340>
>>>> 00000000 <_EIP>:
>>>> Code; f8829281 <[e1000]e1000_clean_rx_irq+9d/340> <=====
>>>> 0: 8b 8f 80 00 00 00 mov 0x80(%edi),%ecx <=====
>>>> Code; f8829287 <[e1000]e1000_clean_rx_irq+a3/340>
>>>> 6: 83 ea 04 sub $0x4,%edx
>>>> Code; f882928a <[e1000]e1000_clean_rx_irq+a6/340>
>>>> 9: 85 c9 test %ecx,%ecx
>>>> Code; f882928c <[e1000]e1000_clean_rx_irq+a8/340>
>>>> b: 8b 47 10 mov 0x10(%edi),%eax
>>>> Code; f882928f <[e1000]e1000_clean_rx_irq+ab/340>
>>>> e: 0f 85 82 02 00 00 jne 296 <_EIP+0x296>
>>>>
>>>> <0>Kernel panic: Aiee, killing interrupt handler!
>>>>
>>>> 1 warning issued. Results may not be reliable.
>>>>
>>>>
>>>> Roman
>>>>
>>>>
>>>>
>>>>>
>>>>> On Jul 6, 2005, at 5:29 PM, rchertov at purdue.edu wrote:
>>>>>
>>>>>
>>>>>
>>>>>> This is on the 2.4.26 SMP kernel using the latest CVS click SMP build.
>>>>>> The driver is the e1000-5.x. I usually get this after I send a high rate
>>>>>> of UDP packets with a 10 byte payload from my own packet generator (user
>>>>>> land). I am going to try to use the 1.4.3 click as it seemed to crash
>>>>>> less frequently.
>>>>>>
>>>>>>
>>>>>> This is the script that I run on the node that crashes. The machine runs
>>>>>> on 2 Xeon 2.8 GHz CPUs with hyperthreading (Linux reports 4 CPUs) and the
>>>>>> NICs are pci-66 Intel Pro 1000.
>>>>>>
>>>>>> PollDevice(eth1)
>>>>>> -> Queue(200)
>>>>>> -> ToDevice(eth2);
>>>>>>
>>>>>> PollDevice(eth2)
>>>>>> -> Queue(200)
>>>>>> -> ToDevice(eth1);
>>>>>>
>>>>>>
>>>>>>
>>>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000080
>>>>>> printing eip:
>>>>>> f8829281
>>>>>> *pde = 00000000
>>>>>> Oops: 0000
>>>>>> CPU: 0
>>>>>> EIP: 0010:[<f8829281>] Not tainted
>>>>>> EFLAGS: 00010246
>>>>>> eax: 00000002 ebx: 00000047 ecx: 00000000 edx: 00000040
>>>>>> esi: f708d470 edi: 00000000 ebp: f6869e10 esp: f6869dd8
>>>>>> ds: 0018 es: 0018 ss: 0018
>>>>>> Process kclick (pid: 975, stackpage=f6869000)
>>>>>> Stack: c0244a14 00035846 00000282 00000000 c030d21c 00000001 00000040 00000000
>>>>>> f88ff58c f7731800 f7731aa8 00000000 f77319a0 00000040 f6869e40 f8828daf
>>>>>> f77319a0 f6869e30 00000040 00000287 00001b30 00000000 00000001 f77318c4
>>>>>> Call Trace: [<c0244a14>] [<f8828daf>] [<c024a066>] [<c0125d19>] [<c010af19>]
>>>>>> [<c010da78>] [<f882a3db>] [<f89b45c5>] [<f89b1d89>] [<f89dd862>] [<f89760e0>]
>>>>>> [<f89d2670>] [<c010752e>] [<f89d25ec>]
>>>>>>
>>>>>> Code: 8b 8f 80 00 00 00 83 ea 04 85 c9 8b 47 10 0f 85 82 02 00 00
>>>>>> <0>Kernel panic: Aiee, killing interrupt handler!
>>>>>> In interrupt handler - not syncing
>>>>>>
>>>>>>
>>>>>> I don't suppose there is any quick magic fix laying about? :)
>>>>>>
>>>>>> Roman
>>>>>>
>>>>>> _______________________________________________
>>>>>> click mailing list
>>>>>> click at amsterdam.lcs.mit.edu
>>>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>>>>>>