[Click] driver crash
rchertov at purdue.edu
Thu Jul 28 18:16:55 EDT 2005
Quoting Eddie Kohler <kohler at CS.UCLA.EDU>:
> Hi Roman,
>
> ?? So, hmm. I don't think anything has changed in the e1000-5.x
> driver over the last two weeks. Are you saying that current CVS is
I am not sure how much of a difference that makes, but this is what diff
reported for e1000_main.c:
2657c2657
< netif_receive_skb(skb);
---
> netif_receive_skb(skb, skb->protocol, 0);
2660c2660
< netif_receive_skb(skb);
---
> netif_receive_skb(skb, skb->protocol, 0);
3747,3748c3747,3748
< save_flags(flags);
< cli();
---
> local_save_flags(flags);
> local_irq_disable();
3753c3753
< restore_flags(flags);
---
> local_irq_restore(flags);
> bad? That 1.4.3 is bad?
> Or that everything now appears to work?
1.4.3 and the e1000-5.x from two weeks ago work great now. At least at 100 Mbit
speeds with 140K pps UDP flows.
Roman
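
For context on the 140K pps figure, here is a rough back-of-the-envelope
sketch. It assumes the flows use the 10-byte UDP payload from the original
report, IPv4 without options, and standard Ethernet framing (64-byte minimum
frame, 8 bytes of preamble/SFD, 12-byte inter-frame gap). The function names
are illustrative only and are not part of Click or the e1000 driver. Under
those assumptions, 140K pps is close to the most minimum-size frames a
100 Mbit link can carry.

```python
# Assumed framing constants (standard Ethernet / IPv4 / UDP, no VLAN tag,
# no IP options). These are assumptions, not values taken from the thread.
ETH_HEADER = 14       # dst MAC + src MAC + EtherType
ETH_FCS = 4           # frame check sequence
ETH_MIN_FRAME = 64    # minimum frame size, header and FCS included
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP = 12   # mandatory idle time, expressed in byte times
IPV4_HEADER = 20
UDP_HEADER = 8


def wire_bytes(udp_payload):
    """Bytes of link time consumed by one UDP packet, padding included."""
    frame = ETH_HEADER + IPV4_HEADER + UDP_HEADER + udp_payload + ETH_FCS
    frame = max(frame, ETH_MIN_FRAME)  # short frames are padded to 64 bytes
    return frame + PREAMBLE_SFD + INTERFRAME_GAP


def max_pps(link_bps, udp_payload):
    """Upper bound on packets per second for a given link speed."""
    return link_bps / (wire_bytes(udp_payload) * 8)


if __name__ == "__main__":
    limit = max_pps(100_000_000, 10)
    print(f"wire bytes per packet: {wire_bytes(10)}")
    print(f"max packets/s at 100 Mbit/s: {limit:.0f}")
    print(f"140K pps is {140_000 / limit:.0%} of that limit")
```

If the generator used a different payload size or the link negotiated
another speed, the bound changes accordingly.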
>
> Eddie
>
>
>
> On Jul 28, 2005, at 2:43 PM, rchertov at purdue.edu wrote:
>
> > Quoting rchertov at purdue.edu:
> >
> > When I reinstalled a fresh 2.4.26 kernel and 1.4.3 Click (with patched
> > recycle()) with an e1000-5.x driver from a CVS snapshot from two weeks ago,
> > the driver crashes stopped.
> >
> > Roman
> >
> >
> >
> >> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
> >>
> >>
> >>> I bet this is the problem Qinghua Ye recently reported. When he sends the
> >>> patch to the list, please apply it & see if that helps!
> >>>
> >>
> >> I have applied the patch to the 1.4.3 version and installed the latest
> >> driver from CVS. I get a variety of errors. Sometimes the network driver
> >> crashes and there is no network connectivity. Other times the whole system
> >> goes down.
> >>
> >> Warning: kfree_skb passed an skb still on a list (from f88291dc).
> >> kernel BUG at skbuff.c:316!
> >> invalid operand: 0000
> >> CPU: 1
> >> EIP: 0010:[<c0244df4>] Not tainted
> >> EFLAGS: 00010282
> >> eax: 00000045 ebx: f7049440 ecx: 00000092 edx: f7773f7c
> >> esi: 000000df edi: f7049440 ebp: f683de48 esp: f683ddec
> >> ds: 0018 es: 0018 ss: 0018
> >> Process kclick (pid: 974, stackpage=f683d000)
> >> Stack: c02f0540 f88291dc f705e6e0 368400da f88291dc f7049440 00000000 00000014
> >>        f705e6e0 00000246 00000283 f67f1000 00000009 f700ace0 0000000e f7707250
> >>        000000df f88f1158 f7707000 f7707234 f77070c4 f77071a0 00000040 f683de78
> >> Call Trace: [<f88291dc>] [<f88291dc>] [<f8828d9f>] [<c024a066>] [<c0125d19>]
> >>   [<c010af19>] [<c010da78>] [<c010f7c7>] [<f89b2ab0>] [<f89de0a2>] [<f89746f4>]
> >>   [<f89d2738>] [<c010752e>] [<f89d26b4>]
> >>
> >> Code: 0f 0b 3c 01 ab f2 2e c0 8b 5c 24 14 e9 ae fe ff ff 8d 74 26
> >> <0>Kernel panic: Aiee, killing interrupt handler!
> >> In interrupt handler - not syncing
> >> Warning: kfree_skb passed an skb still on a list (from f88291dc).
> >> kernel BUG at skbuff.c:316!
> >> invalid operand: 0000
> >> CPU: 1
> >> EIP: 0010:[<c0244df4>] Not tainted
> >> Using defaults from ksymoops -t elf32-i386 -a i386
> >> EFLAGS: 00010282
> >> eax: 00000045 ebx: f7049440 ecx: 00000092 edx: f7773f7c
> >> esi: 000000df edi: f7049440 ebp: f683de48 esp: f683ddec
> >> ds: 0018 es: 0018 ss: 0018
> >> Process kclick (pid: 974, stackpage=f683d000)
> >> Stack: c02f0540 f88291dc f705e6e0 368400da f88291dc f7049440 00000000 00000014
> >>        f705e6e0 00000246 00000283 f67f1000 00000009 f700ace0 0000000e f7707250
> >>        000000df f88f1158 f7707000 f7707234 f77070c4 f77071a0 00000040 f683de78
> >> Call Trace: [<f88291dc>] [<f88291dc>] [<f8828d9f>] [<c024a066>] [<c0125d19>]
> >>   [<c010af19>] [<c010da78>] [<c010f7c7>] [<f89b2ab0>] [<f89de0a2>] [<f89746f4>]
> >>   [<f89d2738>] [<c010752e>] [<f89d26b4>]
> >> Code: 0f 0b 3c 01 ab f2 2e c0 8b 5c 24 14 e9 ae fe ff ff 8d 74 26
> >>
> >>
> >> >>EIP; c0244df4 <__kfree_skb+154/170> <=====
> >>
> >> Trace; f88291dc <[e1000]e1000_clean_tx_irq+38c/394>
> >> Trace; f88291dc <[e1000]e1000_clean_tx_irq+38c/394>
> >> Trace; f8828d9f <[e1000]e1000_clean+33/e4>
> >> Trace; c024a066 <net_rx_action+a6/160>
> >> Trace; c0125d19 <do_softirq+d9/e0>
> >> Trace; c010af19 <do_IRQ+f9/120>
> >> Trace; c010da78 <call_do_IRQ+5/d>
> >> Trace; c010f7c7 <do_gettimeofday+57/80>
> >> Trace; f89b2ab0 <END_OF_CODE+5ef99/????>
> >> Trace; f89de0a2 <END_OF_CODE+8a58b/????>
> >> Trace; f89746f4 <END_OF_CODE+20bdd/????>
> >> Trace; f89d2738 <END_OF_CODE+7ec21/????>
> >> Trace; c010752e <arch_kernel_thread+2e/40>
> >> Trace; f89d26b4 <END_OF_CODE+7eb9d/????>
> >> Code; c0244df4 <__kfree_skb+154/170>
> >> 00000000 <_EIP>:
> >> Code; c0244df4 <__kfree_skb+154/170> <=====
> >> 0: 0f 0b ud2a <=====
> >> Code; c0244df6 <__kfree_skb+156/170>
> >> 2: 3c 01 cmp $0x1,%al
> >> Code; c0244df8 <__kfree_skb+158/170>
> >> 4: ab stos %eax,%es:(%edi)
> >> Code; c0244df9 <__kfree_skb+159/170>
> >> 5: f2 2e c0 8b 5c 24 14 repnz rorb $0xae,%cs:0xe914245c(%ebx)
> >> Code; c0244e00 <__kfree_skb+160/170>
> >> c: e9 ae
> >> Code; c0244e02 <__kfree_skb+162/170>
> >> e: fe (bad)
> >> Code; c0244e03 <__kfree_skb+163/170>
> >> f: ff (bad)
> >> Code; c0244e04 <__kfree_skb+164/170>
> >> 10: ff 8d 74 26 00 00 decl 0x2674(%ebp)
> >>
> >> <0>Kernel panic: Aiee, killing interrupt handler!
> >>
> >>
> >> Roman
> >>
> >>
> >>> Eddie
> >>>
> >>>
> >>> rchertov at purdue.edu wrote:
> >>>
> >>>> Quoting Eddie Kohler <kohler at cs.ucla.edu>:
> >>>>
> >>>>
> >>>>
> >>>>> Roman,
> >>>>>
> >>>>> A ksymoops would be extremely helpful!
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> I noticed that sometimes there is no crash but nothing is received on
> >>>> the device until the machine is restarted.
> >>>>
> >>>> ksymoops output
> >>>>
> >>>> Unable to handle kernel NULL pointer dereference at virtual address 00000080
> >>>> f8829281
> >>>> *pde = 00000000
> >>>> Oops: 0000
> >>>> CPU: 0
> >>>> EIP: 0010:[<f8829281>] Not tainted
> >>>> Using defaults from ksymoops -t elf32-i386 -a i386
> >>>> EFLAGS: 00010246
> >>>> eax: 00000002 ebx: 00000047 ecx: 00000000 edx: 00000040
> >>>> esi: f708d470 edi: 00000000 ebp: f6869e10 esp: f6869dd8
> >>>> ds: 0018 es: 0018 ss: 0018
> >>>> Process kclick (pid: 975, stackpage=f6869000)
> >>>> Stack: c0244a14 00035846 00000282 00000000 c030d21c 00000001 00000040 00000000
> >>>>        f88ff58c f7731800 f7731aa8 00000000 f77319a0 00000040 f6869e40 f8828daf
> >>>>        f77319a0 f6869e30 00000040 00000287 00001b30 00000000 00000001 f77318c4
> >>>> Call Trace: [<c0244a14>] [<f8828daf>] [<c024a066>] [<c0125d19>] [<c010af19>]
> >>>>   [<c010da78>] [<f882a3db>] [<f89b45c5>] [<f89b1d89>] [<f89dd862>] [<f89760e0>]
> >>>>   [<f89d2670>] [<c010752e>] [<f89d25ec>]
> >>>> Code: 8b 8f 80 00 00 00 83 ea 04 85 c9 8b 47 10 0f 85 82 02 00 00
> >>>>
> >>>>
> >>>>
> >>>> >>EIP; f8829281 <[e1000]e1000_clean_rx_irq+9d/340> <=====
> >>>>
> >>>>
> >>>> Trace; c0244a14 <alloc_skb+c4/1d0>
> >>>> Trace; f8828daf <[e1000]e1000_clean+43/e4>
> >>>> Trace; c024a066 <net_rx_action+a6/160>
> >>>> Trace; c0125d19 <do_softirq+d9/e0>
> >>>> Trace; c010af19 <do_IRQ+f9/120>
> >>>> Trace; c010da78 <call_do_IRQ+5/d>
> >>>> Trace; f882a3db <[e1000]e1000_rx_poll+4b/2ec>
> >>>> Trace; f89b45c5 <END_OF_CODE+60aae/????>
> >>>> Trace; f89b1d89 <END_OF_CODE+5e272/????>
> >>>> Trace; f89dd862 <END_OF_CODE+89d4b/????>
> >>>> Trace; f89760e0 <END_OF_CODE+225c9/????>
> >>>> Trace; f89d2670 <END_OF_CODE+7eb59/????>
> >>>> Trace; c010752e <arch_kernel_thread+2e/40>
> >>>> Trace; f89d25ec <END_OF_CODE+7ead5/????>
> >>>> Code; f8829281 <[e1000]e1000_clean_rx_irq+9d/340>
> >>>> 00000000 <_EIP>:
> >>>> Code; f8829281 <[e1000]e1000_clean_rx_irq+9d/340> <=====
> >>>> 0: 8b 8f 80 00 00 00 mov 0x80(%edi),%ecx <=====
> >>>> Code; f8829287 <[e1000]e1000_clean_rx_irq+a3/340>
> >>>> 6: 83 ea 04 sub $0x4,%edx
> >>>> Code; f882928a <[e1000]e1000_clean_rx_irq+a6/340>
> >>>> 9: 85 c9 test %ecx,%ecx
> >>>> Code; f882928c <[e1000]e1000_clean_rx_irq+a8/340>
> >>>> b: 8b 47 10 mov 0x10(%edi),%eax
> >>>> Code; f882928f <[e1000]e1000_clean_rx_irq+ab/340>
> >>>> e: 0f 85 82 02 00 00 jne 296 <_EIP+0x296>
> >>>>
> >>>> <0>Kernel panic: Aiee, killing interrupt handler!
> >>>>
> >>>> 1 warning issued. Results may not be reliable.
> >>>>
> >>>>
> >>>> Roman
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>> On Jul 6, 2005, at 5:29 PM, rchertov at purdue.edu wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> This is on the 2.4.26 SMP kernel using the latest CVS click SMP build. The
> >>>>>> driver is the e1000-5.x. I usually get this after I send a high rate of UDP
> >>>>>> packets with a 10 byte payload from my own packet generator (user land). I am
> >>>>>> going to try to use the 1.4.3 click as it seemed to crash less frequently.
> >>>>>>
> >>>>>>
> >>>>>> This is the script that I run on the node that crashes. The machine runs on 2
> >>>>>> Xeon 2.8 GHz CPUs with hyperthreading (Linux reports 4 CPUs) and the NICs are
> >>>>>> PCI-66 Intel Pro 1000.
> >>>>>>
> >>>>>> PollDevice(eth1)
> >>>>>> -> Queue(200)
> >>>>>> -> ToDevice(eth2);
> >>>>>>
> >>>>>> PollDevice(eth2)
> >>>>>> -> Queue(200)
> >>>>>> -> ToDevice(eth1);
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Unable to handle kernel NULL pointer dereference at virtual address 00000080
> >>>>>> printing eip:
> >>>>>> f8829281
> >>>>>> *pde = 00000000
> >>>>>> Oops: 0000
> >>>>>> CPU: 0
> >>>>>> EIP: 0010:[<f8829281>] Not tainted
> >>>>>> EFLAGS: 00010246
> >>>>>> eax: 00000002 ebx: 00000047 ecx: 00000000 edx: 00000040
> >>>>>> esi: f708d470 edi: 00000000 ebp: f6869e10 esp: f6869dd8
> >>>>>> ds: 0018 es: 0018 ss: 0018
> >>>>>> Process kclick (pid: 975, stackpage=f6869000)
> >>>>>> Stack: c0244a14 00035846 00000282 00000000 c030d21c 00000001 00000040 00000000
> >>>>>>        f88ff58c f7731800 f7731aa8 00000000 f77319a0 00000040 f6869e40 f8828daf
> >>>>>>        f77319a0 f6869e30 00000040 00000287 00001b30 00000000 00000001 f77318c4
> >>>>>> Call Trace: [<c0244a14>] [<f8828daf>] [<c024a066>] [<c0125d19>] [<c010af19>]
> >>>>>>   [<c010da78>] [<f882a3db>] [<f89b45c5>] [<f89b1d89>] [<f89dd862>] [<f89760e0>]
> >>>>>>   [<f89d2670>] [<c010752e>] [<f89d25ec>]
> >>>>>>
> >>>>>> Code: 8b 8f 80 00 00 00 83 ea 04 85 c9 8b 47 10 0f 85 82 02 00 00
> >>>>>> <0>Kernel panic: Aiee, killing interrupt handler!
> >>>>>> In interrupt handler - not syncing
> >>>>>>
> >>>>>>
> >>>>>> I don't suppose there is any quick magic fix lying about? :)
> >>>>>>
> >>>>>> Roman
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> click mailing list
> >>>>>> click at amsterdam.lcs.mit.edu
> >>>>>> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>
>