[Click] driver crash

rchertov at purdue.edu rchertov at purdue.edu
Wed Jul 27 17:39:14 EDT 2005


Quoting Eddie Kohler <kohler at cs.ucla.edu>:

> I bet this is the problem Qinghua Ye recently reported.  When he sends the
> patch 
> to the list, please apply it & see if that helps!

I applied the patch to Click 1.4.3 and installed the latest driver from CVS,
but I still get a variety of errors.  Sometimes the network driver crashes and
there is no network connectivity; other times the whole system goes down.

Warning: kfree_skb passed an skb still on a list (from f88291dc).
kernel BUG at skbuff.c:316!
invalid operand: 0000
CPU:    1
EIP:    0010:[<c0244df4>]    Not tainted
EFLAGS: 00010282
eax: 00000045   ebx: f7049440   ecx: 00000092   edx: f7773f7c
esi: 000000df   edi: f7049440   ebp: f683de48   esp: f683ddec
ds: 0018   es: 0018   ss: 0018
Process kclick (pid: 974, stackpage=f683d000)
Stack: c02f0540 f88291dc f705e6e0 368400da f88291dc f7049440 00000000 00000014
       f705e6e0 00000246 00000283 f67f1000 00000009 f700ace0 0000000e f7707250
       000000df f88f1158 f7707000 f7707234 f77070c4 f77071a0 00000040 f683de78
Call Trace:    [<f88291dc>] [<f88291dc>] [<f8828d9f>] [<c024a066>] [<c0125d19>]
  [<c010af19>] [<c010da78>] [<c010f7c7>] [<f89b2ab0>] [<f89de0a2>] [<f89746f4>]
  [<f89d2738>] [<c010752e>] [<f89d26b4>]

Code: 0f 0b 3c 01 ab f2 2e c0 8b 5c 24 14 e9 ae fe ff ff 8d 74 26
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
Warning: kfree_skb passed an skb still on a list (from f88291dc).
kernel BUG at skbuff.c:316!
invalid operand: 0000
CPU:    1
EIP:    0010:[<c0244df4>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000045   ebx: f7049440   ecx: 00000092   edx: f7773f7c
esi: 000000df   edi: f7049440   ebp: f683de48   esp: f683ddec
ds: 0018   es: 0018   ss: 0018
Process kclick (pid: 974, stackpage=f683d000)
Stack: c02f0540 f88291dc f705e6e0 368400da f88291dc f7049440 00000000 00000014
       f705e6e0 00000246 00000283 f67f1000 00000009 f700ace0 0000000e f7707250
       000000df f88f1158 f7707000 f7707234 f77070c4 f77071a0 00000040 f683de78
Call Trace:    [<f88291dc>] [<f88291dc>] [<f8828d9f>] [<c024a066>] [<c0125d19>]
  [<c010af19>] [<c010da78>] [<c010f7c7>] [<f89b2ab0>] [<f89de0a2>] [<f89746f4>]
  [<f89d2738>] [<c010752e>] [<f89d26b4>]
Code: 0f 0b 3c 01 ab f2 2e c0 8b 5c 24 14 e9 ae fe ff ff 8d 74 26

>>EIP; c0244df4 <__kfree_skb+154/170>   <=====
Trace; f88291dc <[e1000]e1000_clean_tx_irq+38c/394>
Trace; f88291dc <[e1000]e1000_clean_tx_irq+38c/394>
Trace; f8828d9f <[e1000]e1000_clean+33/e4>
Trace; c024a066 <net_rx_action+a6/160>
Trace; c0125d19 <do_softirq+d9/e0>
Trace; c010af19 <do_IRQ+f9/120>
Trace; c010da78 <call_do_IRQ+5/d>
Trace; c010f7c7 <do_gettimeofday+57/80>
Trace; f89b2ab0 <END_OF_CODE+5ef99/????>
Trace; f89de0a2 <END_OF_CODE+8a58b/????>
Trace; f89746f4 <END_OF_CODE+20bdd/????>
Trace; f89d2738 <END_OF_CODE+7ec21/????>
Trace; c010752e <arch_kernel_thread+2e/40>
Trace; f89d26b4 <END_OF_CODE+7eb9d/????>
Code;  c0244df4 <__kfree_skb+154/170>
00000000 <_EIP>:
Code;  c0244df4 <__kfree_skb+154/170>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0244df6 <__kfree_skb+156/170>
   2:   3c 01                     cmp    $0x1,%al
Code;  c0244df8 <__kfree_skb+158/170>
   4:   ab                        stos   %eax,%es:(%edi)
Code;  c0244df9 <__kfree_skb+159/170>
   5:   f2 2e c0 8b 5c 24 14      repnz rorb $0xae,%cs:0xe914245c(%ebx)
Code;  c0244e00 <__kfree_skb+160/170>
   c:   e9 ae
Code;  c0244e02 <__kfree_skb+162/170>
   e:   fe                        (bad)
Code;  c0244e03 <__kfree_skb+163/170>
   f:   ff                        (bad)
Code;  c0244e04 <__kfree_skb+164/170>
  10:   ff 8d 74 26 00 00         decl   0x2674(%ebp)

 <0>Kernel panic: Aiee, killing interrupt handler!
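
To pull just the resolved call chain out of ksymoops output like the above, a
small script along these lines may help.  This is only a sketch: the function
name parse_trace and the regular expression are my own, written against the
">>EIP;" and "Trace;" line shapes shown in this message, and it expects
unquoted output (no leading "> " mail quoting).

```python
import re

# Matches ksymoops lines such as:
#   Trace; f8828d9f <[e1000]e1000_clean+33/e4>
#   >>EIP; c0244df4 <__kfree_skb+154/170>   <=====
# "Code;" lines are deliberately not matched.
LINE_RE = re.compile(
    r"^(?:>>EIP|Trace);\s+([0-9a-f]{8})\s+"
    r"<(?:\[(\w+)\])?([^+>]+)\+([0-9a-f]+)/([0-9a-f?]+)>"
)


def parse_trace(text):
    """Return one dict per EIP/Trace line, in the order they appear."""
    frames = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        addr, module, symbol, offset, _size = m.groups()
        frames.append({
            "addr": int(addr, 16),
            # No [module] prefix means the symbol is in the core kernel.
            "module": module or "kernel",
            "symbol": symbol,
            "offset": int(offset, 16),
            # ksymoops prints END_OF_CODE when it could not resolve
            # the address to a known symbol.
            "resolved": symbol != "END_OF_CODE",
        })
    return frames
```

Filtering the result on "resolved" leaves only the frames that ksymoops could
name, which here are the e1000 and core kernel entries.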


Roman

> Eddie
> 
> 
> rchertov at purdue.edu wrote:
> > Quoting Eddie Kohler <kohler at cs.ucla.edu>:
> > 
> > 
> >>Roman,
> >>
> >>A ksymoops would be extremely helpful!
> > 
> > 
> > 
> > I noticed that sometimes there is no crash but nothing is received on the
> > device until the machine is restarted.
> > 
> > ksymoops output
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 00000080
> > f8829281
> > *pde = 00000000
> > Oops: 0000
> > CPU:    0
> > EIP:    0010:[<f8829281>]    Not tainted
> > Using defaults from ksymoops -t elf32-i386 -a i386
> > EFLAGS: 00010246
> > eax: 00000002   ebx: 00000047   ecx: 00000000   edx: 00000040
> > esi: f708d470   edi: 00000000   ebp: f6869e10   esp: f6869dd8
> > ds: 0018   es: 0018   ss: 0018
> > Process kclick (pid: 975, stackpage=f6869000)
> > Stack: c0244a14 00035846 00000282 00000000 c030d21c 00000001 00000040 00000000
> >        f88ff58c f7731800 f7731aa8 00000000 f77319a0 00000040 f6869e40 f8828daf
> >        f77319a0 f6869e30 00000040 00000287 00001b30 00000000 00000001 f77318c4
> > Call Trace:    [<c0244a14>] [<f8828daf>] [<c024a066>] [<c0125d19>] [<c010af19>]
> >   [<c010da78>] [<f882a3db>] [<f89b45c5>] [<f89b1d89>] [<f89dd862>] [<f89760e0>]
> >   [<f89d2670>] [<c010752e>] [<f89d25ec>]
> > Code: 8b 8f 80 00 00 00 83 ea 04 85 c9 8b 47 10 0f 85 82 02 00 00
> >  
> > 
> > >>EIP; f8829281 <[e1000]e1000_clean_rx_irq+9d/340>   <=====
> > 
> > Trace; c0244a14 <alloc_skb+c4/1d0>
> > Trace; f8828daf <[e1000]e1000_clean+43/e4>
> > Trace; c024a066 <net_rx_action+a6/160>
> > Trace; c0125d19 <do_softirq+d9/e0>
> > Trace; c010af19 <do_IRQ+f9/120>
> > Trace; c010da78 <call_do_IRQ+5/d>
> > Trace; f882a3db <[e1000]e1000_rx_poll+4b/2ec>
> > Trace; f89b45c5 <END_OF_CODE+60aae/????>
> > Trace; f89b1d89 <END_OF_CODE+5e272/????>
> > Trace; f89dd862 <END_OF_CODE+89d4b/????>
> > Trace; f89760e0 <END_OF_CODE+225c9/????>
> > Trace; f89d2670 <END_OF_CODE+7eb59/????>
> > Trace; c010752e <arch_kernel_thread+2e/40>
> > Trace; f89d25ec <END_OF_CODE+7ead5/????>
> > Code;  f8829281 <[e1000]e1000_clean_rx_irq+9d/340>
> > 00000000 <_EIP>:
> > Code;  f8829281 <[e1000]e1000_clean_rx_irq+9d/340>   <=====
> >    0:   8b 8f 80 00 00 00         mov    0x80(%edi),%ecx   <=====
> > Code;  f8829287 <[e1000]e1000_clean_rx_irq+a3/340>
> >    6:   83 ea 04                  sub    $0x4,%edx
> > Code;  f882928a <[e1000]e1000_clean_rx_irq+a6/340>
> >    9:   85 c9                     test   %ecx,%ecx
> > Code;  f882928c <[e1000]e1000_clean_rx_irq+a8/340>
> >    b:   8b 47 10                  mov    0x10(%edi),%eax
> > Code;  f882928f <[e1000]e1000_clean_rx_irq+ab/340>
> >    e:   0f 85 82 02 00 00         jne    296 <_EIP+0x296>
> >  
> >  <0>Kernel panic: Aiee, killing interrupt handler!
> >  
> > 1 warning issued.  Results may not be reliable.
> > 
> > 
> > Roman
> > 
> > 
> >>
> >>On Jul 6, 2005, at 5:29 PM, rchertov at purdue.edu wrote:
> >>
> >>
> >>>This is on the 2.4.26 SMP kernel using the latest CVS click SMP build.  The
> >>>driver is the e1000-5.x.  I usually get this after I send a high rate of UDP
> >>>packets with a 10 byte payload from my own packet generator (user land).  I am
> >>>going to try to use the 1.4.3 click as it seemed to crash less frequently.
> >>>
> >>>
> >>>This is the script that I run on the node that crashes.  The machine runs on 2
> >>>Xeon 2.8 GHz CPUs with hyperthreading (linux reports 4 cpus) and the NICs are
> >>>pci-66 Intel Pro 1000
> >>>
> >>>PollDevice(eth1)
> >>>        -> Queue(200)
> >>>        -> ToDevice(eth2);
> >>>
> >>>PollDevice(eth2)
> >>>        -> Queue(200)
> >>>        -> ToDevice(eth1);
> >>>
> >>>
> >>>
> >>>Unable to handle kernel NULL pointer dereference at virtual address 00000080
> >>> printing eip:
> >>>f8829281
> >>>*pde = 00000000
> >>>Oops: 0000
> >>>CPU:    0
> >>>EIP:    0010:[<f8829281>]    Not tainted
> >>>EFLAGS: 00010246
> >>>eax: 00000002   ebx: 00000047   ecx: 00000000   edx: 00000040
> >>>esi: f708d470   edi: 00000000   ebp: f6869e10   esp: f6869dd8
> >>>ds: 0018   es: 0018   ss: 0018
> >>>Process kclick (pid: 975, stackpage=f6869000)
> >>>Stack: c0244a14 00035846 00000282 00000000 c030d21c 00000001 00000040 00000000
> >>>       f88ff58c f7731800 f7731aa8 00000000 f77319a0 00000040 f6869e40 f8828daf
> >>>       f77319a0 f6869e30 00000040 00000287 00001b30 00000000 00000001 f77318c4
> >>>Call Trace:    [<c0244a14>] [<f8828daf>] [<c024a066>] [<c0125d19>] [<c010af19>]
> >>>  [<c010da78>] [<f882a3db>] [<f89b45c5>] [<f89b1d89>] [<f89dd862>] [<f89760e0>]
> >>>  [<f89d2670>] [<c010752e>] [<f89d25ec>]
> >>>
> >>>Code: 8b 8f 80 00 00 00 83 ea 04 85 c9 8b 47 10 0f 85 82 02 00 00
> >>> <0>Kernel panic: Aiee, killing interrupt handler!
> >>>In interrupt handler - not syncing
> >>>
> >>>
> >>>I don't suppose there is any quick magic fix laying about? :)
> >>>
> >>>Roman
> >>>
> >>>_______________________________________________
> >>>click mailing list
> >>>click at amsterdam.lcs.mit.edu
> >>>https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> >>>
> >>
> >>
> > 
> > 
> > 
> 
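
For anyone trying to reproduce this: the generator mentioned in the quoted
message was not posted, so the following is only a rough stand-in for
"a high rate of UDP packets with a 10 byte payload" sent from user land.  The
names make_payload and send_burst, and the payload contents, are assumptions
made for illustration.

```python
import socket


def make_payload(size=10):
    """Build a small fixed payload; the byte values are arbitrary."""
    return bytes(i % 256 for i in range(size))


def send_burst(sock, dest, count, payload):
    """Send `count` datagrams to `dest` back to back; return how many were sent."""
    sent = 0
    for _ in range(count):
        sock.sendto(payload, dest)
        sent += 1
    return sent


def open_udp_socket():
    """Plain IPv4 UDP socket; the caller is responsible for closing it."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```

A run would look like send_burst(open_udp_socket(), (host, port), count,
make_payload()), with host and port pointing at a machine behind the Click
box.  A Python loop will not reach the packet rates of a dedicated generator,
so it may not trigger the crash as quickly.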
