[Click] e1000 driver timeout with 2.6.x

todd lewis tgl2 at yahoo.com
Wed Jul 19 04:41:10 EDT 2006


Recompiling the kernel, modules, driver and Click for UP instead of SMP did allow it to work, but
instead of my normal 700 Mbps (which I can sustain even with netfilter queueing to userspace), I
got only 190 kbps and these errors:

****************************
[17195356.700000] e1000: eth2: e1000_watchdog_1: NIC Link is Up 1000 Mbps Full Duplex
[17195379.828000] e1000: eth3: e1000_clean_tx_irq: Detected Tx Unit Hang
[17195379.828000]   TDH                  <92>
[17195379.828000]   TDT                  <92>
[17195379.828000]   next_to_use          <92>
[17195379.828000]   next_to_clean        <91>
[17195379.828000] buffer_info[next_to_clean]
[17195379.828000]   dma                  <7c3c7840>
[17195379.828000]   time_stamp           <0>
[17195379.828000]   next_to_watch        <0>
[17195379.828000]   jiffies              <3b2b1d>
[17195379.828000]   next_to_watch.status <0>
[17195430.836000] e1000: eth3: e1000_clean_tx_irq: Detected Tx Unit Hang
[17195430.836000]   TDH                  <14>
[17195430.836000]   TDT                  <14>
[17195430.836000]   next_to_use          <14>
[17195430.836000]   next_to_clean        <d6>
[17195430.836000] buffer_info[next_to_clean]
[17195430.836000]   dma                  <7c230c40>
[17195430.836000]   time_stamp           <0>
[17195430.836000]   next_to_watch        <0>
[17195430.836000]   jiffies              <3b5ced>
[17195430.836000]   next_to_watch.status <0>
(..., lots of these)
****************************

I have a dual-port PCIe e1000 card.  I plan to try that, and then to try one port from each card.
If anyone has any other experiments they'd like run with my setup, please let me know.

--- Adam Greenhalgh <a.greenhalgh at cs.ucl.ac.uk> wrote:

> Max,
> 
> Are you running an SMP kernel with the polling boxes, and which Intel
> card are you using? I've seen numerous SMP-related reports on the
> e1000 / netdev lists and just noticed that Todd is using an SMP
> kernel.
> 
> Adam
> 
> On 7/6/06, Massimiliano Poletto <maxp at mazunetworks.com> wrote:
> > Hi Srivas and Beyers, I've spent some time looking at drivers again recently.
> >
> > What works best for me at present is a patched version of the 6.1.16.2
> > Intel driver (not the 6.3.9-k4 driver that comes with linux
> > 2.6.16.13).  I attach the driver sources and patch to this email.  I'm
> > using a 2.6.16.22 kernel, but I don't see why .13 should work any less
> > well with the driver.
> >
> > Performance is good, and it is stable across hundreds of
> > installs/uninstalls and many hours of testing at full line rate
> > offered load.  I sometimes see messages similar to yours (below is an
> > example), but they only seem to happen during stress tests when click
> > is repeatedly installed/uninstalled at very short intervals:
> > e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
> >   TDH                  <b0>
> >   TDT                  <b0>
> >   next_to_use          <b0>
> >   next_to_clean        <9e>
> > buffer_info[next_to_clean]
> >   dma                  <2c66c040>
> >   time_stamp           <0>
> >   next_to_watch        <0>
> >   jiffies              <82ddae8>
> >   next_to_watch.status <0>
> >
> > I'm trying to get a sourceforge 7.x driver to work, but for now this
> > seems at least workable.
> >
> > Please let me know if you have problems with this driver, or if you
> > make other progress yourselves.
> >
> > Regards,
> > max
> >
> >
> > On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > > Hello Beyers,
> > >
> > > Thanks a lot for your speedy response. To answer your question regarding
> > > the e1000 driver, I've downloaded and tested my configuration with the
> > > latest stable release (7.1.9) from sourceforge, and the timeout
> > > stubbornly continues to occur with the TSO option disabled.
> > >
> > > For your reference, the possibly relevant snippet of my click
> > > configuration is attached. It uses a click element (onuagent) that I've
> > > written to emulate the protocol being tested, which receives and
> > > forwards packets between 3 interfaces via customized priority
> > > schedulers.
> > >
> > > ...
> > > FromDevice($rp0, PROMISC true) -> [0]onuagent;
> > > onuagent[0] -> priosched0 -> ToDevice($rp0);
> > > FromDevice($rp1, PROMISC true) -> [1]onuagent;
> > > onuagent[1] -> priosched1 -> ToDevice($rp1);
> > > FromDevice($lp, PROMISC true) -> [2]onuagent;
> > > onuagent[2] -> priosched2 -> ToDevice($lp);
> > > ...
> > >
> > > I'm currently attempting to find a combination of a kernel (2.4.x or
> > > 2.6.x) and a stable e1000 driver version with which I can reliably use
> > > FromDevice/PollDevice. Any details of a setup that has worked for you in
> > > this regard would be helpful.
> > >
> > > Thanks in advance,
> > > Srivas.
> > >
> > > On Jul 03, 2006 05:18 PM, Beyers Cronje wrote:
> > >
> > >
> > > >
> > > > Hi Srivas,
> > > >
> > > > This is a problem that Adam, a few others and I have been struggling
> > > > with. Strange that FromDevice gives you the TX hang, as on my system it
> > > > only happens when using PollDevice in certain configurations. If
> > > > possible, can you post the Click config you are using to reproduce the
> > > > hang?
> > > >
> > > > Adam pointed me to the E1000 dev mailing list on SourceForge and the
> > > > TX hang issue seems to pop up on standard Linux (non-Click) systems as
> > > > well. One possible workaround is to disable TCP segmentation
> > > > offloading (TSO) via 'ethtool -K eth0 tso off', but it seems to
> > > > work only sometimes ...
> > > >
> > > > What e1000 driver version are you using? Since you are only using
> > > > FromDevice, have you tried the latest e1000 driver?
> > > >
> > > > Anyone else also having this problem?
> > > >
> > > > Beyers
> > > >
> > > >
> > > > On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > > > > Hello all,
> > > > >
> > > > > I'm a relatively new click user trying to build and test a link layer
> > > > > protocol using Click. My test runs used the click kernel module built
> > > > > from the latest CVS sources. On a patched 2.6.16.13 kernel with an
> > > > > original Intel PRO/1000 MT dual port GbE NIC for a click configuration
> > > > > using FromDevice, the driver abruptly times out during Tx and resets
> > > > > with messages like those below:
> > > > >
> > > > > e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
> > > > > Tx Queue <0>
> > > > > TDH <97>
> > > > > TDT <9a>
> > > > > next_to_use <9a>
> > > > > next_to_clean <95>
> > > > > buffer_info[next_to_clean]
> > > > > time_stamp
> > > > > next_to_watch <97>
> > > > > jiffies
> > > > > next_to_watch.status <0>
> > > > > ....
> > > > > ....
> > > > > Eventually I see in the log file:
> > > > >
> > > > > NETDEV WATCHDOG: eth1: transmit timed out
> > > > > e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> > > > >
> > > > > Interestingly, this timeout-and-reset problem does not occur when
> > > > > running my click configuration at the userlevel, but reproduces quite
> > > > > easily with the kernel module, even when the NIC is working at low
> > > > > packet Rx rates. All configuration parameters to the e1000 modules are
> > > > > at their defaults, and my attempts with parameters suggested in a
> > > > > previous post
> > > > > (https://amsterdam.lcs.mit.edu/pipermail/click/2006-March/004690.html)
> > > > > for similar problems didn't help.
> > > > >
> > > > >
> > > > > Any pointers to solving this problem are appreciated,
> > > > >
> > > > > Thanks in advance,
> > > > > Srivas.
> > > > >
> > > > >
> > > > > --
> > > > > Meet us at
> > > > >
> > > > > OC&I 2006 and NOC 2006: 10.-13.7.06 at HHI Berlin, Germany
> > > > > IFA: 1.-6.9.06, Berlin, Germany
> > > > > _______________________________________________
> > > > > click mailing list
> > > > > click at amsterdam.lcs.mit.edu
> > > > > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> > > > >
> > >
> > > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
> >
> 



