[Click] e1000 driver timeout with 2.6.x

Massimiliano Poletto maxp at mazunetworks.com
Wed Jul 5 22:06:28 EDT 2006


Hi Srivas and Beyers, I've spent some time looking at drivers again recently.

What works best for me at present is a patched version of the 6.1.16.2
Intel driver (not the 6.3.9-k4 driver that comes with Linux
2.6.16.13).  I attach the driver sources and patch to this email.  I'm
using a 2.6.16.22 kernel, but I don't see why .13 should work any less
well with the driver.

Performance is good, and it is stable across hundreds of
installs/uninstalls and many hours of testing at full line rate
offered load.  I sometimes see messages similar to yours (below is an
example), but they only seem to happen during stress tests when Click
is repeatedly installed/uninstalled at very short intervals:
e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH                  <b0>
  TDT                  <b0>
  next_to_use          <b0>
  next_to_clean        <9e>
buffer_info[next_to_clean]
  dma                  <2c66c040>
  time_stamp           <0>
  next_to_watch        <0>
  jiffies              <82ddae8>
  next_to_watch.status <0>
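
In case it helps when comparing dumps like the one above, here is a
minimal Python sketch (not part of the driver or the attached patch)
that pulls the hex fields out of a "Detected Tx Unit Hang" message and
counts how many descriptors sit between next_to_clean and next_to_use.
The helper names and the ring size of 256 are my assumptions; substitute
the TxDescriptors value your driver was actually loaded with.

```python
import re

# Matches dump lines of the form "  TDH                  <b0>".
# Lines without a single-word name and a <hex> value are skipped.
FIELD_RE = re.compile(r"^\s*(\S+)\s+<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Return {field_name: int} for every 'name <hex>' line in the dump."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def pending_descriptors(fields, ring_size=256):
    """Descriptors queued but not yet cleaned, modulo the ring size.

    ring_size=256 is an assumption, not something read from the dump.
    """
    return (fields["next_to_use"] - fields["next_to_clean"]) % ring_size


dump = """\
  TDH                  <b0>
  TDT                  <b0>
  next_to_use          <b0>
  next_to_clean        <9e>
"""

fields = parse_hang_dump(dump)
print(pending_descriptors(fields))  # 0xb0 - 0x9e = 18
```

For the dump above this gives 18. As I read it, TDH equal to TDT means
the hardware has fetched everything that was queued, so those 18
descriptors are waiting on cleanup rather than on the NIC.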

I'm trying to get a SourceForge 7.x driver to work, but for now this
one seems at least workable.

Please let me know if you have problems with this driver, or if you
make other progress yourselves.

Regards,
max


On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> Hello Beyers,
>
> Thanks a lot for your speedy response. To answer your question regarding
> the e1000 driver, I've downloaded and tested my configuration with the
> latest stable release (7.1.9) from sourceforge, and the timeout
> stubbornly continues to occur with the TSO option disabled.
>
> For your reference, the possibly relevant snippet of my click
> configuration is attached. It uses a click element (onuagent) that I've
> written to emulate the protocol being tested, which receives and
> forwards packets between 3 interfaces via customized priority
> schedulers.
>
> ...
> FromDevice($rp0, PROMISC true) -> [0]onuagent;
> onuagent[0] -> priosched0 -> ToDevice($rp0);
> FromDevice($rp1, PROMISC true) -> [1]onuagent;
> onuagent[1] -> priosched1 -> ToDevice($rp1);
> FromDevice($lp, PROMISC true) -> [2]onuagent;
> onuagent[2] -> priosched2 -> ToDevice($lp);
> ...
>
> I'm currently attempting to find a combination of a kernel (2.4.x or
> 2.6.x) and a stable e1000 driver version with which I can reliably use
> FromDevice/PollDevice. Any details of a setup that has worked for you in
> this regard would be helpful.
>
> Thanks in advance,
> Srivas.
>
> On Jul 03, 2006 05:18 PM, Beyers Cronje wrote:
>
>
> >
> > Hi Srivas,
> >
> > This is a problem that Adam, a few others and I have been struggling
> > with. Strange that FromDevice gives you the TX hang, as on my system
> > it only happens when using PollDevice in certain configurations. If
> > possible, can you post the Click config you are using to reproduce the
> > hang?
> >
> > Adam pointed me to the E1000 dev mailing list on SourceForge, and the
> > TX hang issue seems to pop up on standard Linux (non-Click) systems as
> > well. One possible workaround is to disable TCP segmentation
> > offloading (TSO) via 'ethtool -K eth0 tso off', but that seems to
> > work only sometimes ...
> >
> > What e1000 driver version are you using? Since you are only using
> > FromDevice, have you tried the latest e1000 driver?
> >
> > Anyone else also having this problem?
> >
> > Beyers
> >
> >
> > On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > > Hello all,
> > >
> > > I'm a relatively new click user trying to build and test a link layer
> > > protocol using Click. My test runs used the click kernel module built
> > > from the latest CVS sources. On a patched 2.6.16.13 kernel with an
> > > original Intel PRO/1000 MT dual port GbE NIC for a click configuration
> > > using FromDevice, the driver abruptly times out during Tx and resets
> > > with messages like those below:
> > >
> > > e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
> > > Tx Queue <0>
> > > TDH <97>
> > > TDT <9a>
> > > next_to_use <9a>
> > > next_to_clean <95>
> > > buffer_info[next_to_clean]
> > > time_stamp
> > > next_to_watch <97>
> > > jiffies
> > > next_to_watch.status <0>
> > > ....
> > > ....
> > > Eventually I see in the log file:
> > >
> > > NETDEV WATCHDOG: eth1: transmit timed out
> > > e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> > >
> > > Interestingly, this timeout-and-reset problem does not occur when
> > > running my click configuration at the userlevel, but reproduces quite
> > > easily with the kernel module, even when the NIC is working at low
> > > packet Rx rates. All configuration parameters to the e1000 modules are
> > > at their defaults, and my attempts with parameters suggested in a
> > > previous post
> > > (https://amsterdam.lcs.mit.edu/pipermail/click/2006-March/004690.html)
> > > for similar problems didn't help.
> > >
> > >
> > > Any pointers to solving this problem are appreciated,
> > >
> > > Thanks in advance,
> > > Srivas.
> > >
> > >
> > > --
> > > Meet us at
> > >
> > > OC&I 2006 and NOC 2006: 10.-13.7.06 at HHI Berlin, Germany
> > > IFA: 1.-6.9.06, Berlin, Germany
> > > _______________________________________________
> > > click mailing list
> > > click at amsterdam.lcs.mit.edu
> > > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e1000-6.1.16.2.DB.tar.gz
Type: application/x-gzip
Size: 161908 bytes
Desc: not available
Url : https://amsterdam.lcs.mit.edu/pipermail/click/attachments/20060705/7269ad1b/e1000-6.1.16.2.DB.tar-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e1000-6.1.16.2.DB-polling.patch
Type: text/x-patch
Size: 20786 bytes
Desc: not available
Url : https://amsterdam.lcs.mit.edu/pipermail/click/attachments/20060705/7269ad1b/e1000-6.1.16.2.DB-polling-0001.bin

