[Click] e1000 driver timeout with 2.6.x

Beyers Cronje bcronje at gmail.com
Tue Jul 11 11:00:04 EDT 2006


Hi Max!

Thank you very much. I should have time later this week to test it and will
let you know how it runs.

Cheers

Beyers Cronje

On 7/6/06, Massimiliano Poletto <maxp at mazunetworks.com> wrote:
>
> Hi Srivas and Beyers, I've spent some time looking at drivers again
> recently.
>
> What works best for me at present is a patched version of the 6.1.16.2
> Intel driver (not the 6.3.9-k4 driver that comes with linux
> 2.6.16.13).  I attach the driver sources and patch to this email.  I'm
> using a 2.6.16.22 kernel, but I don't see why .13 should work any less
> well with the driver.
>
> Performance is good, and it is stable across hundreds of
> installs/uninstalls and many hours of testing at full line rate
> offered load.  I sometimes see messages similar to yours (below is an
> example), but they only seem to happen during stress tests when click
> is repeatedly installed/uninstalled at very short intervals:
> e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
>   TDH                  <b0>
>   TDT                  <b0>
>   next_to_use          <b0>
>   next_to_clean        <9e>
> buffer_info[next_to_clean]
>   dma                  <2c66c040>
>   time_stamp           <0>
>   next_to_watch        <0>
>   jiffies              <82ddae8>
>   next_to_watch.status <0>
>
> I'm trying to get a sourceforge 7.x driver to work, but for now this
> seems at least workable.
>
> Please let me know if you have problems with this driver, or if you
> make other progress yourselves.
>
> Regards,
> max
>
>
> On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > Hello Beyers,
> >
> > Thanks a lot for your speedy response. To answer your question regarding
> > the e1000 driver, I've downloaded and tested my configuration with the
> > latest stable release (7.1.9) from sourceforge, and the timeout
> > stubbornly continues to occur with the TSO option disabled.
> >
> > For your reference, the possibly relevant snippet of my click
> > configuration is attached. It uses a click element (onuagent) that I've
> > written to emulate the protocol being tested, which receives and
> > forwards packets between 3 interfaces via customized priority
> > schedulers.
> >
> > ...
> > FromDevice($rp0, PROMISC true) -> [0]onuagent;
> > onuagent[0] -> priosched0 -> ToDevice($rp0);
> > FromDevice($rp1, PROMISC true) -> [1]onuagent;
> > onuagent[1] -> priosched1 -> ToDevice($rp1);
> > FromDevice($lp, PROMISC true) -> [2]onuagent;
> > onuagent[2] -> priosched2 -> ToDevice($lp);
> > ...
> >
> > I'm currently attempting to find a combination of a kernel (2.4.x or
> > 2.6.x) and a stable e1000 driver version with which I can reliably use
> > FromDevice/PollDevice. Any details of a setup that has worked for you in
> > this regard would be helpful.
> >
> > Thanks in advance,
> > Srivas.
> >
> > On Jul 03, 2006 05:18 PM, Beyers Cronje wrote:
> >
> >
> > >
> > > Hi Srivas,
> > >
> > > This is a problem that Adam, a few others, and I have been struggling
> > > with. It is strange that FromDevice gives you the TX hang, as on my
> > > system it only happens when using PollDevice in certain configurations.
> > > If possible, can you post the Click config you are using to reproduce
> > > the hang?
> > >
> > > Adam pointed me to the E1000 dev mailing list on SourceForge, and the
> > > TX hang issue seems to pop up on standard Linux (non-Click) systems as
> > > well. One possible workaround seems to be to disable TCP segmentation
> > > offloading (TSO). You can do this via 'ethtool -K eth0 tso off', but it
> > > seems to work only sometimes ...
> > >
> > > What e1000 driver version are you using? Since you are only using
> > > FromDevice, have you tried the latest e1000 driver?
> > >
> > > Anyone else also having this problem?
> > >
> > > Beyers
> > >
> > >
> > > On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > > > Hello all,
> > > >
> > > > > I'm a relatively new click user trying to build and test a link layer
> > > > > protocol using Click. My test runs used the click kernel module built
> > > > > from the latest CVS sources. On a patched 2.6.16.13 kernel with an
> > > > > original Intel PRO/1000 MT dual port GbE NIC for a click configuration
> > > > > using FromDevice, the driver abruptly times out during Tx and resets
> > > > > with messages like those below:
> > > >
> > > > e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
> > > > Tx Queue <0>
> > > > TDH <97>
> > > > TDT <9a>
> > > > next_to_use <9a>
> > > > next_to_clean <95>
> > > > buffer_info[next_to_clean]
> > > > time_stamp
> > > > next_to_watch <97>
> > > > jiffies
> > > > next_to_watch.status <0>
> > > > ....
> > > > ....
> > > > Eventually I see in the log file:
> > > >
> > > > NETDEV WATCHDOG: eth1: transmit timed out
> > > > e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> > > >
> > > > > Interestingly, this timeout-and-reset problem does not occur when
> > > > > running my click configuration at the userlevel, but reproduces quite
> > > > > easily with the kernel module, even when the NIC is working at low
> > > > > packet Rx rates. All configuration parameters to the e1000 modules are
> > > > > at their defaults, and my attempts with parameters suggested in a
> > > > > previous post
> > > > > (https://amsterdam.lcs.mit.edu/pipermail/click/2006-March/004690.html)
> > > > > for similar problems didn't help.
> > > >
> > > >
> > > > Any pointers to solving this problem are appreciated,
> > > >
> > > > Thanks in advance,
> > > > Srivas.
> > > >
> > > >
> > > > --
> > > > Meet us at
> > > >
> > > > OC&I 2006 and NOC 2006: 10.-13.7.06 at HHI Berlin, Germany
> > > > IFA: 1.-6.9.06, Berlin, Germany
> > > > _______________________________________________
> > > > click mailing list
> > > > click at amsterdam.lcs.mit.edu
> > > > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> > > >

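For reading the "Detected Tx Unit Hang" dumps quoted above, a small parser can help compare reports. The following is only a sketch: the function names are made up for illustration, and the ring size of 256 descriptors is an assumption (it is not stated anywhere in this thread). It treats the bracketed values as hexadecimal, and it relies on TDH and TDT being the hardware transmit descriptor head and tail, so that TDH == TDT suggests the NIC has nothing left to send while next_to_clean lagging behind next_to_use suggests the driver has not yet reclaimed those descriptors.

```python
import re

# Matches lines such as "  TDH                  <b0>", with or without
# leading "> " quote markers. Values in the dump are hexadecimal.
FIELD_RE = re.compile(r"^[>\s]*([A-Za-z_.\[\]]+)\s+<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Return {field_name: int} for every 'name <hex>' line in the dump."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def uncleaned_descriptors(fields, ring_size=256):
    """Descriptors queued by the driver but not yet cleaned.

    ring_size=256 is an assumed value, not taken from the thread.
    """
    return (fields["next_to_use"] - fields["next_to_clean"]) % ring_size


def hardware_pending(fields, ring_size=256):
    """Descriptors between hardware head (TDH) and tail (TDT)."""
    return (fields["TDT"] - fields["TDH"]) % ring_size


# The dump from Max's message, copied from the thread.
DUMP = """\
e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
  TDH                  <b0>
  TDT                  <b0>
  next_to_use          <b0>
  next_to_clean        <9e>
buffer_info[next_to_clean]
  dma                  <2c66c040>
  time_stamp           <0>
  next_to_watch        <0>
  jiffies              <82ddae8>
  next_to_watch.status <0>
"""

if __name__ == "__main__":
    fields = parse_hang_dump(DUMP)
    print("uncleaned:", uncleaned_descriptors(fields))
    print("pending in hardware:", hardware_pending(fields))
```

Under the assumed ring size, Max's dump gives 18 uncleaned descriptors with none pending in hardware, while Srivas's dump (TDH 97, TDT 9a, next_to_clean 95) gives 5 uncleaned and 3 pending.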

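Since the configuration in this thread drives three interfaces, the TSO workaround mentioned above ('ethtool -K eth0 tso off') would presumably need to be applied to each of them. A minimal sketch that only builds and prints the commands, without running anything, is below. The interface names are placeholders, and the helper name is hypothetical.

```python
import shlex


def tso_off_commands(interfaces):
    """Build one 'ethtool -K <dev> tso off' argument list per interface."""
    return [["ethtool", "-K", dev, "tso", "off"] for dev in interfaces]


if __name__ == "__main__":
    # Placeholder interface names; substitute the devices Click is using.
    for argv in tso_off_commands(["eth0", "eth1", "eth2"]):
        # Printed rather than executed, so the output can be reviewed first.
        print(shlex.join(argv))
```

The printed lines can be pasted into a root shell, or each argument list can be passed to subprocess.run. As noted in the thread, this workaround seems to help only sometimes.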