[Click] e1000 driver timeout with 2.6.x

Adam Greenhalgh a.greenhalgh at cs.ucl.ac.uk
Wed Jul 26 12:20:08 EDT 2006


Max,

I think I might have found a bug in the driver: at line 4639 in
e1000_main.c I think a got--; is needed (the line marked + below).
Could you please validate this for me?

    if(!(rx_desc->status & E1000_RXD_STAT_EOP) ||
       (rx_desc->errors & E1000_RXD_ERR_FRAME_ERR_MASK)) {
      rx_desc->status = 0;
      dev_kfree_skb(*skb);
      *skb = NULL;
+      got--;
      continue;
    }
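
To show why I think the decrement matters, here is a minimal stand-alone
model of a polling receive loop. It is only a sketch and not the driver
code: struct desc, the DESC_* bits, poll_ring() and the fix flag are made
up for illustration, and it assumes that got is incremented in the for
statement, so that a continue on a dropped descriptor is still counted as
a received packet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the descriptor status bits. */
#define DESC_DONE 0x01u /* descriptor has been filled in by the NIC */
#define DESC_EOP  0x02u /* descriptor holds the end of a packet */

struct desc {
    unsigned int status;
};

/*
 * Simplified model of a polling receive loop (an assumption, not the
 * real e1000 code). Returns the value of the got counter; *delivered is
 * set to the number of packets that were really handed to the caller.
 * With fix == 0 a dropped descriptor is still counted in got; with
 * fix != 0 the counter is decremented before the continue.
 */
static int poll_ring(struct desc *ring, size_t n, int want, int fix,
                     int *delivered)
{
    size_t i = 0;
    int got;

    assert(ring != NULL && delivered != NULL);
    *delivered = 0;

    for (got = 0; got < want && i < n; got++) {
        struct desc *d = &ring[i];

        if (!(d->status & DESC_DONE))
            break;              /* nothing more to receive */
        i++;

        if (!(d->status & DESC_EOP)) {
            d->status = 0;      /* recycle the descriptor, drop the packet */
            if (fix)
                got--;          /* undo the got++ that continue triggers */
            continue;
        }

        d->status = 0;
        (*delivered)++;         /* packet handed to the caller */
    }
    return got;
}
```

With a ring of three completed descriptors where the middle one lacks
EOP, poll_ring() returns 3 without the decrement although only 2 packets
are delivered; with the decrement both numbers are 2.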

I am also seeing the same Tx timeouts when the load is high, but I am
not swapping Click in and out. I have seen many messages on the e1000
list about this, so I think there is a problem with the underlying
driver. It might be good to apply the Click patches to the latest 7.x
driver and see how that shapes up. I'm happy to test and code if
you want me to.
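
When comparing the Tx Unit Hang dumps quoted below, it may help to work
out how many descriptors the driver has queued but not yet cleaned. A
rough sketch of that arithmetic follows; ring_pending() is a name made up
for this mail, and the ring size of 256 used in the note is an assumption
about the TxDescriptors setting, not something taken from the dumps.

```c
#include <assert.h>

/*
 * Number of descriptors between next_to_clean and next_to_use on a
 * circular ring of ring_size entries (hypothetical helper, for
 * illustration only). Both indices are expected to be < ring_size.
 */
static unsigned int ring_pending(unsigned int next_to_use,
                                 unsigned int next_to_clean,
                                 unsigned int ring_size)
{
    assert(ring_size > 0);
    assert(next_to_use < ring_size && next_to_clean < ring_size);

    /* Adding ring_size before subtracting keeps the result non-negative
     * when next_to_use has already wrapped around the ring. */
    return (next_to_use + ring_size - next_to_clean) % ring_size;
}
```

For Max's dump (next_to_use b0, next_to_clean 9e) this gives 0x12 = 18
uncleaned descriptors, and for Srivas's (9a and 95) it gives 5; the
modulo only matters once the indices wrap around the ring.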

Adam

On 7/6/06, Massimiliano Poletto <maxp at mazunetworks.com> wrote:
> Hi Srivas and Beyers, I've spent some time looking at drivers again recently.
>
> What works best for me at present is a patched version of the 6.1.16.2
> Intel driver (not the 6.3.9-k4 driver that comes with linux
> 2.6.16.13).  I attach the driver sources and patch to this email.  I'm
> using a 2.6.16.22 kernel, but I don't see why .13 should work any less
> well with the driver.
>
> Performance is good, and it is stable across hundreds of
> installs/uninstalls and many hours of testing at full line rate
> offered load.  I sometimes see messages similar to yours (below is an
> example), but they only seem to happen during stress tests when click
> is repeatedly installed/uninstalled at very short intervals:
> e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
>   TDH                  <b0>
>   TDT                  <b0>
>   next_to_use          <b0>
>   next_to_clean        <9e>
> buffer_info[next_to_clean]
>   dma                  <2c66c040>
>   time_stamp           <0>
>   next_to_watch        <0>
>   jiffies              <82ddae8>
>   next_to_watch.status <0>
>
> I'm trying to get a sourceforge 7.x driver to work, but for now this
> seems at least workable.
>
> Please let me know if you have problems with this driver, or if you
> make other progress yourselves.
>
> Regards,
> max
>
>
> On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > Hello Beyers,
> >
> > Thanks a lot for your speedy response. To answer your question regarding
> > the e1000 driver, I've downloaded and tested my configuration with the
> > latest stable release (7.1.9) from sourceforge, and the timeout
> > stubbornly continues to occur with the TSO option disabled.
> >
> > For your reference, the possibly relevant snippet of my click
> > configuration is attached. It uses a click element (onuagent) that I've
> > written to emulate the protocol being tested, which receives and
> > forwards packets between 3 interfaces via customized priority
> > schedulers.
> >
> > ...
> > FromDevice($rp0, PROMISC true) -> [0]onuagent;
> > onuagent[0] -> priosched0 -> ToDevice($rp0);
> > FromDevice($rp1, PROMISC true) -> [1]onuagent;
> > onuagent[1] -> priosched1 -> ToDevice($rp1);
> > FromDevice($lp, PROMISC true) -> [2]onuagent;
> > onuagent[2] -> priosched2 -> ToDevice($lp);
> > ...
> >
> > I'm currently attempting to find a combination of a kernel (2.4.x or
> > 2.6.x) and a stable e1000 driver version with which I can reliably use
> > FromDevice/PollDevice. Any details of a setup that has worked for you in
> > this regard would be helpful.
> >
> > Thanks in advance,
> > Srivas.
> >
> > On Jul 03, 2006 05:18 PM, Beyers Cronje wrote:
> >
> >
> > >
> > > Hi Srivas,
> > >
> > > This is a problem that Adam, a few others and I have been struggling
> > > with. Strange that FromDevice gives you the TX hang, as on my system it
> > > only happens when using PollDevice in certain configurations. If
> > > possible, can you post the Click config you are using to duplicate the
> > > hang?
> > >
> > > Adam pointed me to the E1000 dev mailing list on SourceForge and the
> > > TX Hang issue seems to pop up on standard linux (non-click) systems as
> > > well. One possible workaround seems to be to disable TCP segmentation
> > > offloading (TSO); you can do this via 'ethtool -K eth0 tso off', but it
> > > seems to work only sometimes ...
> > >
> > > What e1000 driver version are you using? Since you are only using
> > > FromDevice, have you tried the latest e1000 driver?
> > >
> > > Anyone else also having this problem?
> > >
> > > Beyers
> > >
> > >
> > > On 7/3/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
> > > > Hello all,
> > > >
> > > > I'm a relatively new click user trying to build and test a link layer
> > > > protocol using Click. My test runs used the click kernel module built
> > > > from the latest CVS sources. On a patched 2.6.16.13 kernel with an
> > > > original Intel PRO/1000 MT dual port GbE NIC for a click configuration
> > > > using FromDevice, the driver abruptly times out during Tx and resets
> > > > with messages like those below:
> > > >
> > > > e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
> > > > Tx Queue <0>
> > > > TDH <97>
> > > > TDT <9a>
> > > > next_to_use <9a>
> > > > next_to_clean <95>
> > > > buffer_info[next_to_clean]
> > > > time_stamp
> > > > next_to_watch <97>
> > > > jiffies
> > > > next_to_watch.status <0>
> > > > ....
> > > > ....
> > > > Eventually I see in the log file:
> > > >
> > > > NETDEV WATCHDOG: eth1: transmit timed out
> > > > e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
> > > >
> > > > Interestingly, this timeout-and-reset problem does not occur when
> > > > running my click configuration at the userlevel, but reproduces quite
> > > > easily with the kernel module, even when the NIC is working at low
> > > > packet Rx rates. All configuration parameters to the e1000 modules are
> > > > at their defaults, and my attempts with parameters suggested in a
> > > > previous post
> > > > (https://amsterdam.lcs.mit.edu/pipermail/click/2006-March/004690.html)
> > > > for similar problems didn't help.
> > > >
> > > >
> > > > Any pointers to solving this problem are appreciated,
> > > >
> > > > Thanks in advance,
> > > > Srivas.
> > > >
> > > >
> > > > --
> > > > Meet us at
> > > >
> > > > OC&I 2006 and NOC 2006: 10.-13.7.06 at HHI Berlin, Germany
> > > > IFA: 1.-6.9.06, Berlin, Germany
> > > > _______________________________________________
> > > > click mailing list
> > > > click at amsterdam.lcs.mit.edu
> > > > https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> > > >

