[Click] tx timeout fix

Beyers Cronje bcronje at gmail.com
Wed Sep 6 11:20:17 EDT 2006


Hi Srivas,

The 'NETDEV WATCHDOG: eth0: transmit timed out' message comes from within
Linux itself, iirc sch_generic.c or thereabouts. The patch Adam posted does
not address this issue.

The 'eth0: e1000_clean_tx_irq: Detected Tx Unit Hang' message, on the other
hand, comes from inside the e1000 driver. Before the patch, whenever Click
sent a packet, the time_stamp the driver keeps for that buffer was left at 0.
So if you see this error with a 'time_stamp <0>' field in the output, you
need the patch.
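
To make the time_stamp point concrete, here is a minimal sketch in C of how a
stamp-based hang check behaves. It is a simplified model under stated
assumptions, not the e1000 source: the helper names tick_after,
tx_hang_suspected and pending_age and the timeout_ticks parameter are
hypothetical, and the real driver's exact condition and timeout may differ.
The idea is only that a pending buffer whose stamp is 0 looks arbitrarily old
once jiffies has grown, so the check trips no matter how recently the packet
was queued.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is later than b" for 32-bit tick counters, modelled on the
 * idea behind the kernel's time_after() macro. */
static bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Simplified model of a stamp-based hang check (an assumption, not driver
 * code): a hang is suspected when a buffer is still pending (dma != 0) and
 * more than timeout_ticks have passed since it was stamped. */
static bool tx_hang_suspected(uint32_t dma, uint32_t time_stamp,
                              uint32_t now, uint32_t timeout_ticks)
{
    return dma != 0 && tick_after(now, time_stamp + timeout_ticks);
}

/* Age, in ticks, of a pending buffer stamped at time_stamp. */
static uint32_t pending_age(uint32_t time_stamp, uint32_t now)
{
    return now - time_stamp;
}
```

For example, tx_hang_suspected(0x1547f040, 0, 0x4a20af, 1000) is true purely
because the stamp is 0 (the 1000-tick timeout is an illustrative value). With
the values from the dump quoted below (time_stamp <4a1be6>, jiffies <4a20af>),
pending_age returns 1225 ticks and the stamp is non-zero, so that report does
not look like the zero-stamp case.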

But there is definitely still an issue somewhere with the driver; I'm not
sure whether it's Linux, e1000 or Click related.

Beyers


On 9/6/06, Srivas Chennu <chennu at hhi.fhg.de> wrote:
>
> Hello Adam,
>
> On combining your patch for Max's 6.1.16.2.DB driver with the changes in
> the Click code as suggested by Jason Park, I'm seeing relatively
> acceptable performance with respect to the driver timeout problem. The
> test machines I used run a click-patched 2.6.16.13 SMP kernel, and are
> equipped with a combination of an onboard 82547GI Gigabit Ethernet
> Controller and an add-on network card based on the 82546GB Gigabit
> Ethernet Controller. For my tests, I set up 100% bidirectional
> utilization of both interfaces, involving a non-trivial click router
> configuration. Though I did encounter timeout errors like the ones
> below, they were relatively infrequent and were quickly recovered from.
>
> NETDEV WATCHDOG: eth0: transmit timed out
>
> OR
>
> eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
> TDH <5>
> TDT <5>
> next_to_use <5>
> next_to_clean <1>
> buffer_info[next_to_clean]
> dma <1547f040>
> time_stamp <4a1be6>
> next_to_watch <0>
> jiffies <4a20af>
> next_to_watch.status <0>
>
>
> I'm not sure whether there is a relevant difference between these
> errors. Perhaps someone could throw some light on that? Also, I'm
> guessing your fix will get into any new polling-enabled e1000 driver
> that is added to the Click distribution?
>
> Thanks in advance,
> Srivas.
>
> On Sep 06, 2006 02:32 PM, Beyers Cronje wrote:
>
> >Adam,
> >
> >Let me know when the new driver is released and I'll have a crack at
> >porting Max's polling driver.
> >
> >Beyers
> >
> >On 9/6/06, Adam Greenhalgh wrote:
> >>
> >>Another thing that would be useful to know is what chipset the e1000
> >>that has problems is using, since it is almost a weekly occurrence that
> >>a TX hang for one card or another gets reported to the netdev and
> >>e1000 lists. From these lists it would seem that many of the hangs
> >>have been fixed by the folks at Intel. A new release of the 7 series
> >>driver is due soon, so perhaps it is time to upgrade the Click
> >>driver.
> >>
> >>Adam
> >>
> >>On 9/6/06, Jason Park wrote:
> >>>As noted before, I suggested that he turn off the packet split
> >>>function, not that he use skb_copy. skb_copy is what I use in my own
> >>>environment; I mentioned it only for reference.
> >>>
> >>>Jason.
> >>>-----Original Message-----
> >>>From: todd lewis [mailto:tgl2 at yahoo.com]
> >>>Sent: Wednesday, September 06, 2006 12:37 AM
> >>>To: Jason Park; 'Srivas Chennu'
> >>>Cc: click at pdos.csail.mit.edu
> >>>Subject: Re: [Click] tx timeout fix
> >>>
> >>>As noted before, replacing skb_clone with skb_copy amounts to fixing a
> >>>broken door by burning the house down.
> >>>
> >>>Does anyone have success under real bidirectional load without copying
> >>>every packet? If so, with what configuration?
> >>>
> >>>--- Jason Park wrote:
> >>>
> >>>>Hi Srivas
> >>>>
> >>>>What e1000 device are you using?
> >>>>If your device supports PACKET_SPLIT and you have turned it on, you
> >>>>should turn it off.
> >>>>Make sure to undefine CONFIG_E1000_PACKET_SPLIT in the 6.1.16.62.DB
> >>>>driver.
> >>>>It works well for me with packet split disabled and skb_clone replaced
> >>>>by skb_copy.
> >>>>
> >>>>Jason
> >>>>-----Original Message-----
> >>>>From: click-bounces at pdos.csail.mit.edu
> >>>>[mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Srivas Chennu
> >>>>Sent: Tuesday, September 05, 2006 5:37 PM
> >>>>To: Click
> >>>>Subject: Re: [Click] tx timeout fix
> >>>>
> >>>>Hello Adam and Beyers,
> >>>>
> >>>>I've lately been testing the timeout fixes you posted on a
> >>>>click-patched 2.6.16.13 kernel. The results still seem to show
> >>>>stability problems. Though the timeouts don't occur predictably or as
> >>>>often as before, I still encounter them and kernel panics randomly, and
> >>>>with a higher probability when testing with high loads. Notably, I see
> >>>>these problems well pronounced with bidirectional (full-duplex)
> >>>>operation of the driver, both with FromDevice and PollDevice. I've
> >>>>tested the 5.x driver patch as well as the 6.1.16.62.DB version patch,
> >>>>and seen similar results.
> >>>>
> >>>>Please do let me know the details of a stable 2.6.x configuration
> >>>>that you were able to set up using these patches. Further, any idea if
> >>>>and when the fixes will eventually get into the main source tree of the
> >>>>e1000 driver on sourceforge?
> >>>>
> >>>>Thanks a bunch in advance,
> >>>>Srivas.
> >>>>
> >>>>On Jul 26, 2006 09:06 PM, Adam Greenhalgh wrote:
> >>>>
> >>>>>hi
> >>>>>
> >>>>>beyers and i have been hacking and we have fixed the tx timeout bug.
> >>>>>basically the time_stamp is not being set in the buffer, and when linux
> >>>>>sends packets too, it encounters a buffer with a time stamp of 0 and
> >>>>>throws an error. Attached are two patches, one against cvs,
> >>>>>driver-5.x-e1000_main.patch, and one against Max's 6.1.16.2.DB
> >>>>>driver, driver-6.1.16.2.DB-e1000_main.patch. Neither patch has been
> >>>>>very heavily tested yet, but neither does anything special.
> >>>>>
> >>>>>enjoy
> >>>>>
> >>>>>adam
> >>>>
> >>>>--
> >>>>Visit us at
> >>>>IFA Berlin, 01.-06. September 2006
> >>>>and
> >>>>IBC Amsterdam, NL, 08.-12.September 2006
> >>>>_______________________________________________
> >>>>click mailing list
> >>>>click at amsterdam.lcs.mit.edu
> >>>>https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> >>>>
>
>