[Click] tx timeout fix

Srivas Chennu chennu at hhi.fhg.de
Thu Sep 21 06:02:03 EDT 2006


Hello all,

I can confirm that with the latest click CVS sources I am able to set up
a click configuration with 100% utilized bidirectional links. Using
PollDevice with the e1000-6.x driver for the 82547GI and 82546GB
controllers, I don't see the timeout errors anymore, and my router
appears to be quite stable.

One point though: The e1000_main.c file in the new e1000-6.x driver
doesn't seem to contain the Click kernel patch for the file. I had to
manually change 'netif_receive_skb(skb)' to 'netif_receive_skb(skb,
skb->protocol, 0)' near lines 3662 and 3802, in order to get the driver
to compile. Perhaps this change could be made in the checked-in version
of the file?
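
For anyone applying the same change by hand, here is a minimal sketch
(not part of the driver or of Click) that rewrites the one-argument calls
in a copy of e1000_main.c. It assumes the three-argument
netif_receive_skb() of the Click-patched kernel exactly as quoted above;
the helper name patch_netif_receive_skb is hypothetical, and the regular
expression only handles the literal argument name skb.

```python
import re

# Matches only the unpatched one-argument form: netif_receive_skb(skb)
# Assumption: the argument is literally named "skb", as in the message above.
_ONE_ARG_CALL = re.compile(r"\bnetif_receive_skb\(\s*skb\s*\)")

# Three-argument form expected by the Click-patched kernel (taken from the message).
_THREE_ARG_CALL = "netif_receive_skb(skb, skb->protocol, 0)"


def patch_netif_receive_skb(source: str) -> str:
    """Return source with one-argument calls rewritten to the three-argument form.

    Calls that already pass three arguments do not match the pattern,
    so they are returned unchanged.
    """
    return _ONE_ARG_CALL.sub(_THREE_ARG_CALL, source)
```

Reading e1000_main.c into a string, passing it through the function, and
writing the result back should be enough. Calls that already use three
arguments are left alone, so running it twice is harmless.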

Many thanks and regards,
Srivas.

On Sep 19, 2006 04:33 AM, Jason Park wrote:

>Dear Eddie.
>
>It's my pleasure.
>Yes, skb_copy might not be needed anymore.
>After patching, I have been testing for some days with skb_clone, and
>it is working nicely.
>Before patching, the test did not work unless skb_clone was replaced
>with skb_copy.
>
>Jason.
>
>-----Original Message-----
>From: Eddie Kohler [mailto:kohler at cs.ucla.edu]
>Sent: Tuesday, September 19, 2006 11:04 AM
>To: Jason Park
>Cc: 'Adam Greenhalgh'; click at amsterdam.lcs.mit.edu
>Subject: Re: [Click] tx timeout fix
>
>Jason,
>
>Thanks very much for this patch! This looks good. I've applied a
>version of it to our code (and produced a new Linux patch to include
>your patch to skbuff.c). Adam/Beyers, does this take care of the
>remaining timeouts? Jason, does this patch mean that you no longer need
>to use skb_copy() instead of skb_clone()?
>
>Eddie
>
>
>Jason Park wrote:
>>Um.
>>Sorry, I missed something.
>>Here I am re-attaching the patch files.
>>packet.cc.patch was modified trivially, and skb_recycle in skbuff.c was
>>fixed for PollDevice.
>>
>>Jason.
>>-----Original Message-----
>>From: click-bounces at pdos.csail.mit.edu
>>[mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Jason Park
>>Sent: Friday, September 15, 2006 9:10 PM
>>To: 'Adam Greenhalgh'
>>Cc: click at pdos.csail.mit.edu
>>Subject: Re: [Click] tx timeout fix
>>
>>Dear click guys.
>>
>>I dug into Click for the TX timeout and found something.
>>Currently the expensive_uniqueify() function in lib/packet.cc does not
>>initialize all of the skb_shinfo fields related to TCP segmentation
>>offload (TSO).
>>As a result, TSO-capable e1000 devices hit TX timeouts.
>>(hw.mac_type >= e1000_82544 && hw.mac_type != e1000_82547)
>>I think this has confused Click users about the stability of Click on
>>e1000.
>>I have attached a patch. If anyone tests it, please let me know the
>>result.
>>(It should work with skb_clone(), without replacing it with skb_copy().)
>>
>>Thanks in advance.
>>
>>Jason.
>>-----Original Message-----
>>From: adam.greenhalgh at gmail.com [mailto:adam.greenhalgh at gmail.com]
>>On Behalf Of Adam Greenhalgh
>>Sent: Wednesday, September 06, 2006 5:12 PM
>>To: Jason Park
>>Cc: todd lewis; click at pdos.csail.mit.edu
>>Subject: Re: [Click] tx timeout fix
>>
>>Another thing that would be useful to know is what chipset the e1000
>>that has problems is using, since it is almost a weekly occurrence that
>>a TX hang for one card or another gets reported to the netdev and
>>e1000 lists. From these lists it would seem that many of the hangs
>>have been fixed by the folks at Intel. A new release of the 7 series
>>driver is coming soon, so perhaps it is time to upgrade the Click
>>driver.
>>
>>Adam
>>
>>On 9/6/06, Jason Park wrote:
>>>AS NOTED BEFORE, I suggested that he turn off the packet split
>>>function, not that he use skb_copy.
>>>skb_copy is what I use in my environment, and I mentioned it only for
>>>reference.
>>>
>>>Jason.
>>>-----Original Message-----
>>>From: todd lewis [mailto:tgl2 at yahoo.com]
>>>Sent: Wednesday, September 06, 2006 12:37 AM
>>>To: Jason Park; 'Srivas Chennu'
>>>Cc: click at pdos.csail.mit.edu
>>>Subject: Re: [Click] tx timeout fix
>>>
>>>As noted before, replacing skb_clone with skb_copy amounts to fixing
>>>a broken door by burning the house down.
>>>
>>>Does anyone have success under real bidirectional load without
>>>copying every packet? If so, with what configuration?
>>>
>>>--- Jason Park wrote:
>>>
>>>>Hi Srivas
>>>>
>>>>What e1000 device are you using?
>>>>If your device supports PACKET_SPLIT and you have turned it on, you
>>>>should turn it off.
>>>>Make sure to undefine CONFIG_E1000_PACKET_SPLIT in the 6.1.16.62.DB
>>>>driver.
>>>>It works well for me with packet split disabled and skb_clone
>>>>replaced with skb_copy.
>>>>
>>>>Jason
>>>>-----Original Message-----
>>>>From: click-bounces at pdos.csail.mit.edu
>>>>[mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Srivas Chennu
>>>>Sent: Tuesday, September 05, 2006 5:37 PM
>>>>To: Click
>>>>Subject: Re: [Click] tx timeout fix
>>>>
>>>>Hello Adam and Beyers,
>>>>
>>>>I've lately been testing the timeout fixes you posted on a
>>>>click-patched 2.6.16.13 kernel. The results still seem to show
>>>>stability problems. Though the timeouts don't occur predictably or
>>>>as often as before, I still encounter them and kernel panics
>>>>randomly, and with a higher probability when testing with high
>>>>loads. Notably, I see these problems well pronounced with
>>>>bidirectional (full-duplex) operation of the driver, both with
>>>>FromDevice and PollDevice. I've tested the 5.x driver patch as well
>>>>as the 6.1.16.62.DB version patch, and seen similar results.
>>>>
>>>>Please do let me know the details of a stable 2.6.x configuration
>>>>that you were able to set up using these patches. Further, any idea
>>>>if and when the fixes will eventually get into the main source tree
>>>>of the e1000 driver on sourceforge?
>>>>
>>>>Thanks a bunch in advance,
>>>>Srivas.
>>>>
>>>>On Jul 26, 2006 09:06 PM, Adam Greenhalgh wrote:
>>>>
>>>>>hi
>>>>>
>>>>>beyers and i have been hacking and we have fixed the tx timeout
>>>>>bug. basically the time_stamp is not being set in the buffer and
>>>>>when linux sends packets too, it encounters a buffer with a time
>>>>>stamp of 0 and throws an error. Attached are two patches, one
>>>>>against cvs, driver-5.x-e1000_main.patch, and one against Max's
>>>>>6.1.16.2.DB driver, driver-6.1.16.2.DB-e1000_main.patch. Neither
>>>>>patch has been very heavily tested yet, but neither does anything
>>>>>special.
>>>>>
>>>>>enjoy
>>>>>
>>>>>adam
>>>>_______________________________________________
>>>>click mailing list
>>>>click at amsterdam.lcs.mit.edu
>>>>https://amsterdam.lcs.mit.edu/mailman/listinfo/click


