[Click] click on 2.6 kernel stability

Beyers Cronje bcronje at gmail.com
Wed Mar 22 10:42:42 EST 2006


Hi Thomas,

It's very easy to duplicate. It happens at very low packet rates, less than
100 pps. On the Click box I have one console session pinging the gateway and
a couple of Firefox HTTP sessions. The first thing I notice is
"ping sendmsg: No buffer space available", which to me indicates that packets
are not being sent from the socket transmit buffer. Soon after that I get the
"transmit timed out" message from the kernel on eth0, which confirms it.

I don't think the problem is related to the switch or link, since everything
works 100% when I use FromDevice instead of PollDevice.
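
The FromDevice comparison presumably uses the same test config as the
PollDevice one quoted further down, with only the source element swapped. A
minimal sketch under that assumption (not a copy of the exact file in use):

    FromDevice(eth0) -> ToHost;
    Idle -> ToDevice(eth0);

FromDevice takes packets from the normal interrupt-driven receive path,
whereas PollDevice relies on the patched polling driver, so a stable run with
this variant points at the polling path rather than the link.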

Will keep you posted if I pick up anything else.

Beyers

On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote:
>
> Beyers,
>         I'm using the latest source from CVS (within the last 2 months).
> I am using the 5x driver and polling.  All the hardware I've used thus far
> with Click has been Dell.  I do have an appliance coming this week.
> If that works I will post my results to the list as well.
>         What I was seeing with the other cards I mentioned was that if I
> initially slammed a card with, say, 50 Kpps out of the gate, the NIC would
> freak out and basically stop servicing packets.  If I ramped up the
> packet rate over a few seconds, it tended to work (I just didn't trust it,
> and that's the worst feeling in production).
>         Are you only seeing the problem at certain packet rates or data
> rates, or when the card isn't getting enough CPU time (is there loss),
> or anything like that?  I guess what I'm asking is, can it be a
> controlled failure?  What kind of switching hardware is in place here?
> Switchport settings perhaps?  Flow control, duplex, etc...  Just food
> for thought.
>
>
> Thanks,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    Thomas Paine (paineta at uwec.edu)
>    University of Wisconsin - Eau Claire
>    garbage foo(garbage g){return(g);}
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> ________________________________
>
> From: Beyers Cronje [mailto:bcronje at gmail.com]
> Sent: Wednesday, March 22, 2006 8:10 AM
> To: Paine, Thomas Asa
> Cc: Click
> Subject: Re: [Click] click on 2.6 kernel stability
>
>
> Hi Thomas,
>
> Thanks for the reply. I am running a pure Intel 82545GM card connected
> to a 100Mb switch. I used this same card on my old MB running 2.4.26 in
> polling mode with the same e1000-5 click driver with no problems.
> Unfortunately I had to replace my MB and the new SIS661 chipset is not
> supported on the 2.4 kernel.
>
> What version or date of Click source are you using? Are you running the
> E1000-5x driver?
>
> I've come across one possible bug in the e1000-5x driver: in the event
> of a TX timeout, the driver's tx timeout routine is called and re-enables
> interrupts, even though Click polling is still enabled/active. But I'm
> struggling to find out why the TX timeout happens in the first place.
>
> Using FromDevice works well though, so I'm looking into the polling side
> of things for now.
>
> Thanks
>
> Beyers
>
>
>
>
> On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote:
>
>         Beyers,
>                 I'm running production boxes on 2.6.13.2, patched, with
>         no problems (I have run over 500 Kpps through them).  I can tell
>         you I've seen this kind of problem when I attempt to use a
>         "so-called" e1000 card.  Whenever I attempted to use a non-Intel
>         (OEM) branded Intel 1000, that kind of behavior was almost
>         guaranteed at even moderate packet rates.  I have had NO issues
>         like that when running true Intel cards; specifically, I have
>         used 82543 and 82546 chip-based cards.
>
>                 One thing I have not done, however, is link at less than
>         1Gb with these cards, and I see you were connected at 100Mb.  I'm
>         not sure if that could introduce any issues.  I would suspect
>         not, though.
>
>
>         Thanks,
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>            Thomas Paine (paineta at uwec.edu)
>            University of Wisconsin - Eau Claire
>            garbage foo(garbage g){return(g);}
>         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>         -----Original Message-----
>         From: click-bounces at pdos.csail.mit.edu
>         [mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Beyers Cronje
>         Sent: Tuesday, March 21, 2006 7:46 PM
>         To: Click
>         Subject: [Click] click on 2.6 kernel stability
>
>         Hi everyone,
>
>         Is anyone running a stable Click kernel implementation on a 2.6
>         kernel?  Using current CVS code with the e1000-5.x polling driver
>         I managed to compile and run on 2.6.13.2, but the system is very
>         unstable.  I'm running a basic config for testing:
>
>         PollDevice(eth0) -> ToHost;
>         Idle -> ToDevice(eth0);
>
>         Input and output seem to hang every now and again, with the odd
>         complete system hang.  The only error messages I get are loads of
>         the following:
>
>         NETDEV WATCHDOG: eth0: transmit timed out
>         e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
>         NETDEV WATCHDOG: eth0: transmit timed out
>         e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
>
>         This only occurs when the Click module is installed; when I
>         unload the Click module everything works fine.  ethtool indicates
>         the link is always up.  The watchdog never actually reports that
>         the link went down, so could this indicate an IRQ conflict or
>         race condition of some sort?  Any ideas on where to begin
>         troubleshooting this?
>
>         Thanks
>
>         Beyers

