[Click] click on 2.6 kernel stability
Beyers Cronje
bcronje at gmail.com
Wed Mar 22 10:42:42 EST 2006
Hi Thomas,
It's very easy to duplicate. It happens even at very low packet rates, below
100 pps. On the Click box I have one console session pinging the
gateway and a couple of Firefox HTTP sessions. The first thing I notice is
"ping sendmsg: No buffer space available", which suggests to me that
packets are not being sent from the socket transmit buffer. Soon
afterwards the kernel confirms this with the "transmit timed out" message on
eth0.
I don't think the problem is related to the switch or link, because when I use
FromDevice instead of PollDevice everything works.
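For reference, the working variant is roughly the test config from my first
message with PollDevice swapped for FromDevice. This is only a sketch: it
assumes nothing else in the config changes.

```
// Interrupt-driven receive instead of polling; everything else as before.
FromDevice(eth0) -> ToHost;
Idle -> ToDevice(eth0);
```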
Will keep you posted if I pick up anything else.
Beyers
On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote:
>
> Beyers,
> I'm using the latest source from the CVS (within 2 months). I
> am using the 5x driver and polling. All the hardware I've used thus far
> with click has been on Dell. I do have an appliance coming this week.
> If that works I will post my results to the list as well.
> What I was seeing with the other cards I mentioned was that if I
> initially slammed a card with, say 50Kpps, out of the gate the nic would
> freak out and basically stop servicing packets. If I ramped up the
> packet rate over a few seconds, it tended to work (just not trusted, and
> that's the worst feeling in production).
> Are you only seeing the problem at certain packet rates or data
> rates, or when the card isn't getting enough CPU time (is there loss),
> or anything like that? I guess what I'm asking is, can it be a
> controlled failure? What kind of switching hardware is in place here?
> Switchport settings perhaps? Flow control, duplex, etc... Just food
> for thought.
>
>
> Thanks,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Thomas Paine (paineta at uwec.edu)
> University of Wisconsin - Eau Claire
> garbage foo(garbage g){return(g);}
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> ________________________________
>
> From: Beyers Cronje [mailto:bcronje at gmail.com]
> Sent: Wednesday, March 22, 2006 8:10 AM
> To: Paine, Thomas Asa
> Cc: Click
> Subject: Re: [Click] click on 2.6 kernel stability
>
>
> Hi Thomas,
>
> Thanks for the reply. I am running a pure Intel 82545GM card connected
> to a 100Mb switch. I used this same card on my old MB running 2.4.26 in
> polling mode with the same e1000-5 click driver with no problems.
> Unfortunately I had to replace my MB and the new SIS661 chipset is not
> supported on the 2.4 kernel.
>
> What version or date of Click source are you using? Are you running the
> E1000-5x driver?
>
> I've come across one possible bug in the e1000-5x driver: on a TX
> timeout, the driver's tx timeout routine is called and re-enables
> interrupts, even though Click polling is still enabled/active. But I'm
> struggling to find out why the tx timeout happens in the first place.
>
> Using FromDevice works well though, so I'm looking into the polling side
> of things for now.
>
> Thanks
>
> Beyers
>
>
>
>
> On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote:
>
> Beyers,
> I'm running production boxes on 2.6.13.2, patched, with no
> problem (I have run over 500Kpps through them). I can tell you I've seen
> this kind of problem when I attempt to use a "so called" e1000 card.
> Whenever I attempted to use a non-Intel (OEM) branded Intel 1000, that
> kind of behavior is almost guaranteed at even moderate packet rates. I
> have had NO issues like that when running true Intel cards; specifically,
> I have used 82543 and 82546 chip based cards.
>
> One thing I have not done, however, is linked at less than 1Gb
> with these cards, and I see you were connected at 100Mb. I'm not sure
> if that could introduce any issues. I would suspect not though.
>
>
> Thanks,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Thomas Paine (paineta at uwec.edu)
> University of Wisconsin - Eau Claire
> garbage foo(garbage g){return(g);}
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> -----Original Message-----
> From: click-bounces at pdos.csail.mit.edu
> [mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Beyers Cronje
> Sent: Tuesday, March 21, 2006 7:46 PM
> To: Click
> Subject: [Click] click on 2.6 kernel stability
>
> Hi everyone,
>
> Is anyone running a stable Click kernel implementation on a 2.6 kernel?
> Using current CVS code with the e1000-5.x polling driver I managed to
> compile and run on 2.6.13.2, but the system is very unstable. I'm running
> a basic config for testing:
>
> PollDevice(eth0) -> ToHost;
> Idle -> ToDevice(eth0);
>
> Input and output seem to hang every now and again, with the odd complete
> system hang. The only error messages I get are loads of the following:
>
> NETDEV WATCHDOG: eth0: transmit timed out
> e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
> NETDEV WATCHDOG: eth0: transmit timed out
> e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
>
> This only occurs when the Click module is installed; when I unload the
> Click module everything works fine. ethtool indicates the link is always
> up. The watchdog never actually reports that the link went down, so could
> this indicate an IRQ conflict or race condition of some sort? Any ideas
> on where to begin troubleshooting this?
>
> Thanks
>
> Beyers
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click
>
>
>
>