[Click] click on 2.6 kernel stability

Paine, Thomas Asa PAINETA at uwec.edu
Wed Mar 22 10:31:13 EST 2006


Beyers,
	I'm using recent source from CVS (no more than two months old),
with the e1000-5.x driver and polling. All the hardware I've used with
Click so far has been Dell. I do have an appliance coming this week;
if that works, I will post my results to the list as well.
	With the other cards I mentioned, if I slammed a card with, say,
50 Kpps right out of the gate, the NIC would freak out and basically stop
servicing packets. If I ramped up the packet rate over a few seconds, it
tended to work (it just couldn't be trusted, and that's the worst feeling
in production).
	Are you seeing the problem only at certain packet or data rates,
or when the card isn't getting enough CPU time (is there loss), or
anything like that? I guess what I'm asking is: can you reproduce the
failure in a controlled way? What kind of switching hardware is in
place here? Switchport settings, perhaps? Flow control, duplex, etc.
Just food for thought.
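	On the NIC side, those settings can be checked with ethtool. A
quick sketch, assuming the interface is eth0 (the switchport side still
has to be checked on the switch itself):

	ethtool eth0       # speed, duplex, autonegotiation, link detected
	ethtool -a eth0    # pause (flow control) parameters
	ethtool -S eth0    # driver statistics, including error counters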
 

Thanks,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
   Thomas Paine (paineta at uwec.edu)
   University of Wisconsin - Eau Claire
   garbage foo(garbage g){return(g);}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 

________________________________

From: Beyers Cronje [mailto:bcronje at gmail.com] 
Sent: Wednesday, March 22, 2006 8:10 AM
To: Paine, Thomas Asa
Cc: Click
Subject: Re: [Click] click on 2.6 kernel stability


Hi Thomas,

Thanks for the reply. I am running a genuine Intel 82545GM card connected
to a 100Mb switch. I used this same card on my old motherboard running
2.4.26 in polling mode with the same e1000-5.x Click driver with no
problems. Unfortunately, I had to replace my motherboard, and the new SiS
661 chipset is not supported by the 2.4 kernel.

What version or date of Click source are you using? Are you running the
e1000-5.x driver?

I've come across one possible bug in the e1000-5.x driver: on a TX
timeout, the driver's TX timeout routine is called and re-enables
interrupts, even though Click polling is still active. But I'm struggling
to find out why the TX timeout happens in the first place.

FromDevice works well, though, so I'm looking into the polling side of
things for now.
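For reference, the interrupt-driven variant of the test config would
presumably be the same two lines with FromDevice in place of PollDevice
(a sketch; the exact FromDevice config isn't given in this thread):

    FromDevice(eth0) -> ToHost;
    Idle -> ToDevice(eth0);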

Thanks

Beyers




On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote: 

	Beyers,
	        I'm running production boxes on 2.6.13.2, patched, with no
	problem (I have run over 500Kpps through them). I can tell you I've
	seen this kind of problem when I attempt to use a "so-called" e1000
	card. Whenever I attempted to use a non-Intel (OEM) branded Intel 1000,
	that kind of behavior was almost guaranteed at even moderate packet
	rates. I have had NO issues like that when running true Intel cards;
	specifically, I have used cards based on the 82543 and 82546 chips.
	
	        One thing I have not done, however, is link at less than
	1Gb with these cards, and I see you were connected at 100Mb. I'm not
	sure whether that could introduce any issues, though I would suspect
	not.

	
	Thanks,
	
	
	-----Original Message-----
	From: click-bounces at pdos.csail.mit.edu
	[mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Beyers Cronje
	Sent: Tuesday, March 21, 2006 7:46 PM
	To: Click
	Subject: [Click] click on 2.6 kernel stability
	
	Hi everyone,
	
	Is anyone running a stable Click kernel implementation on a 2.6
	kernel? Using current CVS code with the e1000-5.x polling driver, I
	managed to compile and run on 2.6.13.2, but the system is very
	unstable. I'm running a basic config for testing:
	
	PollDevice(eth0) -> ToHost;
	Idle -> ToDevice(eth0);
	
	Input and output seem to hang every now and again, with the odd
	complete system hang. The only error messages I get are loads of the
	following:
	
	NETDEV WATCHDOG: eth0: transmit timed out
	e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
	NETDEV WATCHDOG: eth0: transmit timed out
	e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
	
	This only occurs when the Click module is installed; when I unload
	the Click module, everything works fine. ethtool indicates the link is
	always up. The watchdog never actually reports that the link went
	down, so could this indicate an IRQ conflict or a race condition of
	some sort? Any ideas on where to begin troubleshooting this?
	
	Thanks
	
	Beyers
	_______________________________________________
	click mailing list
	click at amsterdam.lcs.mit.edu
	https://amsterdam.lcs.mit.edu/mailman/listinfo/click
	




