[Click] click on 2.6 kernel stability

Paine, Thomas Asa PAINETA at uwec.edu
Wed Mar 22 10:59:42 EST 2006


The only other thing I can think of that might come into play is the
descriptor counts or other driver parameters.  For what it's worth, here
are the parameters I use to load the NIC's kernel module, etc.  (If you
have only one NIC, give each option a single value: # rather than #,#.)
 
/sbin/modprobe e1000 FlowControl=0,0 RxIntDelay=256,256 \
    TxIntDelay=256,256 RxDescriptors=256,256 TxDescriptors=256,256
/sbin/ifconfig eth1 up promisc txqueuelen 1000
/sbin/ifconfig eth2 up promisc txqueuelen 1000
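
The one-value-per-NIC rule can be sketched as a small script.  This is
only an illustration: per_nic and e1000_options are hypothetical helper
names, not part of the e1000 driver or Click, and the script prints the
modprobe command instead of running it so it can be checked first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the e1000 driver or Click): repeat a
# value once per NIC, comma-separated: "256" for one NIC, "256,256" for two.
per_nic() {
    value=$1
    count=$2
    list=$value
    i=1
    while [ "$i" -lt "$count" ]; do
        list="$list,$value"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Build the option string from this message for a given number of NICs.
e1000_options() {
    nics=$1
    printf 'FlowControl=%s RxIntDelay=%s TxIntDelay=%s RxDescriptors=%s TxDescriptors=%s\n' \
        "$(per_nic 0 "$nics")" "$(per_nic 256 "$nics")" "$(per_nic 256 "$nics")" \
        "$(per_nic 256 "$nics")" "$(per_nic 256 "$nics")"
}

# Print the command rather than running it, so it can be reviewed first.
echo "/sbin/modprobe e1000 $(e1000_options 2)"
```

With an argument of 2 this prints the same option lists as the command
above; with 1 each option gets a single value.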
 

Thanks,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
   Thomas Paine (paineta at uwec.edu)
   University of Wisconsin - Eau Claire
   garbage foo(garbage g){return(g);}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 

________________________________

From: Beyers Cronje [mailto:bcronje at gmail.com] 
Sent: Wednesday, March 22, 2006 9:43 AM
To: Paine, Thomas Asa
Cc: Click
Subject: Re: [Click] click on 2.6 kernel stability


Hi Thomas,

It's very easy to duplicate. It happens at very low packet rates, less
than 100pps. On the Click box I have one console session pinging the
gateway and a couple of Firefox HTTP sessions. The first thing I notice
is "ping sendmsg: No buffer space available", which to me indicates that
packets are not being sent from the socket transmit buffer. Soon
afterwards this is confirmed when I receive the "transmit timed out"
message from the kernel on eth0.
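
To check whether a failure like this is repeatable, the watchdog
messages can be counted per interface.  A minimal sketch, assuming the
kernel log text is fed in on standard input; count_tx_timeouts is a
hypothetical helper name, and the sample lines are the ones quoted at
the start of this thread.

```shell
#!/bin/sh
# Hypothetical helper: count "transmit timed out" watchdog lines for one
# interface in kernel log text read from standard input.
count_tx_timeouts() {
    iface=$1
    # -F matches the string literally; -c prints the number of matching
    # lines. grep exits non-zero when the count is 0, so mask that status.
    grep -cF "NETDEV WATCHDOG: $iface: transmit timed out" || true
}

# Sample input taken from the log excerpt in the original report.
count_tx_timeouts eth0 <<'EOF'
NETDEV WATCHDOG: eth0: transmit timed out
e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
NETDEV WATCHDOG: eth0: transmit timed out
EOF
```

On a live box the same function could be fed from the kernel log, for
example "dmesg | count_tx_timeouts eth0", before and after a test run.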

I don't think the problem is related to the switch or link, because when
I use FromDevice instead of PollDevice everything works 100%.

Will keep you posted if I pick up anything else.

Beyers


On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote: 

	Beyers,
	        I'm using the latest source from the CVS (within 2 months).  I
	am using the 5x driver and polling.  All the hardware I've used thus far
	with click has been on Dell.  I do have an appliance coming this week.
	If that works I will post my results to the list as well.
	        What I was seeing with the other cards I mentioned was that if I
	initially slammed a card with, say 50Kpps, out of the gate the nic would
	freak out and basically stop servicing packets.  If I ramped up the
	packet rate over a few seconds, it tended to work (just not trusted, and
	that's the worst feeling in production).
	        Are you only seeing the problem at certain packet rates or data
	rates, or when the card isn't getting enough CPU time (is there loss),
	or anything like that?  I guess what I'm asking is, can it be a
	controlled failure?  What kind of switching hardware is in place here?
	Switchport settings perhaps?  Flow control, duplex, etc...  Just food
	for thought.
	
	
	Thanks,
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	   Thomas Paine (paineta at uwec.edu)
	   University of Wisconsin - Eau Claire
	   garbage foo(garbage g){return(g);} 
	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	
	
	________________________________
	
	From: Beyers Cronje [mailto:bcronje at gmail.com]
	Sent: Wednesday, March 22, 2006 8:10 AM 
	To: Paine, Thomas Asa
	Cc: Click
	Subject: Re: [Click] click on 2.6 kernel stability
	
	
	Hi Thomas,
	
	Thanks for the reply. I am running a pure Intel 82545GM card connected
	to a 100Mb switch. I used this same card on my old MB running 2.4.26 in
	polling mode with the same e1000-5 click driver with no problems.
	Unfortunately I had to replace my MB and the new SIS661 chipset is not
	supported on the 2.4 kernel.
	
	What version or date of Click source are you using? Are you running the
	E1000-5x driver?
	
	I've come across one possible bug in the e1000-5x driver: in the event
	of a TX timeout, the driver's tx timeout routine is called and
	interrupts are enabled again, even though click polling is still
	enabled/active. But I'm struggling to find out why the tx timeout
	happens in the first place.
	
	Using FromDevice works well though, so I'm looking into the polling side
	of things for now.
	
	Thanks
	
	Beyers
	
	
	
	
	On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote:
	
	        Beyers,
	                I'm running production boxes on 2.6.13.2, patched, with no
	        problem (I have run over 500Kpps through them).  I can tell you I've seen
	        this kind of problem when I attempt to use a "so called" e1000 card.
	        Whenever I attempted to use a non-intel(OEM) branded Intel 1000 that
	        kind of behavior is almost guaranteed at even moderate packet rates.  I
	        have had NO issues like that when running true Intel cards, specifically
	        I have used 82543 and 82546 chip based cards.
	
	                One thing I have not done, however, is linked at less than 1Gb
	        with these cards, and I see you were connected at 100Mb.  I'm not sure
	        if that could introduce any issues.  I would suspect not though.
	
	
	        Thanks,
	        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	           Thomas Paine (paineta at uwec.edu)
	           University of Wisconsin - Eau Claire
	           garbage foo(garbage g){return(g);}
	        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	
	
	        -----Original Message-----
	        From: click-bounces at pdos.csail.mit.edu
	        [mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Beyers Cronje
	        Sent: Tuesday, March 21, 2006 7:46 PM
	        To: Click
	        Subject: [Click] click on 2.6 kernel stability
	
	        Hi everyone,
	
	        Is anyone running a stable click kernel implementation on a 2.6 kernel?
	        Using current cvs code with e1000-5.x polling driver I managed to
	        compile and run on 2.6.13.2 but the system is very unstable. I'm running
	        a basic config for testing:
	
	        PollDevice(eth0) -> ToHost;
	        Idle -> ToDevice(eth0);
	
	        Input and output seems to hang every now and again with the odd complete
	        system hang. The only error messages I get are loads of the following:
	
	        NETDEV WATCHDOG: eth0: transmit timed out
	        e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
	        NETDEV WATCHDOG: eth0: transmit timed out
	        e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
	
	        This only occurs when click module is installed, when I unload click
	        module everything works fine. ethtool indicates the link is always up.
	        Watchdog never actually reports that the link ever went down, so could
	        this indicate an irq conflict or race condition of some sort? Any ideas
	        on where to begin troubleshooting this?
	
	        Thanks
	
	        Beyers
	        _______________________________________________ 
	        click mailing list
	        click at amsterdam.lcs.mit.edu
	        https://amsterdam.lcs.mit.edu/mailman/listinfo/click
	
	
	
	




