[Click] click on 2.6 kernel stability

rchertov@purdue.edu rchertov at purdue.edu
Wed Mar 22 21:51:01 EST 2006


Quoting "Paine, Thomas Asa" <PAINETA at uwec.edu>:

> The only other thing I could think of coming into play could be the
> descriptors or other parameters.  If this is worth anything, here are the
> parameters I use to load the NIC's kernel module, etc.  (If you only
> have one NIC, you would use just # rather than #,#.)
>  
> /sbin/modprobe e1000 FlowControl=0,0 RxIntDelay=256,256 TxIntDelay=256,256 RxDescriptors=256,256 TxDescriptors=256,256
> /sbin/ifconfig eth1 up promisc txqueuelen 1000
> /sbin/ifconfig eth2 up promisc txqueuelen 1000

Just a side note.  You should be able to use "ethtool" to make these changes
without having to unload/reload the driver module.
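
As a rough sketch of what I mean (the mapping from the modprobe options to
ethtool flags is my assumption, and whether the Click-patched e1000 driver
honors every ethtool operation is something you would need to check), the
function below only prints the commands so they can be reviewed first.  The
function name is made up for illustration.  I have left RxIntDelay/TxIntDelay
out because I am not sure this driver version exposes them through
"ethtool -C".

```shell
#!/bin/sh
# Sketch only: prints the ethtool commands for one interface instead of
# running them, so they can be reviewed before being applied as root.
# ethtool_cmds is a hypothetical helper name, not part of ethtool or Click.
ethtool_cmds() {
    dev="$1"    # interface name, e.g. eth1
    ring="$2"   # descriptor count, e.g. 256

    # Assumed equivalent of FlowControl=0: pause frames off in both directions.
    echo "ethtool -A $dev autoneg off rx off tx off"
    # Assumed equivalent of RxDescriptors/TxDescriptors: RX and TX ring sizes.
    echo "ethtool -G $dev rx $ring tx $ring"
}

# Same two interfaces and descriptor count as in the quoted modprobe line.
for dev in eth1 eth2; do
    ethtool_cmds "$dev" 256
done
```

Piping the output to "sh" as root would apply the settings.  Changing the ring
size may briefly reset the interface, but the module stays loaded.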

Roman

>  
> 
> Thanks,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
>    Thomas Paine (paineta at uwec.edu)
>    University of Wisconsin - Eau Claire
>    garbage foo(garbage g){return(g);}
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
> 
> ________________________________
> 
> From: Beyers Cronje [mailto:bcronje at gmail.com] 
> Sent: Wednesday, March 22, 2006 9:43 AM
> To: Paine, Thomas Asa
> Cc: Click
> Subject: Re: [Click] click on 2.6 kernel stability
> 
> 
> Hi Thomas,
> 
> It's very easy to duplicate. It happens at very low packet rates, less
> than 100pps. Basically, on the Click box I have one console session
> pinging the gateway and a couple of Firefox HTTP sessions. The first thing
> I notice is "ping sendmsg: No buffer space available", which
> to me indicates that packets are not being sent from the socket transmit
> buffer. Confirming this, soon afterwards I receive the "transmit timed
> out" message from the kernel on eth0.
> 
> I don't think the problem is related to the switch or link, since when I use
> FromDevice instead of PollDevice everything works 100%.
> 
> Will keep you posted if I pick up anything else.
> 
> Beyers
> 
> 
> On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote: 
> 
> 	Beyers,
> 	        I'm using the latest source from the CVS (within 2 months).  I
> 	am using the 5x driver and polling.  All the hardware I've used thus far
> 	with click has been on Dell.  I do have an appliance coming this week.
> 	If that works I will post my results to the list as well.
> 	        What I was seeing with the other cards I mentioned was that if I
> 	initially slammed a card with, say 50Kpps, out of the gate the nic would
> 	freak out and basically stop servicing packets.  If I ramped up the
> 	packet rate over a few seconds, it tended to work (just not trusted, and
> 	that's the worst feeling in production).
> 	        Are you only seeing the problem at certain packet rates or data
> 	rates, or when the card isn't getting enough CPU time (is there loss),
> 	or anything like that?  I guess what I'm asking is, can it be a
> 	controlled failure?  What kind of switching hardware is in place here?
> 	Switchport settings perhaps?  Flow control, duplex, etc...  Just food
> 	for thought.
> 	
> 	
> 	Thanks,
> 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	   Thomas Paine (paineta at uwec.edu)
> 	   University of Wisconsin - Eau Claire
> 	   garbage foo(garbage g){return(g);} 
> 	~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	
> 	
> 	________________________________
> 	
> 	From: Beyers Cronje [mailto:bcronje at gmail.com]
> 	Sent: Wednesday, March 22, 2006 8:10 AM 
> 	To: Paine, Thomas Asa
> 	Cc: Click
> 	Subject: Re: [Click] click on 2.6 kernel stability
> 	
> 	
> 	Hi Thomas,
> 	
> 	Thanks for the reply. I am running a pure Intel 82545GM card connected
> 	to a 100Mb switch. I used this same card on my old MB running 2.4.26 in
> 	polling mode with the same e1000-5 click driver with no problems.
> 	Unfortunately I had to replace my MB and the new SIS661 chipset is not
> 	supported on the 2.4 kernel.
> 	
> 	What version or date of Click source are you using? Are you running the
> 	E1000-5x driver?
> 	
> 	I've come across one possible bug in the e1000-5x driver: in the event
> 	of a TX timeout, the driver's tx timeout routine is called, and it
> 	enables interrupts again even though click polling is still
> 	enabled/active. But I'm struggling to find out why the tx timeout
> 	happens in the first place.
> 	
> 	Using FromDevice works well though, so I'm looking into the polling side
> 	of things for now.
> 	
> 	Thanks
> 	
> 	Beyers
> 	
> 	
> 	
> 	
> 	On 3/22/06, Paine, Thomas Asa <PAINETA at uwec.edu> wrote:
> 	
> 	        Beyers,
> 	                I'm running production boxes on 2.6.13.2, patched, with no
> 	        problem (I have run over 500Kpps through them).  I can tell you I've seen
> 	        this kind of problem when I attempt to use a "so called" e1000 card.
> 	        Whenever I attempted to use a non-Intel (OEM) branded Intel 1000, that
> 	        kind of behavior was almost guaranteed at even moderate packet rates.  I
> 	        have had NO issues like that when running true Intel cards; specifically
> 	        I have used 82543 and 82546 chip based cards.
> 	
> 	                One thing I have not done, however, is link at less than 1Gb
> 	        with these cards, and I see you were connected at 100Mb.  I'm not sure
> 	        if that could introduce any issues.  I would suspect not though.
> 	
> 	
> 	        Thanks,
> 	        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	           Thomas Paine (paineta at uwec.edu)
> 	           University of Wisconsin - Eau Claire
> 	           garbage foo(garbage g){return(g);}
> 	        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 	
> 	
> 	        -----Original Message-----
> 	        From: click-bounces at pdos.csail.mit.edu
> 	        [mailto:click-bounces at pdos.csail.mit.edu] On Behalf Of Beyers Cronje
> 	        Sent: Tuesday, March 21, 2006 7:46 PM
> 	        To: Click
> 	        Subject: [Click] click on 2.6 kernel stability
> 	
> 	        Hi everyone,
> 	
> 	        Is anyone running a stable click kernel implementation on a 2.6 kernel?
> 	        Using current cvs code with e1000-5.x polling driver I managed to
> 	        compile and run on 2.6.13.2 but the system is very unstable. I'm running
> 	        a basic config for testing:
> 	
> 	        PollDevice(eth0) -> ToHost;
> 	        Idle -> ToDevice(eth0);
> 	
> 	        Input and output seem to hang every now and again, with the odd complete
> 	        system hang. The only error messages I get are loads of the following:
> 	
> 	        NETDEV WATCHDOG: eth0: transmit timed out
> 	        e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
> 	        NETDEV WATCHDOG: eth0: transmit timed out
> 	        e1000: eth0: e1000_watchdog_1: NIC Link is Up 100 Mbps Full Duplex
> 	
> 	        This only occurs when the click module is installed; when I unload the
> 	        click module everything works fine. ethtool indicates the link is always up.
> 	        Watchdog never actually reports that the link ever went down, so could
> 	        this indicate an irq conflict or race condition of some sort? Any ideas
> 	        on where to begin troubleshooting this?
> 	
> 	        Thanks
> 	
> 	        Beyers
> 	        _______________________________________________ 
> 	        click mailing list
> 	        click at amsterdam.lcs.mit.edu
> 	        https://amsterdam.lcs.mit.edu/mailman/listinfo/click
> 	
> 	
> 	
> 	
> 
> 
> 
> 