[Click] Userlevel performance issues

Eddie Kohler kohler at cs.ucla.edu
Thu Feb 7 10:35:26 EST 2008


It would have affected kernel mode as well.  But the problem was not strictly 
a PERFORMANCE problem: Click still performed well for them, it just took 100% 
CPU.  Your FromDump -> ... might have been affected because FromDump can use 
timers in some configs, and thus would run at lower priority than the busy-waiting task.
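
To make "some configs" concrete: the timer case is timed replay, e.g.
(trace name is a placeholder):

  // With TIMING true, FromDump paces packets by their trace timestamps
  // and so runs off a timer; with TIMING false (the default, I believe)
  // it is driven by the task scheduler instead.
  FromDump(trace.pcap, TIMING true) -> Queue -> Discard;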

Eddie


Robert Ross wrote:
> Do you know if this problem would affect kernel-mode performance as 
> well, or was this isolated to userlevel only?
> 
> On Tue, 2008-02-05 at 17:41, Eddie Kohler wrote:
>> Hi Robert,
>>
>> I wonder if your observed weirdness with LinkUnqueue was due to the 
>> 100%-CPU-on-DelayUnqueue problem recently reported.  Maybe if you tried the 
>> configuration now?
>>
>> Eddie
>>
>>
>> Robert Ross wrote:
>> > I'm not sure what this means, but we have been able to completely avoid
>> > this problem by using kernel-level Click with the experimental
>> > FromUserDevice, and a user-level Click reading FromDump and pushing
>> > packets out on a custom ToRawFile element.
>> > 
>> > I will gladly put together and test a simple configuration.  It would be
>> > identical to the configuration I had attached except for switching the
>> > Socket() to a FromDump().  I will run some more tests and send you the
>> > monitor.csv output from our script elements.  
>> > 
>> > BTW, we used the monitor.csv output file in tandem with the Java-based
>> > LiveGraph to see real-time statistics on Click performance.  You can
>> > also use Livegraph after the fact to open up and view our Monitor.csv
>> > file on your end once I send you output.  It has been a very nice
>> > marriage of capabilities for real-time analysis with minimal coding.
>> > We've done something similar at kernel level, but had to write a custom
>> > Java application to output the monitor.csv since kernel configurations
>> > cannot output directly to files.
>> > 
>> > 
>> > Robert Ross
>> > DSCI Inc.
>> > Office: 732.542.3113 x173
>> > Home: 609.702.8114
>> > Cell: 609.509.5139
>> > Fax: 253.550.6198
>> > 
>> > -----Original Message-----
>> > From: Eddie Kohler [mailto:kohler at cs.ucla.edu] 
>> > Sent: Tuesday, January 29, 2008 2:39 PM
>> > To: Robert Ross
>> > Cc: Beyers Cronje; click at amsterdam.lcs.mit.edu
>> > Subject: Re: [Click] Userlevel performance issues
>> > 
>> > Hi Robert,
>> > 
>> > The *job* of LinkUnqueue is specifically to throttle performance.  It is
>> > designed to output packets at the bandwidth specified.  This will cause
>> > a lower rate, pinned to that bandwidth!
>> > 
>> > The numbers you report are kind of reasonable.  Click parses bandwidths
>> > as powers of 10, which is the networking standard as far as I can tell.
>> > So 512Kbps = 512000bps = 64000Bps; 190p/s at this rate implies 336B
>> > packets.  So 1360p/s, for your highest bandwidth LinkUnqueue, assuming
>> > the same packet length, is roughly half what it "should" be.  That's not
>> > great, but it's not terrible.
>> > 
>> > I have not run your configuration with Sockets, but I have with
>> > InfiniteSources, and so forth, and have observed LinkUnqueue outputting
>> > packets at the correct rate.  In fact I checked in an update to Counter,
>> > to give it bit_rate and byte_rate handlers, making this easier to see.
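>> > 
>> > For example, a sketch along these lines (arguments are illustrative;
>> > LENGTH is InfiniteSource's packet-length keyword, if I recall it
>> > right) prints the emulated link's rates once a second:
>> > 
>> >   InfiniteSource(LENGTH 336) -> Queue -> LinkUnqueue(10ms, 512Kbps)
>> >     -> c :: Counter -> Discard;
>> >   Script(wait 1s, print $(c.rate), print $(c.bit_rate), loop);
>> >   // At 512Kbps (64000 bytes/s) and 336-byte packets, c.rate should
>> >   // settle near 190 packets/second and c.bit_rate near 512000.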
>> > 
>> > LinkUnqueue should affect the upstream Socket elements only indirectly. 
>> > LinkUnqueue stops pulling from its input when the emulated link is full.
>> > This will cause an upstream Queue to fill up.  Some elements might
>> > notice that Queue's full state and stop producing packets (since those
>> > packets will only be dropped).  The InfiniteSource and user-level
>> > FromHost elements have this behavior.  However, your use of
>> > NotifierQueue (instead of Queue) would neutralize this effect, since
>> > NotifierQueue doesn't provide full notification.
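>> > 
>> > Concretely (a sketch only, arguments illustrative):
>> > 
>> >   InfiniteSource -> Queue(100) -> LinkUnqueue(10ms, 1Mbps) -> Discard;
>> >   // Queue provides full notification, so InfiniteSource stops
>> >   // producing while the emulated link is backed up; swap Queue(100)
>> >   // for NotifierQueue(100) and the source keeps pushing, with the
>> >   // excess simply dropped at the queue.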
>> > 
>> > I am unsure in the end whether you are observing a bug or correct
>> > behavior. 
>> > Here are a couple questions to help us figure it out.
>> > 
>> > - Re: FromDump and ToDevice.  Can you reduce the configuration as much
>> > as possible, and tell us what rates ToDevice achieves without FromDump,
>> > and what it achieves with FromDump?  Your mail isn't specific about the
>> > configuration or the performance numbers.
>> > 
>> > - Re: LinkUnqueue.  Can you send the output of your configuration (cool
>> > use of define and Script btw), as well as the configuration?  Again,
>> > with InfiniteSource I see expected behavior, and I would not expect
>> > LinkUnqueue to throttle Socket.
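>> > 
>> > To make the first question concrete, a reduced pair of test configs
>> > could look something like this (device and trace names are
>> > placeholders; run them as two separate router configurations):
>> > 
>> >   // Baseline: no FromDump in the picture.
>> >   InfiniteSource -> Queue(1000) -> c :: Counter -> ToDevice(eth1);
>> > 
>> >   // The same path driven from the trace.
>> >   FromDump(trace.pcap) -> Queue(1000) -> c :: Counter -> ToDevice(eth1);
>> > 
>> > Then compare c's rate handler (or the new bit_rate/byte_rate) between
>> > the two runs.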
>> > 
>> > It may be that you are finding an unfortunate interaction between
>> > Click's task handlers and its file descriptor handlers -- something we
>> > could potentially fix.  But without specific numbers it's hard to tell.
>> > 
>> > Eddie
>> > 
>> > 
>> > Robert Ross wrote:
>> >> The only clear item that seems to have a marked difference is the 
>> >> LinkUnqueue element.  The fact that our ToDevice and FromDevice/Socket
>> >> performance appears to be related somehow to the configuration of a 
>> >> LinkUnqueue element sitting in the middle of our configuration is too 
>> >> obvious to ignore.  Does LinkUnqueue perform some kind of 
>> >> upstream/downstream notification to these elements, causing them to 
>> >> throttle their behavior based on LinkUnqueue?
>> >>  
>> >> In our tests, with all other elements remaining the same, here is what
>> >> we found from two independent read handler counts:
>> >>  
>> >> LinkUnqueue("512Kbps") = Maximum ~190 packets/second pushed from the 
>> >> Socket element and pulled by the ToDevice element
>> >> LinkUnqueue("1Mbps") = Maximum ~290 packets/second pushed from the 
>> >> Socket element and pulled by the ToDevice element
>> >> LinkUnqueue("2Mbps") = Maximum ~490 packets/second pushed from the 
>> >> Socket element and pulled by the ToDevice element
>> >> LinkUnqueue("4Mbps") = Maximum ~780 packets/second pushed from the 
>> >> Socket element and pulled by the ToDevice element
>> >> LinkUnqueue("6Mbps") = Maximum ~980 packets/second pushed from the 
>> >> Socket element and pulled by the ToDevice element
>> >> LinkUnqueue("8Mbps") = Maximum ~1360 packets/second pushed from the 
>> >> Socket element and pulled by the ToDevice element
>> >>  
>> >> It is also telling that independent handler counters corroborate 
>> >> exactly the same maximum packets per second in two very different 
>> >> places in the configuration.  Clearly the limitation on processing is 
>> >> artificial rather than an actual performance problem, since increasing 
>> >> the LinkUnqueue bandwidth raises the throughput in a very controlled 
>> >> and obvious manner.
>> >>  
>> >> I have attached a simple configuration that examines specific handlers
>> >> and outputs values each second to a CSV file for analysis.  The 
>> >> configuration is scaled back to complete simplicity, yet shows the same 
>> >> performance as our actual, much more complicated configuration.  
>> >> Nevertheless, the performance is identical and seems to point squarely 
>> >> at LinkUnqueue.
>> >>  
>> >> What is LinkUnqueue doing that could be causing this type of effect on
>> >> FromHost, Socket and ToDevice?
>> >>
>> >>
>> >> ________________________________
>> >>
>> >> From: Robert Ross
>> >> Sent: Friday, January 25, 2008 7:40 PM
>> >> To: 'Beyers Cronje'
>> >> Cc: click at pdos.csail.mit.edu
>> >> Subject: RE: [Click] Userlevel performance issues
>> >>
>> >>
>> >> Sorry, I wasn't clear that the queues are necessary for our 
>> >> configuration.  The configuration is somewhat complex.  I was only 
>> >> attempting to highlight the important parts.
>> >>  
>> >>  
>> >>
>> >>
>> >> ________________________________
>> >>
>> >> From: Beyers Cronje [mailto:bcronje at gmail.com]
>> >> Sent: Friday, January 25, 2008 7:31 PM
>> >> To: Robert Ross
>> >> Cc: click at pdos.csail.mit.edu
>> >> Subject: Re: [Click] Userlevel performance issues
>> >>
>> >>
>> >> Hi Robert,
>> >>
>> >>
>> >>  
>> >>
>> >> 	*       We first found that when UserLevel Click started pulling
>> >> 	from a PCAP file, the performance of the ToDevice() appeared to
>> >> 	drop sharply.  What I mean by this is that the ToDevice() pull
>> >> 	handler reported values in the range of 200 packets/second once
>> >> 	the PCAP file started reading.  This resulted in the outbound
>> >> 	queue just prior to the ToDevice() filling up and eventually
>> >> 	overflowing because the packet rate in the PCAP file is far more
>> >> 	than 200 packets/second.
>> >>
>> >>
>> >> You don't have to use a queue between FromDump and ToDevice, as FromDump 
>> >> is an agnostic element.  In other words, you can connect ToDevice 
>> >> directly to FromDump, which should ensure that at least no packets are 
>> >> dropped, and you should see the best ToDevice performance.
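>> >>
>> >> For example (trace and device names are placeholders):
>> >>
>> >>   FromDump(trace.pcap) -> ToDevice(eth0);
>> >>
>> >> Here ToDevice pulls straight from FromDump, so the trace is read no
>> >> faster than the device can transmit, and nothing queues up or gets
>> >> dropped inside Click.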
>> >>
>> >> Also, there are a few tuning parameters.  Try tuning your NIC TX ring 
>> >> size: on the e1000 driver the default TX ring size is 256, so experiment 
>> >> with different values to see if it makes a difference.  ToDevice uses a 
>> >> packet socket for transmit, so it might also be worth experimenting with 
>> >> /proc/sys/net/core/wmem_default and /proc/sys/net/core/wmem_max.
>> >>
>> >>
>> >> Beyers
>> >>
>> >>
>> >>

