[Click] Userlevel performance issues

Robert Ross rross at dsci.com
Wed Jan 30 15:24:34 EST 2008


I'm not sure what this means, but we have been able to avoid this
problem entirely by using kernel-level Click with the experimental
FromUserDevice element, together with a user-level Click process that
reads from FromDump and pushes packets out through a custom ToRawFile
element.
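
Roughly, the two configurations look like the sketch below (the file
names, device path and FromUserDevice argument are just placeholders;
ToRawFile is our custom element):

    // user-level Click: replay the dump and write the raw packets to
    // the character device that the kernel-level configuration reads
    FromDump(trace.pcap, TIMING true)
        -> ToRawFile(/dev/click_user0);    // our custom element

    // kernel-level Click: pick the packets up and transmit them
    FromUserDevice(0)
        -> Queue
        -> ToDevice(eth0);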

I will gladly put together and test a simple configuration.  It would
be identical to the configuration I attached except for switching the
Socket() to a FromDump().  I will run some more tests and send you the
monitor.csv output from our Script elements.
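
In other words, something like the following sketch, where the file
name, interface, element chain and LinkUnqueue arguments are all
illustrative:

    // before: packets arrive over the network
    //   Socket(UDP, 0.0.0.0, 10000) -> q :: NotifierQueue -> ...

    // after: packets come from a trace instead
    FromDump(trace.pcap)
        -> q :: NotifierQueue
        -> LinkUnqueue(0ms, 512Kbps)
        -> outq :: Queue
        -> ToDevice(eth1);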

BTW, we used the monitor.csv output file in tandem with the Java-based
LiveGraph to see real-time statistics on Click performance.  You can
also use LiveGraph after the fact to open and view our monitor.csv
file on your end once I send you the output.  It has been a very nice
marriage of capabilities for real-time analysis with minimal coding.
We've done something similar at kernel level, but had to write a
custom Java application to produce the monitor.csv, since kernel
configurations cannot write directly to files.
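
The CSV side is roughly a Script that samples read handlers once per
second and appends them to a file, along these lines (handler and file
names are illustrative; the real configuration watches several
handlers):

    c :: Counter;
    Script(TYPE ACTIVE,
           print >monitor.csv "count,rate",              // header row
           label start,
           wait 1s,
           print >>monitor.csv "$(c.count),$(c.rate)",   // one sample per line
           goto start);

LiveGraph then watches the growing CSV file and plots each column as a
data series.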


Robert Ross
DSCI Inc.
Office: 732.542.3113 x173
Home: 609.702.8114
Cell: 609.509.5139
Fax: 253.550.6198

-----Original Message-----
From: Eddie Kohler [mailto:kohler at cs.ucla.edu] 
Sent: Tuesday, January 29, 2008 2:39 PM
To: Robert Ross
Cc: Beyers Cronje; click at amsterdam.lcs.mit.edu
Subject: Re: [Click] Userlevel performance issues

Hi Robert,

The *job* of LinkUnqueue is specifically to throttle performance.  It is
designed to output packets at the bandwidth specified.  This will cause
a lower rate, pinned to that bandwidth!

The numbers you report are kind of reasonable.  Click parses bandwidths
as powers of 10, which is the networking standard as far as I can tell.
So 512Kbps = 512000bps = 64000Bps; 190p/s at this rate implies 336B
packets.  So 1360p/s, for your highest bandwidth LinkUnqueue, assuming
the same packet length, is roughly half what it "should" be.  That's not
great, but it's not terrible.

I have not run your configuration with Sockets, but I have with
InfiniteSources and so forth, and have observed LinkUnqueue outputting
packets at the correct rate.  In fact I checked in an update to Counter
to give it bit_rate and byte_rate handlers, making this easier to see.
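
For example, a configuration along these lines (a minimal sketch; the
arguments are illustrative) shows the behavior:

    InfiniteSource(LENGTH 336)
        -> Queue(1000)
        -> LinkUnqueue(0ms, 512Kbps)
        -> c :: Counter
        -> Discard;

    // while this runs, c's byte_rate handler should settle near
    // 64000 bytes/second, i.e. the configured 512Kbps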

LinkUnqueue should affect the upstream Socket elements only indirectly. 
LinkUnqueue stops pulling from its input when the emulated link is full.
This will cause an upstream Queue to fill up.  Some elements might
notice that Queue's full state and stop producing packets (since those
packets will only be dropped).  The InfiniteSource and user-level
FromHost elements have this behavior.  However, your use of
NotifierQueue (instead of Queue) would neutralize this effect, since
NotifierQueue doesn't provide full notification.
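
Concretely, the difference is between the two chains sketched below
(arguments illustrative):

    // with Queue, a source like InfiniteSource sees full notification
    // and backs off while the queue is full:
    InfiniteSource -> Queue(200) -> LinkUnqueue(0ms, 1Mbps) -> Discard;

    // with NotifierQueue, there is no full notification, so excess
    // packets are simply dropped at the queue:
    InfiniteSource -> NotifierQueue(200) -> LinkUnqueue(0ms, 1Mbps) -> Discard;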

I am unsure in the end whether you are observing a bug or correct
behavior.  Here are a couple of questions to help us figure it out.

- Re: FromDump and ToDevice.  Can you reduce the configuration as much
as possible, and tell us what rates ToDevice achieves without FromDump,
and what it achieves with FromDump?  (A minimal sketch of the reduction
I have in mind follows after these questions.)  Your mail isn't
specific about the configuration or the performance numbers.

- Re: LinkUnqueue.  Can you send the output of your configuration (cool
use of define and Script btw), as well as the configuration?  Again,
with InfiniteSource I see expected behavior, and I would not expect
LinkUnqueue to throttle Socket.
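
The reduction I have in mind for the first question is something like
this (interface and file name are placeholders; run the two variants
separately and compare the rate ToDevice achieves):

    // (a) without FromDump
    InfiniteSource(LENGTH 336) -> Queue -> td :: ToDevice(eth0);

    // (b) with an otherwise-idle FromDump also in the router
    // InfiniteSource(LENGTH 336) -> Queue -> td :: ToDevice(eth0);
    // FromDump(trace.pcap, TIMING true) -> Discard;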

It may be that you are finding an unfortunate interaction between
Click's task handlers and its file descriptor handlers -- something we
could potentially fix.  But without specific numbers it's hard to tell.

Eddie


Robert Ross wrote:
> The only clear item that seems to have a marked difference is the
> LinkUnqueue element.  The fact that our ToDevice and FromDevice/Socket
> performance appears to be related somehow to the configuration of a
> LinkUnqueue element sitting in the middle of our configuration is too
> obvious to ignore.  Does LinkUnqueue perform some kind of
> upstream/downstream notification to these elements, causing them to
> throttle their behavior based on LinkUnqueue?
> 
> In our tests, with all other elements remaining the same, here is what
> we found from two independent read handler counts:
>  
> LinkUnqueue("512Kbps") = Maximum ~190 packets/second pushed from the 
> Socket element and pulled by the ToDevice element
> LinkUnqueue("1Mbps") = Maxmum ~290 packets/second pushed from the 
> Socket element and pulled by the ToDevice element
> LinkUnqueue("2Mbps") = Maximum ~490 packets/second pushed from the 
> Socket element and pulled by the ToDevice element
> LinkUnqueue("4Mbps") = Maximum ~780 packets/second pushed from the 
> Socket element and pulled by the ToDevice element
> LinkUnqueue("6Mbps") = Maximum ~980 packets/second pushed from the 
> Socket element and pulled by the ToDevice element
> LinkUnqueue("8Mbps") = Maximum ~1360 packets/second pushed from the 
> Socket element and pulled by the ToDevice element
>  
> It is also telling that independent handler counters corroborate
> exactly the same maximum packets per second in two very different
> places in the configuration.  Clearly the limitation on processing is
> completely artificial and not an actual performance problem, since
> increasing the LinkUnqueue bandwidth increases the performance in a
> very controlled and obvious manner.
> 
> I have attached a simple configuration that examines specific handlers
> and outputs values each second to a CSV file for analysis.  The
> configuration is scaled back to complete simplicity, yet its
> performance is identical to that of our actual, much more complicated
> configuration, and it seems to point squarely at LinkUnqueue.
> 
> What is LinkUnqueue doing that could be causing this type of effect on
> FromHost, Socket and ToDevice?
> 
> 
> ________________________________
> 
> From: Robert Ross
> Sent: Friday, January 25, 2008 7:40 PM
> To: 'Beyers Cronje'
> Cc: click at pdos.csail.mit.edu
> Subject: RE: [Click] Userlevel performance issues
> 
> 
> Sorry, I wasn't clear that the queues are necessary for our 
> configuration.  The configuration is somewhat complex.  I was only 
> attempting to highlight the important parts.
>  
>  
> 
> 
> ________________________________
> 
> From: Beyers Cronje [mailto:bcronje at gmail.com]
> Sent: Friday, January 25, 2008 7:31 PM
> To: Robert Ross
> Cc: click at pdos.csail.mit.edu
> Subject: Re: [Click] Userlevel performance issues
> 
> 
> Hi Robert,
> 
> 
>  
> 
> 	*       We first found that when UserLevel Click started pulling
> from a PCAP file, the performance of the ToDevice() appeared to drop
> sharply.  What I mean by this is that the ToDevice() pull handler
> reported values in the range of 200 packets/second once the PCAP file
> started reading.  This resulted in the outbound queue just prior to
> the ToDevice() filling up and eventually overflowing because the
> packet rate in the PCAP file is far more than 200 packets/second.
> 
> 
> You don't have to use a queue between FromDump and ToDevice, as
> FromDump is an agnostic element.  In other words, you can connect
> ToDevice directly to FromDump, which should ensure that at least no
> packets are dropped, and you should see the best ToDevice performance.
> 
> Also there are a few tuning parameters.  Try tuning your NIC TX ring
> size: on the e1000 driver the default TX ring size is 256, so
> experiment with different values to see if it makes a difference.
> ToDevice uses a packet socket for transmit, so it might also be worth
> experimenting with /proc/sys/net/core/wmem_default and
> /proc/sys/net/core/wmem_max.
> 
> 
> Beyers
> 
> 
> 


