[Click] Userlevel performance issues

Eddie Kohler kohler at cs.ucla.edu
Tue Jan 29 14:38:54 EST 2008


Hi Robert,

The *job* of LinkUnqueue is specifically to throttle performance.  It is 
designed to output packets at the specified bandwidth and no faster.  Of 
course this pins the packet rate to that bandwidth!

The numbers you report are kind of reasonable.  Click parses bandwidths as 
powers of 10, which is the networking standard as far as I can tell.  So 
512Kbps = 512,000 b/s = 64,000 B/s, and 190 packets/s at that rate implies 
336-byte packets.  Assuming the same packet length, the ~1360 packets/s from 
your highest-bandwidth LinkUnqueue is roughly half of what it "should" be.  
That's not great, but it's not terrible.
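
To spell out where "roughly half" comes from, for the 8Mbps case and that 
same assumed 336-byte packet length:

    8Mbps = 8,000,000 b/s = 1,000,000 B/s;  1,000,000 / 336 ≈ 2976 packets/s

so you would expect roughly 2976 packets/s there, and ~1360 packets/s is 
indeed about half of that.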

I have not run your configuration with Sockets, but I have run it with 
InfiniteSources and the like, and have observed LinkUnqueue outputting 
packets at the correct rate.  In fact I checked in an update to Counter to 
give it bit_rate and byte_rate handlers, making this easier to see.
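
Something along these lines is enough to check LinkUnqueue by itself (a 
sketch, not the exact test I ran; element arguments and handler names from 
memory):

    // Count what LinkUnqueue emits and print the measured rates after 10s.
    InfiniteSource(LENGTH 336)
        -> Queue(1000)
        -> LinkUnqueue(0ms, 512kbps)
        -> c :: Counter
        -> Discard;

    // rate is packets/s; bit_rate and byte_rate are the new Counter handlers.
    Script(wait 10, read c.rate, read c.bit_rate, stop);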

LinkUnqueue should affect the upstream Socket elements only indirectly. 
LinkUnqueue stops pulling from its input when the emulated link is full.  This 
will cause an upstream Queue to fill up.  Some elements might notice that 
Queue's full state and stop producing packets (since those packets will only 
be dropped).  The InfiniteSource and user-level FromHost elements have this 
behavior.  However, your use of NotifierQueue (instead of Queue) would 
neutralize this effect, since NotifierQueue doesn't provide full notification.
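
To see that difference concretely, you can watch the queue's drop count.  In 
a sketch like this (again from memory, not your configuration), q.drops 
should stay at 0 because Queue's full notification makes InfiniteSource 
pause while the emulated link is busy; swap in NotifierQueue and drops 
should start climbing instead:

    InfiniteSource(LENGTH 336)
        -> q :: Queue(100)             // try q :: NotifierQueue(100) here
        -> LinkUnqueue(0ms, 512kbps)
        -> Discard;

    Script(wait 5, read q.drops, read q.length, stop);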

I am unsure in the end whether you are observing a bug or correct behavior. 
Here are a couple questions to help us figure it out.

- Re: FromDump and ToDevice.  Can you reduce the configuration as much as 
possible, and tell us what rates ToDevice achieves without FromDump and what 
it achieves with FromDump?  (See the sketch after these questions for the 
kind of reduction I mean.)  Your mail isn't specific about either the 
configuration or the performance numbers.

- Re: LinkUnqueue.  Can you send the output of your configuration (cool use of 
define and Script btw), as well as the configuration?  Again, with 
InfiniteSource I see expected behavior, and I would not expect LinkUnqueue to 
throttle Socket.
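
For the first question, something this small would do; run it once as 
written and once with, say, an InfiniteSource in place of FromDump, and 
compare out.rate (the filename and device name are placeholders):

    FromDump(trace.pcap, TIMING false)
        -> Queue(1000)
        -> out :: Counter
        -> ToDevice(eth0);

    Script(wait 10, read out.rate, stop);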

It may be that you are finding an unfortunate interaction between Click's task 
handlers and its file descriptor handlers -- something we could potentially 
fix.  But without specific numbers it's hard to tell.

Eddie


Robert Ross wrote:
> The only clear item that seems to have a marked difference is the
> LinkUnqueue element.  The fact that our ToDevice and FromDevice/Socket
> performance appears to be related somehow to the configuration of a
> LinkUnqueue element sitting in the middle of our configuration is too
> obvious to ignore.  Does LinkUnqueue perform some kind of
> upstream/downstream notification to these elements, causing them to
> throttle their behavior based on LinkUnqueue?
>  
> In our tests, with all other elements remaining the same, here is what
> we found from two independent read handler counts (maximum
> packets/second pushed from the Socket element and pulled by the
> ToDevice element):
>  
> LinkUnqueue("512Kbps") = ~190 packets/second
> LinkUnqueue("1Mbps")   = ~290 packets/second
> LinkUnqueue("2Mbps")   = ~490 packets/second
> LinkUnqueue("4Mbps")   = ~780 packets/second
> LinkUnqueue("6Mbps")   = ~980 packets/second
> LinkUnqueue("8Mbps")   = ~1360 packets/second
>  
> It is also telling that independent handler counters corroborate exactly
> the same maximum packets per second in two very different places in the
> configuration.  Clearly the limitation on processing is completely
> artificial and not an actual performance problem, since increasing the
> LinkUnqueue bandwidth increases the throughput in a very controlled and
> obvious manner.
>  
> I have attached a simple configuration that examines specific handlers
> and outputs values each second to a CSV file for analysis.  It is scaled
> back to complete simplicity, yet shows the same performance as our
> actual, much more complicated configuration.  The behavior is identical
> and seems to point squarely at LinkUnqueue.
>  
> What is LinkUnqueue doing that could be causing this type of effect on
> FromHost, Socket and ToDevice?  
> 
> 
> ________________________________
> 
> From: Robert Ross 
> Sent: Friday, January 25, 2008 7:40 PM
> To: 'Beyers Cronje'
> Cc: click at pdos.csail.mit.edu
> Subject: RE: [Click] Userlevel performance issues
> 
> 
> Sorry, I wasn't clear that the queues are necessary for our
> configuration.  The configuration is somewhat complex.  I was only
> attempting to highlight the important parts.
>  
>  
> 
> 
> ________________________________
> 
> From: Beyers Cronje [mailto:bcronje at gmail.com] 
> Sent: Friday, January 25, 2008 7:31 PM
> To: Robert Ross
> Cc: click at pdos.csail.mit.edu
> Subject: Re: [Click] Userlevel performance issues
> 
> 
> Hi Robert,
> 
> 
>  
> 
> 	*       We first found that when UserLevel Click started pulling
> 	from a PCAP file, the performance of the ToDevice() appeared to
> 	drop sharply.  What I mean by this is that the ToDevice() pull
> 	handler reported values in the range of 200 packets/second once
> 	the PCAP file started reading.  This resulted in the outbound
> 	queue just prior to the ToDevice() filling up and eventually
> 	overflowing, because the packet rate in the PCAP file is far
> 	more than 200 packets/second.
> 
> 
> You don't have to use a queue between FromDump and ToDevice, since
> FromDump is an agnostic element.  In other words, you can connect
> ToDevice directly to FromDump, which should ensure that at least no
> packets are dropped and you should see the best ToDevice performance.
> 
> Also, there are a few tuning parameters.  Try tuning your NIC TX ring
> size: on the e1000 driver the default TX ring size is 256, so experiment
> with different values to see if it makes a difference.  ToDevice uses a
> packet socket for transmit, so it might also be worth experimenting with
> /proc/sys/net/core/wmem_default
> /proc/sys/net/core/wmem_max
> 
> 
> Beyers
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click

