[Click] click userland patches for large speed improvements

Roman Chertov rchertov at cs.ucsb.edu
Fri Jul 1 14:01:05 EDT 2011


For curiosity's sake, I just ran the script shown below on my machine and got
3.34 Mpps.  I am using the latest Click from git (as of two days ago) on the
following CPU: Intel(R) Xeon(R) CPU W3520 @ 2.67GHz


It seems that 0.5Mpps is pretty low for an i7-870 CPU, but it does appear that
the patches improved the performance significantly.


InfiniteSource -> ctr::AverageCounter -> Queue -> Discard;


Script(
    wait 60,
    print ctr.count,
    print ctr.byte_count,
);

Roman

On Fri, 1 Jul 2011 19:47:13 +0200 Luigi Rizzo <rizzo at iet.unipi.it> wrote:

> If someone is interested in the performance of userland click, i'd suggest
> the following two patches and looking at netmap (i already discussed
> what follows with Eddie, and i am hoping someone more fluent than
> me in C++ can polish the code and add support for thread-local lists).
> 
> To get an idea of what you can get on a single core i7-870 CPU with
> the stock version and with these patches:
> 
>                                         1.8.0      With patches
>     InfiniteSource -> Discard           515 Kpps   18.56 Mpps
>     InfiniteSource -> Queue -> Discard  500 Kpps   13.41 Mpps
> 
>                                         pcap       netmap
>     FromDevice->Queue->ToDevice         420 Kpps   3.97 Mpps
> 
> 
> Click userland performance was never a priority given the high cost
> (until now) of packet I/O. But now that packet I/O has become quite fast,
> it turns out that there are two other big offenders:
> - the C++ memory allocator is quite expensive, and replacing it with
>   thread-local freelists (Packet objects and data buffers can be made
>   all with the same size) gives huge savings -- 100ns per packet or more
>   even on a fast machine;
> 
> - every time an element wants a timestamp, it makes a syscall (gettimeofday()
>   or similar) which consumes another 400-800ns per call. There are many
>   elements (e.g. InfiniteSource, Counter, etc.) which timestamp packets.
> 
> Attached there are a couple of patches which address these problems:
> 
> - patch-pcap	makes FromDevice and ToDevice use libpcap properly,
> 		supporting I/O in bursts to amortize the syscall overhead.
> 		This has been tested on FreeBSD.
> 
> - patch-more
>    + introduces a NOTS option for InfiniteSource to remove timestamps.
>      This gives a 10x performance improvement in simple apps using
>      InfiniteSource;
> 
>    + replaces the allocator for Packet and data buffers with local freelists;
>      these are not thread safe yet, but thread safety is easy to add. This
>      gives another 1.5-2x speed improvement after the 10x gained by removing
>      timestamps;
> 
>    + enables BURST operation in Discard, giving another 2x speed improvement.
> 
> Using netmap instead of pcap is another big win: as you can see, the
> forwarding performance of a simple FromDevice->Queue->ToDevice chain goes
> up by about 10x. You can find netmap at
> http://info.iet.unipi.it/~luigi/netmap/
> 
> cheers
> luigi
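
For readers who have not seen the technique, the thread-local freelist idea
mentioned in the quoted message can be sketched roughly as below. This is not
the code from patch-more (which the message describes as not thread safe);
it is only a minimal illustration. The names BufferPool, FreeNode and
kBufSize, and the 2048-byte buffer size, are assumptions made up for this
example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical buffer size: the message only says that all buffers
// can be made the same size, not what that size is.
constexpr std::size_t kBufSize = 2048;

// A free buffer stores the link to the next free buffer in its own
// first bytes, so the freelist needs no extra memory.
struct FreeNode {
    FreeNode *next;
};

class BufferPool {
public:
    // Pop a buffer from the calling thread's freelist; fall back to
    // malloc() only when the list is empty.
    static void *get() {
        if (FreeNode *n = head_) {
            head_ = n->next;
            return n;
        }
        return std::malloc(kBufSize);
    }

    // Push a buffer onto the calling thread's freelist instead of
    // returning it to the general-purpose allocator.
    static void put(void *p) {
        if (p == nullptr)
            return;
        head_ = new (p) FreeNode{head_};
    }

private:
    // One list head per thread, so get() and put() need no locking.
    static thread_local FreeNode *head_;
};

thread_local FreeNode *BufferPool::head_ = nullptr;
```

In steady state every get() is served from the list, so the general-purpose
allocator is only hit while the pool warms up. A buffer released on a thread
other than the one that allocated it simply joins that thread's list, which
is fine as long as all buffers have the same size.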



