[Click] click userland patches for large speed improvements

Eddie Kohler kohler at cs.ucla.edu
Wed Jul 6 18:10:48 EDT 2011


For everyone's information, versions of Luigi's changes, including packet 
memory recycling, are now checked in.  Thanks!
E


On 7/1/11 10:47 AM, Luigi Rizzo wrote:
> If someone is interested in the performance of userland click, i'd suggest
> the following two patches and looking at netmap (i already discussed
> what follows with Eddie, and i am hoping someone more fluent than
> me in C++ can polish the code and add support for thread-local lists).
>
> To get an idea of what you can get on a single core i7-870 CPU with
> the stock version and with these patches:
>
>                                          1.8.0      With patches
>      InfiniteSource -> Discard           515 Kpps   18.56 Mpps
>      InfiniteSource -> Queue -> Discard  500 Kpps   13.41 Mpps
>
>                                          pcap       netmap
>      FromDevice -> Queue -> ToDevice     420 Kpps   3.97 Mpps
>
>
> Click userland performance was never a priority given the high cost
> (until now) of packet I/O. But now that packet I/O has become quite fast,
> it turns out that there are two other big offenders:
> - the C++ memory allocator is quite expensive, and replacing it with
>    thread-local freelists (Packet objects and data buffers can be made
>    all with the same size) gives huge savings -- 100ns per packet or more
>    even on a fast machine;
>
> - every time an element wants a timestamp, it makes a syscall (gettimeofday()
>    or similar), which consumes another 400-800ns per call. There are many
>    elements (e.g. InfiniteSource, Counter, etc.) which timestamp packets.
>
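To put the per-call costs above next to the throughput table: a rate of R
packets per second leaves 1e9/R nanoseconds per packet. A minimal sketch of
that conversion follows; the function name ns_per_packet is made up for
illustration and is not part of Click.

```cpp
// Hypothetical helper (not a Click API): time available per packet,
// in nanoseconds, when processing at the given packet rate.
constexpr double ns_per_packet(double packets_per_second) {
    return 1e9 / packets_per_second;
}
```

With the figures from the table, 515 Kpps is about 1942 ns per packet and
18.56 Mpps is about 54 ns per packet, so a single 400-800ns timestamp syscall
costs several times the whole per-packet budget of the patched path.
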
> Attached are a couple of patches which address these problems:
>
> - patch-pcap	makes FromDevice and ToDevice use libpcap properly,
> 		supporting I/O in bursts to amortize the syscall overhead.
> 		This has been tested on FreeBSD.
>
> - patch-more
>     + introduces a NOTS option for InfiniteSource to remove timestamps.
>       This gives a 10x performance improvement in simple apps using
>       InfiniteSource;
>
>     + replaces the allocator for Packet and data buffers with local freelists.
>       These are not thread safe yet, but thread safety is easy to add. This
>       gives another 1.5-2x speed improvement after the 10x gained by removing
>       timestamps;
>
>     + enables BURST operation in Discard, giving another 2x speed improvement.
>
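The freelist idea mentioned for patch-more can be sketched as follows. This is
only an illustration under the assumption that all buffers have the same size;
the class FixedPool and its methods are hypothetical names, not the code in
the patch or in Click, and like the patch it is not thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size buffer pool (illustration only, not Click's code).
// Released buffers are kept on a singly linked list and handed out again,
// so the general-purpose allocator is only hit when the list is empty.
class FixedPool {
    struct Node { Node *next; };

  public:
    // Every buffer must be large enough to hold the list link.
    explicit FixedPool(std::size_t size)
        : _size(size < sizeof(Node) ? sizeof(Node) : size), _head(nullptr) {}

    ~FixedPool() {
        // Return all cached buffers to the system allocator.
        while (_head) {
            Node *n = _head;
            _head = n->next;
            std::free(n);
        }
    }

    FixedPool(const FixedPool &) = delete;
    FixedPool &operator=(const FixedPool &) = delete;

    // Pop a recycled buffer if one is available, else fall back to malloc.
    void *alloc() {
        if (Node *n = _head) {
            _head = n->next;
            return n;
        }
        void *p = std::malloc(_size);
        if (!p)
            throw std::bad_alloc();
        return p;
    }

    // Push the buffer onto the freelist instead of freeing it.
    void release(void *p) {
        assert(p != nullptr);
        _head = new (p) Node{_head};
    }

  private:
    std::size_t _size;
    Node *_head;
};
```

A buffer that is released and then requested again comes straight back from
the list without calling malloc. One possible way to get the thread-local
variant would be to declare one pool per thread, e.g.
"thread_local FixedPool pool(2048);", where 2048 is an arbitrary example size.
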
> Using netmap instead of pcap is another big win: as you can see, the
> forwarding performance of a simple FromDevice->Queue->ToDevice chain goes
> up by roughly 10x (420 Kpps to 3.97 Mpps).
> You can find netmap at http://info.iet.unipi.it/~luigi/netmap/
>
> cheers
> luigi
>