[Click] click userland patches for large speed improvements
Eddie Kohler
kohler at cs.ucla.edu
Wed Jul 6 18:10:48 EDT 2011
For everyone's information, versions of Luigi's changes, including packet
memory recycling, are now checked in. Thanks!
E
On 7/1/11 10:47 AM, Luigi Rizzo wrote:
> If someone is interested in the performance of userland Click, I'd suggest
> the following two patches and looking at netmap (I already discussed
> what follows with Eddie, and I am hoping someone more fluent than
> me in C++ can polish the code and add support for thread-local lists).
>
> To get an idea of what you can get on a single core i7-870 CPU with
> the stock version and with these patches:
>
>                                         1.8.0       With patches
> InfiniteSource -> Discard               515 Kpps    18.56 Mpps
> InfiniteSource -> Queue -> Discard      500 Kpps    13.41 Mpps
>
>                                         pcap        netmap
> FromDevice -> Queue -> ToDevice         420 Kpps    3.97 Mpps
>
>
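To relate the rates quoted above to the per-call costs discussed in this
thread, it can help to turn packets per second into nanoseconds per packet
(1e9 / pps). The sketch below is only that arithmetic; the helper names are
made up for illustration and are not part of Click.

```cpp
#include <cstdio>

// Average time available per packet, in nanoseconds, at a given rate
// in packets per second.
constexpr double ns_per_packet(double pps) {
    return 1e9 / pps;
}

// Print the per-packet time for each rate quoted in the message.
// (print_budgets is a hypothetical helper, not a Click function.)
inline void print_budgets() {
    struct Row { const char *config; double pps; };
    const Row rows[] = {
        {"InfiniteSource -> Discard, 1.8.0",            515e3},
        {"InfiniteSource -> Discard, patched",          18.56e6},
        {"InfiniteSource -> Queue -> Discard, 1.8.0",   500e3},
        {"InfiniteSource -> Queue -> Discard, patched", 13.41e6},
        {"FromDevice->Queue->ToDevice, pcap",           420e3},
        {"FromDevice->Queue->ToDevice, netmap",         3.97e6},
    };
    for (const Row &r : rows)
        std::printf("%-46s %8.1f ns/packet\n", r.config, ns_per_packet(r.pps));
}
```

At 515 Kpps each packet takes a little under 2000 ns, while at 18.56 Mpps
it takes roughly 54 ns, so any fixed per-packet cost of a few hundred
nanoseconds dominates at the higher rates.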
> Click userland performance was never a priority given the (until now)
> high cost of packet I/O. But now that packet I/O has become quite fast,
> it turns out that there are two other big offenders:
> - the C++ memory allocator is quite expensive, and replacing it with
> thread-local freelists (Packet objects and data buffers can all be made
> the same size) gives huge savings -- 100ns per packet or more
> even on a fast machine;
>
> - every time an element wants a timestamp, it makes a syscall (gettimeofday()
> or similar), which consumes another 400-800ns per call. There are many
> elements (e.g. InfiniteSource, Counter, etc.) which timestamp packets.
>
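As an illustration of the freelist idea in the first point above, here is a
minimal sketch, assuming all blocks have one fixed size. It is not the code
from the patch: FreeList, local_pool and the 2048-byte kBufferSize are
hypothetical names and values chosen for the example.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size block pool. Freed blocks are kept on a singly
// linked list and handed out again, so the general-purpose allocator is
// only hit when the list is empty.
class FreeList {
    struct Node { Node *next; };

public:
    explicit FreeList(std::size_t block_size)
        : block_size_(block_size < sizeof(Node) ? sizeof(Node) : block_size),
          head_(nullptr) {}
    FreeList(const FreeList &) = delete;
    FreeList &operator=(const FreeList &) = delete;

    // Return every cached block to the system allocator.
    ~FreeList() {
        while (head_) {
            Node *next = head_->next;
            ::operator delete(head_);
            head_ = next;
        }
    }

    // Reuse a cached block if there is one, otherwise allocate a new one.
    void *allocate() {
        if (head_) {
            Node *n = head_;
            head_ = n->next;
            return n;
        }
        return ::operator new(block_size_);
    }

    // Push a block obtained from allocate() back onto the list.
    void release(void *p) {
        head_ = new (p) Node{head_};
    }

private:
    std::size_t block_size_;
    Node *head_;
};

// Assumed buffer size for the example; not taken from Click.
constexpr std::size_t kBufferSize = 2048;

// One pool per thread, so allocate/release need no locking.
inline FreeList &local_pool() {
    static thread_local FreeList pool(kBufferSize);
    return pool;
}
```

With a per-thread pool, a block released on a different thread than the
one that allocated it simply ends up on the releasing thread's list; a
real implementation would probably also want to cap the list length.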
> Attached are a couple of patches which address these problems:
>
> - patch-pcap makes FromDevice and ToDevice use libpcap properly,
> supporting I/O in bursts to amortize the syscall overhead.
> This has been tested on FreeBSD.
>
> - patch-more
>   + introduces a NOTS option for InfiniteSource to remove timestamps.
>     This gives a 10x performance improvement in simple apps using
>     InfiniteSource;
>
>   + replaces the allocator for Packet and data buffers with local freelists;
>     not thread-safe, but thread safety is easy to add. This gives another
>     1.5-2x speed improvement after the 10x gained by removing timestamps;
>
>   + enables BURST operation in Discard, giving another 2x speed improvement.
>
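The burst idea in the last item can be pictured with a toy model: each
time the sink is scheduled it drains up to a fixed number of packets
instead of one, so the per-invocation overhead is spread over the whole
burst. This is only a sketch under that assumption; Pkt and BurstDiscard
are hypothetical stand-ins and do not use Click's Element API.

```cpp
#include <cstddef>
#include <deque>

// Stand-in for a packet; a real one would carry data and metadata.
struct Pkt { int id; };

// Hypothetical sink that drops up to `burst` packets per invocation.
class BurstDiscard {
public:
    explicit BurstDiscard(std::size_t burst)
        : burst_(burst ? burst : 1), count_(0) {}

    // One scheduler invocation: pull and drop at most burst_ packets
    // from the upstream queue. Returns how many were dropped this time.
    std::size_t run(std::deque<Pkt> &upstream) {
        std::size_t n = 0;
        while (n < burst_ && !upstream.empty()) {
            upstream.pop_front();
            ++n;
        }
        count_ += n;
        return n;
    }

    // Total packets dropped so far.
    std::size_t count() const { return count_; }

private:
    std::size_t burst_;
    std::size_t count_;
};
```

With a burst of 1 this degenerates to one packet per invocation, which
is the behavior the burst option is meant to improve on.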
> Using netmap instead of pcap is another big win: as you can see, the
> forwarding performance of a simple FromDevice->Queue->ToDevice chain goes
> up by almost 10x. You can find netmap at http://info.iet.unipi.it/~luigi/netmap/
>
> cheers
> luigi
>
>
>
> _______________________________________________
> click mailing list
> click at amsterdam.lcs.mit.edu
> https://amsterdam.lcs.mit.edu/mailman/listinfo/click