Small-packet performance matters if you wish to build routers, or router-like boxes such as NATs. In such situations the average packet size is likely to be about 200 bytes. Many gigabit ethernet board designs (and their marketing) seem to focus more on 1500- or 9000-byte packets.
The test machines are PCs with SuperMicro 370DL3 motherboards, 800 MHz Pentium III CPUs, a 133 MHz front-side bus, and 256 MB of PC133 memory. This motherboard has the ServerWorks ServerSet LE chipset and 64-bit PCI slots. The machines have two CPUs but are running a Linux kernel with SMP support turned off.
The machines are running Linux 2.2.16. The networking code, however, is the Click software router toolkit. Depending on the precise configuration, Click replaces some or all of the Linux kernel networking code. The point of using Click is that it can send and receive packets much faster than any user-level program, because it runs in the kernel. The send software for these experiments sends UDP packets at a controlled rate. The receive software just counts and discards packets. The packets are a total of 60 bytes in length, including the 14 byte ethernet header.
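To put the packet rates in this report in context, here is a rough calculation of the most 60-byte packets a gigabit link can carry per second. It assumes the 60 bytes exclude the 4-byte ethernet CRC, and that each frame also occupies the standard 8 bytes of preamble and 12 bytes of inter-frame gap on the wire. The function name is made up for this sketch.

```python
# Per-frame overhead on the wire, in bytes (standard ethernet framing).
CRC_BYTES = 4        # frame check sequence appended to each frame
PREAMBLE_BYTES = 8   # preamble plus start-of-frame delimiter
IFG_BYTES = 12       # minimum inter-frame gap


def max_frames_per_second(frame_bytes, link_bps=1_000_000_000):
    """Upper bound on frames per second for frames of frame_bytes (CRC excluded)."""
    wire_bits = (frame_bytes + CRC_BYTES + PREAMBLE_BYTES + IFG_BYTES) * 8
    return link_bps // wire_bits


if __name__ == "__main__":
    print(max_frames_per_second(60))
```

Under those assumptions a 60-byte packet takes 84 bytes of wire time, so the link tops out at roughly 1,488,000 p/s.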
The Pro/1000 driver is based on Intel's version 2.5.11, available on the web here.
The two machines involved are directly connected with a fiber cable. Link-level flow control is disabled for all the tests.
After fixing the driver to explicitly ask the board for a transmit-complete interrupt every 60 packets, the PWLA8490 is able to send 523,000 p/s. The PWLA8490SX can send about 840,000 p/s. In detail, the fix was to leave the E1000_TXD_CMD_IDE bit turned off in every 60th transmit descriptor.
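The idea behind the transmit fix can be sketched as follows. This is not the driver's code (which is C), only an illustration of the descriptor bookkeeping. The bit values, the helper name tx_cmd_flags, and the use of the EOP and RS bits are assumptions made for the sketch. It also assumes that E1000_TXD_CMD_IDE asks the board to delay the completion interrupt, so a descriptor without it produces a prompt interrupt.

```python
# Illustrative bit values only; check the driver headers for the real ones.
E1000_TXD_CMD_EOP = 0x01000000   # assumed: end of packet
E1000_TXD_CMD_RS = 0x08000000    # assumed: report completion status
E1000_TXD_CMD_IDE = 0x80000000   # assumed: delay the completion interrupt

TX_INTERRUPT_INTERVAL = 60       # one undelayed interrupt per 60 packets


def tx_cmd_flags(descriptor_index):
    """Command flags for the descriptor at descriptor_index (0-based)."""
    flags = E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS
    # Every 60th descriptor omits IDE, so its completion interrupts promptly.
    if (descriptor_index + 1) % TX_INTERRUPT_INTERVAL != 0:
        flags |= E1000_TXD_CMD_IDE
    return flags


if __name__ == "__main__":
    undelayed = [i for i in range(180) if not tx_cmd_flags(i) & E1000_TXD_CMD_IDE]
    print(undelayed)
```

With this scheme the CPU takes one transmit interrupt per 60 packets sent, rather than relying solely on the delay timer.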
The Original line corresponds to the unmodified Intel driver. It can receive about 300,000 p/s. At higher input rates it seems to experience interrupt livelock -- the card interrupts for every received packet, and the cost of the interrupt handling prevents the CPU from performing any other processing.
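A toy model shows why per-packet interrupts lead to livelock. It assumes a single CPU, an interrupt handler that always preempts packet processing, and fixed costs per interrupt and per processed packet. The costs below are invented for illustration and were not measured on the test machines.

```python
NS_PER_SECOND = 1_000_000_000


def delivered_pps(input_pps, interrupt_cost_ns=2000, process_cost_ns=1000):
    """Packets fully processed per second in a one-CPU, one-interrupt-per-packet model."""
    # Interrupt handling runs first and can consume the whole CPU.
    interrupt_time_ns = input_pps * interrupt_cost_ns
    # Packet processing only gets whatever CPU time is left over.
    leftover_ns = max(0, NS_PER_SECOND - interrupt_time_ns)
    return min(input_pps, leftover_ns // process_cost_ns)


if __name__ == "__main__":
    for rate in (100_000, 300_000, 400_000, 500_000):
        print(rate, delivered_pps(rate))
```

With these made-up costs, output tracks input up to about 333,000 p/s, falls to 200,000 p/s at 400,000 p/s of input, and reaches zero at 500,000 p/s, when interrupt handling alone uses the entire CPU.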
The driver source includes code to ask the card to delay interrupts, but that code isn't turned on. The relevant variable is e1000_rxint_delay. It appears to be the number of microseconds between interrupts. The receive DMA queue length is set by MAX_RFD, so the maximum receive rate should be about MAX_RFD packets per delay period. Unfortunately these parameters probably have to be tuned for each specific workload. The delay period should be long enough that the CPU can completely process MAX_RFD packets per delay, including user-level processing if appropriate. If the delay is too low, the CPU will experience livelock and get no work done. If the delay is too high, the card will discard packets even though the CPU is idle.
I found that leaving MAX_RFD at 80 packets and setting the receive interrupt delay to 128 (the same as the transmit delay) worked well. This allows about 1.6 microseconds of processing time per packet, which is enough for my receive software to count and discard a packet. The resulting behavior is shown by the Tuned line in the graph above. Note that the receive rate goes up to 450,000 p/s, but then descends. I wasn't able to find MAX_RFD and delay values that prevented the decline. This is too bad -- part of the point of delayed interrupts is to prevent livelock, but it doesn't seem to work.
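The arithmetic behind these settings can be written out as a small sketch. It assumes, as guessed above, that the delay value is in microseconds. The function names are made up for illustration.

```python
def per_packet_budget_us(delay_us, max_rfd):
    """CPU time available per packet if max_rfd packets arrive per delay period."""
    return delay_us / max_rfd


def receive_ceiling_pps(delay_us, max_rfd):
    """Most packets per second the ring can absorb at one interrupt per delay period."""
    return max_rfd * 1_000_000 / delay_us


def min_delay_us(max_rfd, per_packet_cost_us):
    """Shortest delay that still lets the CPU finish max_rfd packets per period."""
    return max_rfd * per_packet_cost_us


if __name__ == "__main__":
    print(per_packet_budget_us(128, 80))
    print(receive_ceiling_pps(128, 80))
```

For a delay of 128 and MAX_RFD of 80 this gives a budget of 1.6 microseconds per packet and a ceiling of 625,000 p/s. The Tuned line peaked at 450,000 p/s, below that ceiling.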
The Polling line in the graph describes a setup in which the card doesn't interrupt at all. Instead, the Click software polls the card for new packets, fully processes them, and only then polls for more packets. This prevents livelock as well as avoiding interrupt overhead, so the driver can receive (and process) 680,000 p/s even when overloaded with input.
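The polling discipline can be sketched with a small simulation. This is not Click's code. The RxRing class, the run function, the burst size of 8, and the tick-based timing are all invented for illustration. The point it shows is that when the CPU only takes packets it is ready to process, excess input is dropped by the card at no CPU cost.

```python
from collections import deque


class RxRing:
    """Toy receive DMA ring; the 'card' drops packets when the ring is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def dma_in(self, count):
        """The card delivers `count` packets; overflow is dropped without CPU work."""
        for packet in range(count):
            if len(self.queue) < self.capacity:
                self.queue.append(packet)
            else:
                self.dropped += 1

    def poll(self, burst):
        """Return up to `burst` waiting packets; never blocks or interrupts."""
        batch = []
        while self.queue and len(batch) < burst:
            batch.append(self.queue.popleft())
        return batch


def run(arrivals_per_tick, cpu_packets_per_tick, ticks, capacity=80):
    """Simulate polling; return (packets processed, packets dropped by the card)."""
    ring = RxRing(capacity)
    processed = 0
    for _ in range(ticks):
        ring.dma_in(arrivals_per_tick)
        budget = cpu_packets_per_tick
        # Poll, fully process the batch, and only then poll again.
        while budget > 0:
            batch = ring.poll(min(8, budget))
            if not batch:
                break
            processed += len(batch)   # stand-in for count-and-discard
            budget -= len(batch)
    return processed, ring.dropped


if __name__ == "__main__":
    print(run(10, 50, 100))    # light load
    print(run(200, 50, 100))   # heavy overload
```

In the overloaded run the CPU still processes its full 50 packets per tick. Throughput levels off at the CPU's capacity instead of collapsing.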
It's too bad the Intel Linux driver can only achieve about half the board's potential. It's also disappointing that the delayed interrupt mechanism seems to require manual tuning, and that it doesn't prevent livelock.
You can find my modified version of the Intel 2.5.11 driver here. My modifications support Click's polling and simplify the code to help get rid of some locking and increase concurrency. I could easily have introduced bugs, so don't use my driver if you can't tolerate problems.
You can find a more up-to-date Pro/1000 driver as part of the Click distribution.
Note that since I don't have a manual for the Intel board, I may be misunderstanding its behavior. It could also easily be the case that the Intel, Alteon, and SysKonnect hardware would perform better than I've suggested here with better drivers or a different test strategy. So take my results and explanations with a grain of salt.
Robert Morris, firstname.lastname@example.org, November 2000.