Frequently Asked Questions for "Eliminating Receive Livelock in an Interrupt-Driven Kernel", by Mogul and Ramakrishnan, 1996.

Q: What is the livelock problem the paper is concerned with?

A: Livelock occurs when packets arrive so fast that the CPU spends most or all of its time in the NIC interrupt handler, moving packets from the NIC's receive DMA ring to the software IP input queue. That leaves little or no CPU time for the thread that performs IP processing, so packets are not moved from the IP input queue to the output NIC. The result is that the IP input queue grows to its maximum allowed length, and from then on the NIC interrupt handler discards most or all arriving packets.

The paper's suggested solution is to disable interrupts under high load, and to have a polling thread that reads packets directly from the NIC (with no IP input queue), does all the IP processing, and hands the packets to the output NIC. This solves the livelock problem because there are no interrupts when the load is high, and thus no danger of spending too much time in the NIC interrupt handler. A NIC's first interrupt wakes up the polling thread, and after that interrupts are disabled for that device. The polling thread moves a batch of packets from the input device, through IP processing, and to the output devices, including all output device handling. After fully processing all packets in the NIC's RX DMA ring, the polling thread re-enables interrupts and goes back to sleep. (A rough code sketch of this structure appears in a question below.)

Q: What is link-level processing?

A: The low-level mechanisms required to send and receive packets. This usually includes the encoding of bits into electrical signals and marking the boundaries between packets. https://en.wikipedia.org/wiki/Data_link_layer

Q: Section 6.4 notes that one can't use the kernel's thread scheduler to simulate round-robin scheduling for processing packets. Why not?

A: One approach that probably wouldn't work well is to have one polling thread per network interface, or one thread for sending and one for receiving, and to rely on the ordinary thread scheduler to decide when to run each thread. Under high load the scheduler would only switch threads due to clock interrupts. That might not be often enough -- packets arrive so fast that the input queues might overflow in the intervals between clock interrupts.

Q: Does polling result in low latency for packet processing?

A: The answer depends on how much CPU time you're willing to devote to polling. If the computer has nothing else to do, and thus can spend essentially all its time in the polling loop checking for recently arrived packets, then polling can deliver very low latency, lower than interrupts. But if the computer needs to spend time doing things other than the polling loop, and thus cannot check for newly arrived packets very often, interrupts can deliver lower latency.

Q: When the network polling thread is active, will the kernel scheduler context-switch between it and user processes (like screend)?

A: I suspect that the authors' operating system scheduled among kernel threads that wanted to run, but that kernel threads had absolute priority over user-level execution. And the authors were using uniprocessors (only one core). So while the polling thread wants to run, no user-level code will run. This is one ingredient in the Section 6.6.1 / Figure 6-4 results: the "feedback" mechanism causes the polling thread to sleep, allowing the user-level screend program to run.
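Q: What might the paper's polling design look like in code?

A: Here is a minimal C-like sketch of the structure described above: the first receive interrupt disables further interrupts from the device and wakes the polling thread, the polling thread processes packets to completion with no IP input queue, and the Section 6.6.1 feedback from screend's queue makes the polling thread yield. This is not the authors' code; every helper name (nic_disable_interrupts, ip_forward_and_transmit, screend_queue_full, and so on) is invented for illustration.

    /* Interrupt handler: runs once per burst of packets.  It does no
     * protocol work; it just silences the device and wakes the poller. */
    void
    nic_rx_interrupt(struct nic *nic)
    {
        nic_disable_interrupts(nic);       /* no more interrupts under load */
        wakeup(&nic->poll_wakeup);
    }

    /* Kernel polling thread: all packet processing happens here, from the
     * receive DMA ring through IP forwarding to the output NIC, with no
     * intermediate IP input queue. */
    void
    nic_poll_thread(struct nic *nic)
    {
        for (;;) {
            sleep_on(&nic->poll_wakeup);

            struct packet *p;
            while ((p = nic_rx_ring_next(nic)) != NULL) {
                /* Feedback (Section 6.6.1): if the queue of packets waiting
                 * for the user-level screend process is full, stop polling
                 * so screend gets CPU time to drain it. */
                if (screend_queue_full())
                    sleep_on(&screend_queue_has_space);

                /* Process to completion: IP forwarding plus placing the
                 * packet on the output NIC's transmit DMA ring. */
                ip_forward_and_transmit(p);
            }

            /* Receive ring is empty: re-arm interrupts and sleep.  (A real
             * driver would re-check the ring after re-enabling interrupts,
             * to avoid missing a packet that arrived in between.) */
            nic_enable_interrupts(nic);
        }
    }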
Q: Will the paper's polling scheme result in discarded packets if the input packet rate is high?

A: Yes. With the paper's polling scheme, if packets arrive too fast for the computer to process them, at first the NIC hardware will buffer the excess packets in an internal queue, and when the NIC runs out of buffer space, it will start discarding incoming packets. Any design that can't keep up with the maximum possible input rate will have to discard packets under high load. The paper's key insight is that it's best to discard the excess packets at the earliest possible moment, to avoid wasting CPU time partially processing packets that will later be discarded.

Q: What is screend?

A: screend implements a network firewall on a router, to defend a local network against unwanted traffic from the larger Internet. The kernel network code shows each packet to screend, which looks at the packet and tells the kernel whether it's OK to forward it. screend is a user-level program (not part of the kernel), and the kernel maintains a queue of packets waiting for screend to look at. You can learn more here:
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=a822dc4058040a5a3866af897d9af9f618ba83ed
http://bitsavers.informatik.uni-stuttgart.de/pdf/dec/tech_reports/NSL-TN-2.pdf

Q: Computers are much faster now than in 1996. Has that made livelock problems go away?

A: Faster CPUs only help if the CPU is so fast that it can handle the peak load that the network can deliver. But network speeds have increased at roughly the same rate as CPU speeds, so it is still easy for network traffic to overwhelm a computer. On the other hand, this paper caused OS designers to take steps to avoid livelock. For example, Linux's NAPI is an adaptive polling scheme inspired by the livelock paper:
https://www.usenix.org/legacy/publications/library/proceedings/als01/full_papers/jamal/jamal.pdf
https://lwn.net/Articles/833840/
Some hardware improvements have helped as well. Modern NICs implement "interrupt moderation", which limits the rate at which they generate interrupts regardless of the packet arrival rate. And modern CPUs have multiple cores, which makes polling easier: one core can poll while the others do other work.

Q: What are flow-controlled and non-flow-controlled protocols?

A: A flow-controlled protocol, such as TCP, allows a sender to send new data only when the receiver indicates it is ready. Livelock is less likely with flow-controlled protocols: if the receiver can't keep up with a high packet arrival rate, it will automatically stop asking the sender for new data, and the sender will stop sending. Non-flow-controlled protocols don't have this feedback from receiver to sender: the sender keeps sending at whatever rate it chooses, regardless of what happens at the receiver. Some real-time audio and video streaming protocols work this way. So even if the receiver is experiencing overload and livelock, the sender won't slow down.
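Q: What does the difference between flow-controlled and non-flow-controlled senders look like in code?

A: Roughly like this. The sketch below is a toy illustration, not real TCP or a real streaming protocol; all the function and field names are made up.

    /* Flow-controlled sender (TCP-like): it only sends while the receiver
     * has advertised room, so an overloaded receiver slows the sender down. */
    void
    flow_controlled_sender(struct conn *c)
    {
        while (has_data_to_send(c)) {
            if (c->bytes_in_flight < c->receiver_window)
                send_next_segment(c);   /* ACKs from the receiver shrink bytes_in_flight */
            else
                wait_for_ack(c);        /* receiver isn't ready: stop and wait */
        }
    }

    /* Non-flow-controlled sender (like some audio/video streaming): it sends
     * at a fixed rate with no feedback from the receiver, so it keeps sending
     * even if the receiver is overloaded or livelocked. */
    void
    unpaced_sender(struct stream *s)
    {
        for (;;) {
            send_next_frame(s);
            sleep_ms(FRAME_INTERVAL_MS);
        }
    }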
Q: The authors' proposed design seems like it will discard packets if the input rate is too high. Isn't that bad for the applications sending network data?

A: Yes, if a router drops more than a small fraction of packets, higher-level protocols and applications will likely behave badly. With a reliable protocol like TCP, the sender will detect the lost packets after a timeout and re-send them; the receiving application will experience these timeout delays; and the re-sent packets increase the load on the router even more. Protocols that don't re-transmit will experience other problems if there is significant packet loss (e.g. audio drop-outs).

Network operators try to ensure that the components of the network (routers, links, &c) are fast enough to handle ordinary high loads, in order to avoid the need to discard packets. However, there are always surprises: some new application (the Web, long ago; streaming audio; streaming video) might suddenly become popular and place unexpectedly high loads on the network. In the long run, the network operator probably has to buy higher-performance equipment. In the short run, what matters is how the network handles sudden overload. Overload inevitably results in discarded packets, even in a system without livelock or similar problems. But it's best to discard only as many packets as needed to bring the rate down to what the network and routers can handle. The livelock problem that the paper addresses is that *more* was being discarded than necessary.

Q: What kind of NIC is used in the paper's measurements?

A: The NIC is the AMD LANCE. The LANCE is similar to the e1000 from the net lab, with RX and TX DMA descriptor rings. By default, like the e1000, it interrupts whenever a packet arrives and whenever it has finished sending an outgoing packet.

Q: The paper says that it's best to discard packets as early as possible. How does the paper's design decide which packets to drop?

A: The Section 6.4 design does not make explicit drop decisions. Instead, it processes packets as fast as it can. If packets arrive at a higher rate than the software can process them, the software cannot empty the NIC's receive DMA ring fast enough, so the ring is often full. In that situation, the NIC discards incoming packets, and the software never sees them. The nice thing about the NIC discarding the excess packets is that those packets don't consume any CPU time, and thus don't cause livelock.

Ideally a router would make explicit drop decisions, for example preferentially dropping the lowest-priority or least-important packets. That is possible when the bottleneck causing the drops is a slow output link and there is enough CPU time to look at all the incoming packets. But the paper is exploring situations where the load is so high that the router does not have enough CPU time to look at every packet, and thus cannot make clever drop decisions. Modern NICs are fairly programmable: they support multiple receive DMA rings and can be told to place different kinds of received packets on different rings. A clever router could use this to prioritize traffic under overload.
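Q: Why does a full receive DMA ring mean the NIC, rather than the CPU, drops packets?

A: Here is a toy model of a receive descriptor ring that shows the mechanism. It is not how the LANCE or e1000 hardware actually looks; the struct layout and helper names are invented for illustration. The key point is that when the software has not recycled any descriptors, the NIC has nowhere to put a new packet, so the packet is dropped in hardware without consuming any CPU time.

    #define RX_RING_SIZE 64

    /* One slot in the receive DMA ring.  An ownership flag says whether the
     * NIC may fill the slot or the software still has to process it. */
    struct rx_desc {
        char *buffer;        /* memory the NIC DMAs the packet into */
        int   owned_by_nic;  /* 1: NIC may use it; 0: waiting for software */
    };

    struct rx_desc rx_ring[RX_RING_SIZE];
    int nic_next;            /* next slot the NIC will try to fill */

    /* What the NIC does (conceptually) when a packet arrives on the wire. */
    void
    nic_packet_arrives(char *pkt, int len)
    {
        if (!rx_ring[nic_next].owned_by_nic) {
            /* Ring is full: software hasn't caught up.  The packet is
             * dropped here, in hardware, and no CPU time is ever spent
             * on it -- which is why the ring-full case doesn't livelock. */
            return;
        }
        dma_copy(rx_ring[nic_next].buffer, pkt, len);
        rx_ring[nic_next].owned_by_nic = 0;     /* hand slot to software */
        nic_next = (nic_next + 1) % RX_RING_SIZE;
    }

    /* The polling thread calls this after fully processing slot i,
     * giving the descriptor back to the NIC. */
    void
    rx_ring_recycle(int i)
    {
        rx_ring[i].owned_by_nic = 1;
    }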