Q: Has RadixVM been adopted by other operating systems?

A: Not that I know of. Replacing the complete VM system, which is what
the paper proposes, is a gigantic change. Lots of Linux subsystems
have their fingers in the VM code, so it is not just a matter of
replacing one implementation of an interface with another. This kind
of change would involve quite a bit of discussion among the
maintainers of the VM system and others, and some exploration to see
if there are any serious downsides. These kinds of changes do happen,
but slowly, and mostly when there is a burning problem. Although the
paper addresses a real problem in Linux, it is unclear if it is a
burning problem that needs to be addressed right now.

Q: How do researchers decide whether to base their implementations
on Linux versus a research kernel such as xv6?

A: In OS conferences you see papers that modify existing OSes and
papers that prototype ideas in simpler kernels. It depends on the
research question being investigated. If the question is relevant to
existing OSes, then it is more convincing to modify an existing OS to
demonstrate the solution to the question. However, in some cases it is
difficult for a small team to modify an existing OS to implement their
solutions (like in this paper) and researchers resort to prototyping
solutions in simpler kernels. In this case, the hope is that the
prototype might convince developers of an existing OS to implement the
idea in their system, either to evaluate it further or to adopt it.

If the question is less relevant to existing OSes (e.g., researchers are
exploring a new radical overall design for an OS), then there is little
choice other than to prototype the design with a simple kernel. Of
course, it would be ideal to build a fully-functional OS in the new design
style, but that is typically impossible with a small team of researchers.

Q: Why does Metis scale near-linearly on RadixVM but very poorly on
Linux?

A: Because in Linux the data structure for an address space of a
process is protected by a single lock, which becomes a bottleneck if
many threads of the same process want to modify the process's address
space.

Q: What is a good length for an epoch? A longer length would avoid
flushing to the global reference count, but it seems like it would
cause memory to be freed up much more slowly, which might be a
problem.

A: Indeed, that is the key trade-off. Linux also uses epochs for
RCU-based concurrent data structures, and has a similar trade-off.
In practice 10 ms seems to be a reasonable epoch length: long enough
to keep the overhead of the epoch scheme low, yet short enough that
memory is reclaimed reasonably quickly. I don't know if anyone has
studied this carefully with realistic application workloads.
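
The trade-off can be made concrete with a back-of-the-envelope model.
This sketch is not from the paper: the function name, the free rate,
the core count, and the assumptions that each core flushes once per
epoch and that freed memory waits about one epoch before reclamation
are all illustrative.

```python
def epoch_tradeoff(epoch_ms, free_rate_mb_per_s, cores):
    """Toy model of the epoch-length trade-off (illustrative assumptions only).

    Assumes each core flushes its cached deltas once per epoch, and that
    freed memory waits about one epoch before it can be reclaimed.
    """
    epoch_s = epoch_ms / 1000.0
    flushes_per_s = cores / epoch_s            # flush work grows as epochs shrink
    pending_mb = free_rate_mb_per_s * epoch_s  # unreclaimed memory grows as epochs lengthen
    return flushes_per_s, pending_mb


# Hypothetical workload: 500 MB/s of frees on an 80-core machine.
for ms in (1, 10, 100):
    flushes, pending = epoch_tradeoff(ms, free_rate_mb_per_s=500, cores=80)
    print(f"epoch={ms:>3} ms  flushes/s={flushes:>8.0f}  pending={pending:>5.1f} MB")
```

Shortening the epoch multiplies the flush work, and lengthening it
multiplies the memory waiting to be reclaimed; real workloads would
need measurement rather than a model like this.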

Q: What kinds of applications are likely to benefit from RadixVM?

A: The observation in the RadixVM paper is that any multithreaded
application that calls mmap/munmap a lot runs the risk of not
scaling, because of the lock that protects the VM data structures for
a process. All the threads in a process share the same VM data
structures. In practice, developers have worked around this bottleneck
by modifying their applications to allocate a lot of memory at once
and hold on to it, so that there are few mmap/munmap calls. This is
also the case for the app where Bonsai does well: it allocates memory
in large chunks (8 MB). If it allocates in more standard sizes (64
KB), then the application suffers with Bonsai. RadixVM spares
developers the work of modifying their applications in this way.
Whether the pain of modifying applications justifies a new VM design
is unclear at this point.
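
To see why the chunking workaround helps, here is a minimal sketch
that only counts how often an allocator would have to call mmap. It
does not call the real mmap, and ChunkAllocator and mmap_calls_for are
hypothetical names made up for this illustration.

```python
class ChunkAllocator:
    """Hands out memory from large chunks and counts simulated mmap calls.

    No real mmap is performed: `mmap_calls` just counts how often the
    allocator would have to ask the kernel for a fresh chunk.
    """

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.remaining = 0      # bytes left in the current chunk
        self.mmap_calls = 0

    def alloc(self, size):
        if size > self.chunk_size:
            raise ValueError("request larger than chunk size")
        if size > self.remaining:
            self.mmap_calls += 1              # would call mmap(chunk_size) here
            self.remaining = self.chunk_size
        self.remaining -= size


KB, MB = 1024, 1024 * 1024


def mmap_calls_for(chunk_size, request_size, n_requests):
    """Number of simulated mmap calls needed to serve n_requests allocations."""
    allocator = ChunkAllocator(chunk_size)
    for _ in range(n_requests):
        allocator.alloc(request_size)
    return allocator.mmap_calls


big_chunks = mmap_calls_for(8 * MB, 64 * KB, 1024)
small_chunks = mmap_calls_for(64 * KB, 64 * KB, 1024)
```

With 8 MB chunks, 128 requests of 64 KB share each simulated mmap
call; with 64 KB chunks every request needs its own call, and each of
those calls would contend on the per-process lock.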


Q: Is RadixVM's worse sequential performance compared to Linux due
solely to its high memory overhead?

A: No. The Linux VM system also spends less time executing
instructions. If I remember correctly, the Linux page-fault handler is
much more streamlined than RadixVM's page-fault handler. In general,
the Linux developers have spent a significant amount of effort making
the VM system execute fast on a single core, which the RadixVM
developers didn't do.

Q: In real life applications, would memory consumption be a problem
for RadixVM?

A: I don't know. The paper's design is careful about memory
consumption and the experimental results bear that out. However, the
VM system hasn't been used in the real world; there might be hidden
problems (including memory use) that show up with some real-life
applications. Engineers who want to adopt RadixVM in their OSes would
first do some more exploration with real-life applications before
committing to the paper's design.

Q: What is reference delta caching?

A: This refers to the fact that each core doesn't update the shared
ref count directly. Instead, each core maintains a delta that reflects
its changes to the shared ref count (e.g., if the core increments 3
times and decrements once, then the delta is +2). The caching part
refers to the idea that RadixVM does this only for frequently-accessed
ref counts, since scalability matters for those, and not for all ref
counts (which keeps the memory pressure from ref counts under control).
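
The delta idea can be sketched in a few lines. This is a
single-threaded simulation, not RadixVM's code: SharedCount, Core, and
flush are hypothetical names, and the sketch leaves out the part that
limits the cache to frequently-accessed counts.

```python
class SharedCount:
    """A shared reference count; shared_writes counts updates to it."""

    def __init__(self, value=0):
        self.value = value
        self.shared_writes = 0


class Core:
    """One core's private cache of deltas, keyed by the counted object."""

    def __init__(self):
        self.deltas = {}   # object -> net local change

    def inc(self, obj):
        self.deltas[obj] = self.deltas.get(obj, 0) + 1   # touches local state only

    def dec(self, obj):
        self.deltas[obj] = self.deltas.get(obj, 0) - 1   # touches local state only

    def flush(self):
        """Apply each non-zero delta to its shared count (e.g., once per epoch)."""
        for obj, delta in self.deltas.items():
            if delta != 0:
                obj.value += delta
                obj.shared_writes += 1
        self.deltas.clear()


page = SharedCount(1)
core0, core1 = Core(), Core()
for _ in range(3):
    core0.inc(page)
core0.dec(page)
core1.dec(page)
delta_before_flush = core0.deltas[page]   # 3 increments, 1 decrement
core0.flush()
core1.flush()
```

Five operations on the page result in only two writes to the shared
count, one per core at flush time, which is the point of keeping
deltas locally.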

Q: What is the reason for having weak reference counts?

A: Here's an example. Consider the buffer cache. Each entry in the
buffer cache has a pointer to an in-memory page that contains a disk
block. When no kernel thread is using that disk block, the ref count
in the entry is zero. The buffer cache may not want to delete the
page, however, in case a thread comes along and wants to read the disk
block.

Weak references make it possible to support this scenario. The entry
keeps a weak reference to the page along with the ref count. If a thread
wants to read the disk block, then the buffer cache can try to revive the
weak reference by calling tryget. If successful, tryget clears the dying
bit, increments the ref count (since one thread now holds a reference),
and returns the reference to the page to the calling thread.
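
The buffer-cache scenario can be sketched as follows. This is a
simplified, single-threaded model: only tryget and the dying bit come
from the description above, while CacheEntry, put, and reclaim are
hypothetical names, and a real implementation would need atomic
operations.

```python
class CacheEntry:
    """Buffer-cache entry holding a weak reference to a page plus a ref count."""

    def __init__(self, page):
        self.page = page     # weak reference: does not by itself keep the page alive
        self.refcount = 0
        self.dying = True    # count starts at zero, so the page is a reclaim candidate

    def tryget(self):
        """Try to revive the weak reference; returns None if the page is gone."""
        if self.page is None:
            return None
        self.dying = False   # clear the dying bit
        self.refcount += 1   # one more thread holds a reference
        return self.page

    def put(self):
        """Drop a reference; at zero the page becomes a candidate for freeing."""
        self.refcount -= 1
        if self.refcount == 0:
            self.dying = True

    def reclaim(self):
        """Free the page only if it is still dying and nobody revived it."""
        if self.page is not None and self.dying and self.refcount == 0:
            self.page = None
            return True
        return False


entry = CacheEntry(bytearray(b"disk block"))
first = entry.tryget()                  # a thread starts using the block
entry.put()                             # count drops to zero, page is dying
revived = entry.tryget()                # another thread arrives before reclaim
reclaimed_while_in_use = entry.reclaim()
entry.put()
reclaimed_when_idle = entry.reclaim()
after_free = entry.tryget()
```

The page survives a ref count of zero as long as some thread calls
tryget before reclaim runs; once reclaim succeeds, tryget fails and
the caller has to read the block from disk again.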

Q: How does one judge the complexity of an implementation? RadixVM was
about 4000 lines of code, and might be complicated to implement
correctly. But the paper seems to downplay this complexity.

A: The paper makes an implicit comparison with an alternative design
that isn't documented but only hinted at in the paper. In this
alternative design, the VM system makes use of lock-less concurrent
data structures (such as concurrent skip lists). With those data
structures it is more difficult to implement mmap, munmap, etc.
correctly because those designs try to avoid locking mappings for
pages. RadixVM's implementation of mmap and munmap is simpler than in
this alternative design because it does use locks for mappings. That
doesn't mean that RadixVM is a simple design; it is just simpler than
an even more complicated design.

Q: How does RadixVM keep track of which cores have a particular VA in
their TLB?

A: It doesn't keep track directly, but approximates conservatively by
allocating per-core page tables. If a core has mapped a page, it is
possible that the page is in its local TLB. If a core hasn't mapped a
page, the core knows for sure that it is not in the local TLB. When
performing TLB shootdown, RadixVM sends shootdown messages only to the
cores that have a page mapped, since they could have it in their TLBs.
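
As a rough sketch of this idea (not RadixVM's actual data structures),
the model below gives each core its own page table and computes the
shootdown targets from those tables. AddressSpace, page_fault, and
munmap are hypothetical names here, and scanning every core's table is
a simplification chosen for clarity.

```python
class AddressSpace:
    """Toy address space with one page table per core."""

    def __init__(self, ncores):
        # per_core_pt[c] maps virtual address -> physical frame for core c
        self.per_core_pt = [dict() for _ in range(ncores)]

    def page_fault(self, core, va, frame):
        """Install the mapping only in the faulting core's page table."""
        self.per_core_pt[core][va] = frame

    def munmap(self, va):
        """Remove va everywhere; return the cores that need a TLB shootdown."""
        targets = [c for c, pt in enumerate(self.per_core_pt) if va in pt]
        for c in targets:
            del self.per_core_pt[c][va]
        return targets


aspace = AddressSpace(ncores=4)
aspace.page_fault(core=0, va=0x1000, frame=7)
aspace.page_fault(core=2, va=0x1000, frame=7)
aspace.page_fault(core=1, va=0x2000, frame=9)
shootdown_targets = aspace.munmap(0x1000)   # only cores that mapped 0x1000
untouched_targets = aspace.munmap(0x3000)   # never mapped by any core
```

Cores 1 and 3 never mapped 0x1000, so they cannot have it in their
TLBs and receive no shootdown; an address that no core ever touched
needs no shootdown at all.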