Q: It seems like the majority of papers we have read have been about
improving performance by allowing user programs to have greater access
to the hardware while maintaining security properties. Has there been
any research into what you can make and how it would work if you could
assume only trusted software would run on the machine?

A: Have a look at Unikernels -- http://unikernel.org/


Q: This paper lists six authors, only two of whom are professors. Until now, the
papers we have read typically have only one (or two) student author(s). How does
research with more contributors generally differ from that with fewer? How is
work divided? Is size a preference/style of the particular research groups?

A.1: I'm not sure that the number of contributors correlates very strongly with
anything. You might imagine that having lots of contributors would help you
implement big, complex systems. Or not: some small teams are as productive as
big teams. And anyway "big and complex" is rarely a good thing in a research
project.

A.2: It depends on what the people are good at, and have time for. Here are
some things that some people are particularly good at; few people are
good at all of them:

  * choosing research ideas.
  * writing papers and explaining ideas.
  * proving formal properties.
  * designing systems.
  * implementing systems.
  * analyzing performance.

A.3: Often. Some people like, or are more effective with, big groups, or small
groups. There are also big groups that do projects with only a subset of the
group, and there are big collaborations across groups.


Q: Dune is implemented as a standalone Linux module, with no . Does this mean it
is easier or more likely to be picked up and used in the real world? Has there
been an effort to turn Dune from a research project to a real world tool (such
as fully supporting signals)?

A: Embedding Dune in Linux, and implementing it as a kernel module,
definitely makes it relatively easy for others to use (and for the
authors to build). As opposed to, say, having Dune be a completely new
operating system.

I'm not aware of anyone using Dune.

Q: Do any current production systems implement similar mechanisms to reduce the
overhead of virtualization (e.g. VMware, cloud VMs, etc.)? If not, what reasons
besides limited compatibility with existing systems (e.g. performance, security
concerns not mentioned in the paper) can explain this?

A: Many virtual machine monitors (VMware, KVM, Bhyve, &c) use Intel VT-x to
increase performance -- the guest operating system can directly modify its page
table, get interrupts via its IDT, use both CPL=0 and CPL=3, &c. VT-x was
originally intended to increase the performance (and decrease the complexity) of
virtual machine monitors.

The key new idea in the Dune paper is for a kernel to apply VT-x to individual
processes (rather than a VMM applying VT-x to entire guest operating systems),
and to thereby give processes direct access to privileged hardware. I do not
know of anyone having picked up that idea.  However, it's a good idea, and it's
easy for me to imagine it being used in the future, particularly if Intel
continues to increase the efficiency of VT-x.

Q: I don’t really understand what the paper means by providing a ‘process’
abstraction vs. a machine abstraction. I understand the benefits the
applications mentioned get from using Dune, but I’m a little confused on the
above terminology.

A: A process can make system calls to an operating system kernel to read
files, allocate memory, &c.

Code running on a machine abstraction can execute instructions and use
machine registers, but can't make system calls. The JOS and xv6 kernels
expect to run directly on computer hardware, and can also run as
guests in a virtual machine that provides a machine abstraction.

One of the key new ideas in the Dune paper is to take virtualization
hardware originally intended to provide a machine abstraction to guests,
and use it to instead provide a process abstraction. It's a process
abstraction because guests can make Linux system calls (using the VMCALL
instruction).

Q: The 5th paragraph of section 3.4 about Memory Management notes that
an MMU notifier chain is used to handle various scenarios that may
alter or require page mappings. I tried looking this up but I didn't
find anything that explained it cleanly. What is it and how does it
address the issues mentioned in that same paragraph?

A: Notifier chains are an internal mechanism in Linux that allows different
parts of the kernel to ask for notification when various things change.
If you search the web for

  linux notifier chain

you'll find some explanations.
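
As a rough illustration of the idea, here is a toy Python model of a notifier
chain. It is not the Linux API; the class, the event name, and the dictionary
standing in for the EPT are all made up for this sketch. The assumption is that
Dune acts as a subscriber that wants to hear when the kernel changes a
process's page mappings, so that it can drop stale EPT entries.

```python
class NotifierChain:
    """Toy model of a notifier chain: an ordered list of callbacks."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def notify(self, event, data):
        # Call every registered callback, in registration order.
        for callback in list(self._callbacks):
            callback(event, data)


# Hypothetical stand-in for an EPT: guest-physical page -> host-physical page.
ept = {0x1000: 0xA000, 0x2000: 0xB000}


def dune_on_mmu_event(event, page):
    # Hypothetical subscriber: forget the EPT entry for an unmapped page.
    if event == "invalidate_page":
        ept.pop(page, None)


mmu_notifiers = NotifierChain()
mmu_notifiers.register(dune_on_mmu_event)

# The memory-management code announces that it unmapped a page.
mmu_notifiers.notify("invalidate_page", 0x1000)
print(sorted(hex(p) for p in ept))
```

The point is only that the code making the change doesn't need to know who
cares about it; it just walks the list of registered callbacks.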

Q: What does it mean by "shadow copy of privileged state" (section 2.1)?

A: The idea is that the guest software (running in "VMX non-root" mode) is able
to read and write privileged registers such as %cr3, but it is not accessing the
real registers. The hardware hides the real register values when the host VMM
switches from root to non-root mode, and restores them when the guest "exits"
back to root mode. Thus there are two sets of privileged registers; the paper
refers to one of them as the "shadow copy" of the registers. I do not know
whether the paper means "shadow" to refer to the registers that the guest sees,
or the registers that the VMM sees.

Q: Both Dune and Exokernel aim to give applications more power in controlling
memory for performance reasons. While Dune adds layers to an existing kernel,
allowing for easier application development, Exokernel seems to strip many
layers away. Does this mean that Dune makes the trade-off of faster application
development for power compared to Exokernel, or are their performances still
comparable?

A: It's probably easier in practice to develop or port applications for
Dune than for Exokernel because Dune lets applications use all the
system calls and kernel services in Linux.

Dune probably delivers better performance than an Exokernel for pagetable
manipulation and delivering page faults to processes, because Dune really does
let the process have direct access to the hardware %cr3/pagetable and IDT. The
Exokernel, in contrast, requires the process to perform system calls into the
kernel to change its page table, and page fault delivery requires
user/kernel/user transitions. Of course when the Exokernel was designed VT-x
didn't exist, so Dune wasn't possible.

Q: This is a more tangentially related question. Would it be possible
to implement a lib OS with Dune in an exokernel-type setup?

A: Yes. This could make a lot of sense because a big goal of the Exokernel is to
let ordinary applications use powerful hardware features, and Dune allows
exactly that.

Q: What is meant by the libDune library being "completely untrusted by
the kernel?" Is the implication that libDune does not enable any
behaviors that are not already possible with a traditional user process?

A: That's correct; libDune can't do anything that a Dune process can't
already do.

Q: What is the difference between Dune sandboxing and other sandboxing
mechanisms (e.g. seccomp)? Which is better? Which gives better
performance?

A: Dune isn't a full sandboxing system -- it just provides efficient mechanisms
to restrict virtual address mappings, and to intercept system calls. It doesn't
provide the logic for deciding what memory to reveal, or for deciding which
systems calls should be allowed. That logic can be pretty complex. It would make
sense to combine Dune with a larger sandboxing policy system -- which the paper
does by using Wedge.
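
To make "the logic for deciding" concrete, here is a deliberately tiny sketch
of what a system call policy might look like. Everything in it (the function
name, the allow-list, the path prefix) is hypothetical and is not taken from
Dune or Wedge; it only shows the kind of decision a sandbox has to make once
Dune has intercepted a system call.

```python
# Hypothetical policy: a few always-allowed calls, plus open() under one directory.
ALLOWED_SYSCALLS = {"read", "write", "exit"}
ALLOWED_PATH_PREFIXES = ("/tmp/sandbox/",)


def policy_allows(syscall, args):
    """Return True if the sandboxed code may perform this system call."""
    if syscall in ALLOWED_SYSCALLS:
        return True
    if syscall == "open":
        path = args[0]
        return path.startswith(ALLOWED_PATH_PREFIXES)
    return False


print(policy_allows("open", ("/tmp/sandbox/data.txt",)))
print(policy_allows("open", ("/etc/passwd",)))
```

Even this toy version is wrong for paths that contain "..", which hints at why
real policies get complicated.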

Q: What are downsides of running all user processes in Dune mode? It doesn't seem
unsafe to do so, and, as I understand, there is no serious performance
disadvantage.

A: There are some applications that run slower in Dune (e.g. mcf and ammp in
Figure 3). Whether that's a serious downside depends on whether you care about
the performance of such applications. Maybe Intel will improve the performance of
VT-x entry/exit and EPT lookup so that no applications are slowed down.

Q: Does running in VT-x slow down ordinary programs?

A: It might -- you can see in Table 2 that some operations are slower in
Dune than in ordinary Linux, due to VT-x costs. I think the main cost is
the time to switch between VT-x root and non-root modes (i.e. between
kernel and Dune process), which affects system calls, interrupts, and
page faults.

The paper argues that, for most programs, the extra per-system-call cost
is not very significant -- for example, many of the entries in Figure 3
show a Dune program executing in a time within a few percent of how long
it takes on standard Linux.

Q: The overhead from Dune seems to come from VMX mode transitions and EPT
translations. How do these overheads compare, and do more traditional systems
have similar overhead?

A: You are right that system call overhead in Dune is much larger than in
Linux; Table 2 suggests a factor of 5x. This might be a serious problem
for a program that spends a lot of its time making simple system calls,
but would not be a problem for most programs (e.g. most of the programs
in Figure 3 run within 5% of the speed of Linux).
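
A back-of-the-envelope way to see this, assuming (as a simplification) that
every system call transition costs 5x as much under Dune and that nothing else
changes: if a fraction f of a program's Linux run time is system call overhead,
its Dune run time is roughly (1 - f) + 5f times the Linux run time. The function
name and the sample fractions below are mine, and the model ignores EPT costs
entirely.

```python
def estimated_slowdown(syscall_fraction, syscall_cost_ratio=5.0):
    """Estimated Dune run time relative to Linux.

    syscall_fraction: assumed share of Linux run time spent on system call
    overhead. syscall_cost_ratio: how much more each call costs in Dune.
    """
    return (1.0 - syscall_fraction) + syscall_fraction * syscall_cost_ratio


for fraction in (0.01, 0.10, 0.50):
    print(f"{fraction:.0%} of time in system calls -> "
          f"{estimated_slowdown(fraction):.2f}x run time")
```

With these assumptions, a program that spends 1% of its time on system call
overhead slows down by about 4%, while one that spends half its time there
takes about three times as long.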

Q: Why does the pool of threads improve the performance of process switching?

A: The pool of threads technique (in Section 6.3.2) makes sthread creation
faster -- re-using an existing sthread is faster than creating an sthread from
scratch.

They improve context switch time by using the Intel hardware TLB tagging
feature, the PCID mentioned in Section 2.2. This tags each TLB entry with the
identifier of the thread it belongs to, so that they can switch page tables
without having to flush the TLB. The TLB uses only entries that are tagged with
the current thread's identifier.

Q: If I'm not mistaken, the speedup comes from recycling existing
sthreads for new uses instead of creating new ones, which avoids TLB
flushes and context switches. This seems like a very strong idea for
minimizing TLB flushes and thread creation, and for reducing processing
time and memory as a result. Are there any prominent features in Linux
that already use this idea?

A: It's pretty common to see systems that reduce thread-creation cost by
initially creating a set of "worker" threads, and then re-using them for
each request. For example, the Apache web server uses worker threads and
processes.
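
Here is a small sketch of the worker-pool pattern using Python's standard
concurrent.futures module. It has nothing to do with sthreads or Apache
specifically; the request handler is a made-up placeholder. It just shows a
fixed set of threads being created once and then re-used for many requests.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle_request(request_id):
    # Placeholder "work": report which worker thread handled the request.
    return request_id, threading.current_thread().name


# Create at most four worker threads, then feed them twenty requests.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
    results = list(pool.map(handle_request, range(20)))

distinct_threads = {name for _, name in results}
print(len(results), "requests handled by", len(distinct_threads), "threads")
```

Twenty requests get handled, but no more than four threads are ever created.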

Q: The paper compares the example of running a garbage collector in a VM vs with
a Dune module. How would you even run a garbage collector in a VM? Is the
memory it'll be cleaning up the virtualized memory?

A: In ordinary situations -- for example if you run Java on Linux and Java's
garbage collector runs -- the garbage collector uses virtual addresses. The
MMU's page-table hardware translates these to physical addresses which refer to
real storage in RAM, so ultimately the garbage collector is finding real free
space in real RAM.

The situation with running a garbage collector in a virtual machine guest (or in
a Dune process) is similar. Java and the garbage collector use virtual
addresses, which the guest page table and the VT-x EPT translate to physical
addresses that refer to real RAM. So again the garbage collector is ultimately
referring to real memory.

Q: How do fast faults and memory protections relate to improving
performance of garbage collection? It doesn't seem to be something
that you can optimize by skipping copying some data like you can by
knowing dirty bits or controlling page tables.

A: Virtual memory tricks turn out to be useful in garbage collectors that
operate concurrently with the program. It's often the case that the
garbage collection algorithm needs to be aware of the program's reads
and writes in order to preserve correctness despite concurrency, and
page protection hardware can be an efficient way to do this.

In the Boehm collector that the paper benchmarks, the collector traces
pointers (to find all live objects) in parallel with the program's
execution. If the program changes some pointers in objects after the
collector has traced those objects, the collector must re-trace them
later. The collector uses the dirty bits maintained by the MMU (in each
PTE) in order to detect which pages contain objects that the program
modified after they were initially traced.
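
The following toy model sketches that re-tracing step. It is not the Boehm
collector's code; the heap layout, the page size of four objects, and all the
names are invented for illustration, and the "dirty bit" is just a Python set
that the write path updates by hand.

```python
PAGE_SIZE = 4  # objects per page; tiny, for illustration only


class Heap:
    def __init__(self, num_objects):
        self.refs = {obj: [] for obj in range(num_objects)}  # obj -> pointees
        self.dirty_pages = set()

    def write_pointer(self, src, dst):
        """Program stores a pointer; the MMU would set the page's dirty bit."""
        self.refs[src].append(dst)
        self.dirty_pages.add(src // PAGE_SIZE)


def trace(heap, start, marked):
    """Mark everything reachable from the objects in `start`."""
    stack = list(start)
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap.refs[obj])


heap = Heap(num_objects=12)
heap.write_pointer(0, 1)
heap.dirty_pages.clear()        # collector clears dirty bits as a cycle starts

marked = set()
trace(heap, [0], marked)        # concurrent phase marks objects 0 and 1

heap.write_pointer(1, 9)        # program keeps running; page 0 becomes dirty

# Final phase: re-trace only the marked objects that live on dirty pages.
retrace = [obj for obj in marked if obj // PAGE_SIZE in heap.dirty_pages]
heap.dirty_pages.clear()
for obj in retrace:
    trace(heap, heap.refs[obj], marked)

print(sorted(marked))
```

Without the dirty-page check, object 9 would be missed (and wrongly freed), or
the collector would have to re-trace the whole heap.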

People have thought of many ways to use virtual memory hardware for
garbage collection. This paper by Appel and Li mentions a few:

  http://www.cs.cornell.edu/courses/cs614/2003sp/papers/AL91.pdf

Q: Is exposing the EPT safe because VT-x only modifies page table as part of a
virtual machine (so you're not actually managing data apart from what you're
given as part of your VM)?

A: Only Dune (in the kernel) can read or write the EPT. A Dune process can
modify its own page table, but it can't get at the EPT.

Allowing the Dune process to modify its own page table is safe because Dune sets
up the EPT so that it only provides mappings to physical memory that Dune has
allocated to the process.

Q: I would be happy for some additional information regarding Intel's
process-context identifier (PCID) feature.

A: You can read about PCID in Section 4.10 of the "Intel 64 and IA-32
Architectures Software Developer’s Manual Volume 3: System Programming Guide":

http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

Q: The paper references multiple different "rings". What are these specifically?

A: When the paper says "ring 0", it means "executing with CPL=0", that is,
executing with hardware privileges. And the paper's "ring 3" means executing
with CPL=3, i.e. in user mode without hardware privileges.

Q: How does Dune maintain a consistent TLB? By exposing the tagged PCID feature
to Dune processes, would it be possible to collide with an ID used by the
host-kernel for another host-process? Similarly, how are VPIDs related?

A: The Intel VT-x mechanism virtualizes TLB tags, so each Dune process has an
independent set of tags. Each process can assign TLB tags however it likes,
without interfering with other processes.

In the language of VT-x, each process has a separate VPID assigned by
Dune; Dune tells the VT-x hardware which VPID to use with each process.
Each VPID has an independent set of PCIDs (which the paper calls TLB
tags); the process tells the processor the current PCID (using the low
12 bits of %cr3).

Each TLB entry is tagged with both a VPID and a PCID, and the MMU only
uses entries that have the currently-set values.
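
A toy model of that lookup, to show why two processes can pick the same PCID
without colliding. The class and function names are made up, and the TLB is
just a dictionary; the only hardware detail taken from above is that the PCID
sits in the low 12 bits of %cr3.

```python
PCID_MASK = 0xFFF  # low 12 bits of %cr3 hold the PCID


def pcid_of(cr3):
    return cr3 & PCID_MASK


class TaggedTLB:
    """Toy TLB whose entries are tagged with (VPID, PCID)."""

    def __init__(self):
        self.entries = {}  # (vpid, pcid, virtual page) -> physical page

    def insert(self, vpid, cr3, vpage, ppage):
        self.entries[(vpid, pcid_of(cr3), vpage)] = ppage

    def lookup(self, vpid, cr3, vpage):
        # Only entries matching the current VPID and PCID can hit.
        return self.entries.get((vpid, pcid_of(cr3), vpage))


tlb = TaggedTLB()
# Two Dune processes (VPIDs 1 and 2) both happen to choose PCID 5.
tlb.insert(vpid=1, cr3=0x1000 | 5, vpage=0x40, ppage=0xAAA)
tlb.insert(vpid=2, cr3=0x7000 | 5, vpage=0x40, ppage=0xBBB)

print(hex(tlb.lookup(vpid=1, cr3=0x1000 | 5, vpage=0x40)))
print(hex(tlb.lookup(vpid=2, cr3=0x7000 | 5, vpage=0x40)))
print(tlb.lookup(vpid=1, cr3=0x1000 | 6, vpage=0x40))
```

Each process sees only its own entry, and switching PCID within a process
simply misses rather than requiring a flush.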

Q: I was most confused about exactly how Dune can provide all of its features completely
safely without having to use a fully fledged virtual machine. Are managed system calls
and special state really all that is needed for these privileged CPU features to be
exposed to something running in user space?

A: I think one surprise about virtual machines is how little (at a conceptual
level) needs to be protected -- mostly the address space (page tables).

On the other hand, Dune uses VT-x, and VT-x is basically a virtual machine
mechanism. So while Dune doesn't provide a fully fledged virtual machine, it
uses hardware that is powerful enough to do so.

Q: Much of the implementation of Dune is highly architecture dependent (hardware
architecture). Would the overall implementation (the interface to the user
applications) be independent of architecture, or would this need to be adjusted
on a per-architecture level?

A: Dune requires the hardware to provide VT-x or something similar.

Dune could be made architecture-independent by teaching it how to detect what
processor type it is running on, and to use the instructions &c for the virtual
machine support in that kind of processor. Of course the processor would have to
provide something with capabilities similar enough to VT-x that Dune would make
sense.

I do not know how much of the Dune code could be made independent of the machine
architecture.

Q: Although they take steps to prevent Dune from allowing processes to
monopolize a cpu, how much inefficiency could a malicious program introduce with
the additional privileges granted by Dune?

A: I'm not aware of any damage a Dune process could do that can't be done
equally well with a traditional Linux process. A place to look might be the TLB
-- perhaps there's a way to reduce TLB caching effectiveness by manipulating the
PCID.

Q: What would happen if a Dune program used the INT instruction? Would we
end up in the kernel, or in the Dune module? I don't quite understand
why VM exits are the only way we can access the kernel from a Dune
program.

A: If a Dune process executed INT, the hardware would try to deliver the
interrupt to the process itself through the process's IDT. If the IDT entry
isn't valid, that's a double-fault, so the processor would try to deliver the
double fault to the Dune process through the process's IDT.  If that IDT entry
isn't valid, then the processor will exit VT-x non-root mode and give control
back to the kernel. It's cheaper for the process to just do the VM exit
directly.

The Linux kernel (and the Dune extension) need to run in VT-x root mode so that
they can control VT-x (e.g. set up the EPT and the VMCS). So a system call from
a Dune process into the kernel needs to switch from VT-x non-root mode to VT-x
root mode -- i.e. do a VM exit.

Q: Section 4.2, one of the limitations listed is: "we have not fully integrated
support for signals despite the fact that they are reported by the Dune
module. Applications are required to use dune_signal whereas a more compatible
solution would override several libc symbols like signal and sigaction."  What
is the difference between using dune_signal and the libc symbols? I'm assuming
it's more than just the naming to override the symbols. Also what are the
signals generally used for?

A: I suspect that dune_signal() takes different arguments and has different
behavior than the standard Linux signal call. But I don't know what the
differences are.

Here are some ways that Linux uses signals:

  * A process can ask Linux to notify it when it suffers a page fault;
    Linux delivers the notification via a signal.
  * When a process does something that the hardware views as illegal
    (e.g., divide by zero), Linux can notify the process using a signal.
  * When you type control-C, Linux delivers a signal to the process
    you're running.
  * A process can ask Linux to notify it when input arrives on a pipe
    or socket; Linux delivers the notification with a signal.
  * I'm sure I'm forgetting other uses.
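
For a feel of the ordinary (non-Dune) interface, here is a minimal sketch using
Python's standard signal module on a Unix-like system. It says nothing about
dune_signal(); it only shows a process registering a handler and then receiving
a signal, here one that it sends to itself.

```python
import os
import signal
import time

received = []


def on_sigusr1(signum, frame):
    # Runs asynchronously with respect to the main program flow.
    received.append(signum)


# Ask the kernel to call on_sigusr1 when SIGUSR1 arrives.
signal.signal(signal.SIGUSR1, on_sigusr1)

# Send the signal to ourselves, as another process (or the kernel) might.
os.kill(os.getpid(), signal.SIGUSR1)

# Give the interpreter a moment to run the handler.
deadline = time.monotonic() + 1.0
while not received and time.monotonic() < deadline:
    time.sleep(0.01)

print("handler ran:", bool(received))
```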

Q: Why is it not possible to leave the Dune mode?

A: Intel VT-x doesn't support this. VT-x is intended for virtual machines,
and it wouldn't make sense for a virtual machine guest to switch to executing as
the host. Perhaps Dune could implement exiting Dune mode by creating a new
non-Dune process and giving the Dune process's memory to the new process, but I
suspect they never needed this feature.

Q: Is there any way to have a program use Dune with restrictions on its access
to privileged instructions? Or is it all or nothing?  What's the extent of the
process isolation that is preserved in Dune?

A: VT-x can be configured to restrict privileges, but I don't think Dune
makes much use of that. Dune tries to let processes do as much as
possible, consistent with isolation.

Dune processes are as isolated as ordinary Linux processes. It may seem
that the ability of a Dune process to put anything it likes into its
page table might break isolation. However, Intel VT-x hardware maps the
"physical addresses" in the process's page table a second time, using
the EPT (extended page table). So there are three kinds of address and
two mappings:

  ProcessVirtual --pagetable--> ProcessPhysical --EPT--> RealPhysical

The process can control how virtual addresses map to ProcessPhysical
addresses, but the EPT controls how (and whether) ProcessPhysical
addresses map to RealPhysical addresses (which refer to RAM). Dune
controls the EPT; the process cannot see or modify the EPT. Dune sets up
the EPT so that it contains mappings only to RealPhysical addresses that
Dune allocates to the process. So a Dune process is isolated so that it
can only use its own memory, regardless of what it puts into its page
table.
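
The two mappings can be sketched with a pair of dictionaries. This is a toy
model with made-up page numbers and a single-level "page table", not how the
hardware walks real multi-level tables, but it shows why a bogus page-table
entry gets the process nowhere.

```python
PAGE = 0x1000


def translate(vaddr, page_table, ept):
    """ProcessVirtual -> ProcessPhysical -> RealPhysical; None means a fault."""
    vpage, offset = divmod(vaddr, PAGE)
    ppage = page_table.get(vpage)   # mapping controlled by the process
    if ppage is None:
        return None                 # ordinary page fault
    rpage = ept.get(ppage)          # mapping controlled by Dune
    if rpage is None:
        return None                 # EPT violation: control goes to the kernel
    return rpage * PAGE + offset


# Dune gave this process two pages of real memory (real pages 100 and 101).
ept = {0: 100, 1: 101}
page_table = {0x10: 0, 0x11: 1}

print(hex(translate(0x10123, page_table, ept)))

# The process maps a virtual page to ProcessPhysical page 7, which Dune
# never put in the EPT, so the access cannot reach any real memory.
page_table[0x12] = 7
print(translate(0x12000, page_table, ept))
```

The first lookup reaches real address 0x64123; the second returns None because
the EPT has no entry for ProcessPhysical page 7.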

Q: What are the extra complications of nested VT-x that makes it not commonly
supported?

A: The hardware doesn't directly support nested VT-x -- if code executing
in non-root mode executes the VMLAUNCH instruction, it's an error. For
the instruction to work correctly, the hardware would have to save and
restore a stack of virtual machine state records, but it doesn't do
that.

Software can provide nested VT-x -- a nested VMLAUNCH will cause a VM
exit to the surrounding root-mode software, which can create a VMCS and
EPT mimicking the nested VMCS created in the non-root-mode code.

Q: According to the abstract, Dune uses the virtualization _hardware_ in modern
processors to provide process abstraction. Doesn't this imply some sort of
limit on the number of concurrent processes that can be running? What
implications does that have on the overall usability of Dune?

A: I don't think the VT-x hardware imposes a limit on the number of Dune
processes that can exist. Dune can create lots of VMCS structures, one
for each process. When Dune context-switches to a particular Dune
process, it tells the processor hardware which VMCS it is switching to.

Of course, only as many processes can actually execute at a given time
as there are cores (just as on non-Dune Linux). Each core has its own
VT-x machinery, so each core can execute a different Dune process.


Q: Has the attempt to build Dune on top of VT-x revealed any limitations to the
VT-x extension itself?

A: The paper's Section 7 mentions two: EPT performance could be improved,
and the EPT guest-physical address space size should be increased.

Q: It seems like Dune relies extensively on many of the features provided by
Intel/AMD (e.g. VMX, EPT). Does that imply that Dune is incompatible with
non-Intel/AMD hardware?

A: The current Dune implementation probably only runs on modern Intel processors.

If some non-Intel processor supported something similar to VT-x, Dune could
probably be modified to run on it.

Q: I've seen and used Intel VT-d to give VMs access to physical devices (like a
GPU). Could Dune similarly expand to take advantage of that tech to give
processes direct access to PCIe cards and similar devices?

A:  Yes, I'm sure it could. I know the authors of Dune were at one point
thinking of adding VT-d support, though I don't know if they have.