Q: I am confused by exactly what the shim layer does and how it is implemented - it supports the Go runtime, so it cannot be written in Go, but the paper also mentions no use of C. I would guess that it's written in assembly, but I'm still confused about how it is used after the system is booted, and about the interface between the shim and the runtime.
A: The Go runtime assumes there is an OS underlying it and makes system calls to it to get going. Biscuit runs on the bare hardware, so Biscuit must emulate those system calls to be able to use the Go runtime unmodified. This is what the shim layer in Biscuit does; for example, it allocates memory for the Go heap so that the runtime can get going. It mostly runs at boot time and is indeed written in assembly.

Q: It is mentioned that there are non-heap allocations in Biscuit, which are out of the scope of the heap-exhaustion prevention mechanism presented. Since there is no such distinction in xv6, what exactly gets allocated in the Go heap and what doesn't?
A: Everything the kernel allocates (file descriptors, TCP state, etc.) lives in the kernel heap, except for physical memory pages. User-level programs typically implement their own memory allocation (e.g., malloc in xv6), asking the kernel for extra memory pages if they need more memory. The xv6 kernel doesn't do much dynamic memory allocation, because it doesn't have a memory allocator (although you implemented one in the buddy lab); instead, it has a fixed number of file descriptors, proc structures, etc.

Q: What is a futex?
A: A way of implementing user-level locks without having to make a system call on each lock acquisition/release. Run "man 2 futex" on a Linux machine or Google it. (A short sketch appears below.)

Q: The Biscuit paper talks about "garbage collection", but doesn't really distinguish between tracing garbage collection and memory management based solely on reference counting. It seems like a lot of the issues associated with tracing collection (long stops to sweep memory, unpredictable timing) are less of an issue in reference-counting schemes. Could a kernel be written in a language that only used reference counting, assuming that cyclic data structures could be avoided?
A: Biscuit has one or two cycles in its data structures. Reference counting would be an option, but if reference counts are contended between several cores, they can tank performance. See the upcoming non-scalable locks paper. Also, see the RCU paper, which we will cover in a future lecture, for the problems with contended cache lines.

Q: How does Go handle interrupts without disabling further interrupts? How does it guarantee that it won't get interrupted, or how does it handle it if it does?
A: Since Go typically runs on top of an OS, it doesn't have to worry about interrupts; the OS deals with them. In the case of Biscuit, that assumption is no longer true, so Biscuit has to do something to make it safe to accept interrupts. The way Biscuit does it is to do basically nothing in the interrupt handler, other than setting a flag to indicate an interrupt happened. Then, later, Biscuit/Go will schedule a goroutine to process the interrupt; that goroutine can acquire locks, allocate memory, etc., like any normal Go thread.

Q: Also, how high-level is Go exactly? Is it compiled or interpreted?
A: It is compiled.
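To make the futex answer above concrete, here is a minimal, Linux-only sketch of a futex-based lock in Go. It is illustrative only: the futexMutex type and its three-state protocol are this note's invention (loosely following Drepper's "Futexes Are Tricky"), not Biscuit or glibc code. The point is that the uncontended path is a single atomic instruction, and the futex(2) system call happens only under contention.

    // futexlock.go: a minimal futex-based lock sketch (Linux only;
    // requires golang.org/x/sys/unix).
    package main

    import (
        "sync/atomic"
        "unsafe"

        "golang.org/x/sys/unix"
    )

    // futexMutex states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters.
    type futexMutex struct{ state uint32 }

    // futex wraps the raw system call: FUTEX_WAIT sleeps while *addr == val,
    // FUTEX_WAKE wakes up to val sleepers. Errors are ignored in this sketch.
    func futex(addr *uint32, op int, val uint32) {
        unix.Syscall6(unix.SYS_FUTEX, uintptr(unsafe.Pointer(addr)),
            uintptr(op), uintptr(val), 0, 0, 0)
    }

    func (m *futexMutex) Lock() {
        // Fast path: one atomic compare-and-swap, no system call at all.
        if atomic.CompareAndSwapUint32(&m.state, 0, 1) {
            return
        }
        // Slow path: mark the lock contended and sleep in the kernel until woken.
        for atomic.SwapUint32(&m.state, 2) != 0 {
            futex(&m.state, unix.FUTEX_WAIT, 2)
        }
    }

    func (m *futexMutex) Unlock() {
        // Pay for a FUTEX_WAKE system call only if someone may be sleeping.
        if atomic.SwapUint32(&m.state, 0) == 2 {
            futex(&m.state, unix.FUTEX_WAKE, 1)
        }
    }

    func main() {
        var m futexMutex
        m.Lock()
        m.Unlock()
    }

A production lock (pthread mutexes, Go's sync.Mutex) adds spinning, fairness, and timed waits on top of this basic idea.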
Q: While I see the benefits of not having to handle errors from not having enough heap space, couldn't the blocking behavior of system calls still substantially slow down programs?
A: Yes, if there is no memory, those system calls will make no progress until enough memory is available (e.g., after the kernel has killed a bad-citizen process). The POSIX interface has no good way of communicating to user-level processes that the kernel is low on memory. If the system doesn't have enough memory to support the currently running applications, things will be slow.

Q: I don't quite understand how static analysis works on lots of programs, e.g. programs whose memory usage could depend on the algorithm / input (e.g. a program which searches over O(n^2) possible line intersections, but by some mathematical reasoning, we know that there are at most O(n) possible candidates).
A: In Biscuit, the kernel programmer must annotate the Go code for loops where the analyzer cannot determine the upper bound of the loop (e.g., when the loop depends on an argument of a system call). Biscuit also avoids recursive functions for the same reason. With relatively few annotations, the tool was able to statically determine an upper bound on the total amount of memory that each system call needs to run. Building the tool was a bit of work, but it benefited from Go's packages for analyzing Go code, including a package that determines when variables go out of scope ("escape analysis").

Q: What I found most confusing was the dependency hierarchy. For example, how does Go have built-in threads at the operating-system level? I thought that the whole concept of threads relied on the operating system to implement.
A: Without Biscuit, the Go runtime implements multiple threads on top of OS-provided threads. The runtime has its own scheduler and context-switching code. It multiplexes many goroutines over a smaller number of kernel threads.

Q: The paper says that Biscuit is not optimized for NUMA. What is NUMA and why is it important?
A: NUMA (non-uniform memory access) machines are quite common: for example, a server with several sockets (each socket holding a multicore processor) and several memory banks. Often one bank is closer to one socket than another, so if the OS allocates memory in banks close to the socket that is using that memory, it is good for performance. Biscuit doesn't do this, but real operating systems do.

Q: I don't quite understand what the benefits are of using Go over other more common HLLs like Java. It seems like more popular languages have features like garbage collection and type safety (and other abstractions like threads). Wouldn't building the OS in a more common language potentially make developing and growing the OS easier?
A: One virtue Go has going for it is that it is designed for systems programming, but researchers in the past have implemented kernels in Java. It is possible that the results of the paper carry over equally well to Java; at a high level they are both HLLs with a good garbage collector. On the other hand, Go allocates more variables on the stack instead of the heap than Java does, which may change the results.

Q: Didn't we say that we need to disable interrupts during critical sections, as otherwise we could deadlock due to interrupt code trying to grab locks that we had already acquired? How does Biscuit avoid such issues? Does it use locks that you can reacquire if the interrupted thread already owned them?
A: Biscuit avoids this deadlock problem by not taking locks in an interrupt handler. It basically doesn't do anything in the interrupt handler, other than setting a flag that an interrupt happened. Then, later, the kernel schedules a kernel thread to process the interrupt; that is a normal thread, so it can acquire and release locks.
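Here is a minimal sketch of that deferred-interrupt pattern in ordinary Go. The names (nicPending, nicDaemon) are made up for illustration, and the daemon polls the flag with a timer, whereas Biscuit's handler instead makes the handling goroutine runnable directly; the structure is the point: the handler only sets a flag, and all lock-taking and allocation happens later in a normal thread.

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    // nicPending stands in for the "an interrupt happened" flag.
    var nicPending uint32

    // interruptHandler is the code that runs at interrupt time: it takes no
    // locks and allocates nothing; it only records that the device needs service.
    func interruptHandler() {
        atomic.StoreUint32(&nicPending, 1)
    }

    // nicDaemon is an ordinary goroutine that does the real work later; unlike
    // the handler, it may acquire locks, allocate memory, and block.
    func nicDaemon(done chan<- struct{}) {
        for {
            if atomic.SwapUint32(&nicPending, 0) == 1 {
                fmt.Println("processing interrupt: drain the device queue")
                done <- struct{}{}
                return
            }
            time.Sleep(time.Millisecond) // polling here; Biscuit wakes the goroutine instead
        }
    }

    func main() {
        done := make(chan struct{})
        go nicDaemon(done)
        interruptHandler() // pretend a device interrupt just fired
        <-done
    }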
Q: It seems like many of the commonly used operating systems would require too much work to be converted to an HLL, but I'm sure that there are a lot of methods they use to avoid the pitfalls and bugs mentioned. What are some of the techniques that we haven't taken advantage of when coding our labs / when xv6 was implemented?
A: Yes, it would be an incredible amount of work to convert Linux to an HLL, and it is unlikely to make sense, since the Linux programmers are happy with C. There are indeed kernel C annotations and sanitizers to avoid common C bugs, but the bugs nevertheless happen (see Figure 6 in the paper).

Q: What are the difficulties or considerations when designing a garbage collection system?
A: A great question, but a very broad one, and it is hard to summarize decades of research on garbage collection in a short response. This wiki page gives a high-level introduction to some of the issues: https://en.wikipedia.org/wiki/Garbage_collection_(computer_science). In the context of Biscuit, the issues we were worried about were the number of CPU cycles the garbage collector uses while running the kernel (i.e., slowing down kernel operations) and the delays the garbage collector might introduce by running a bit of collection in the middle of latency-sensitive operations (receiving and responding to a network packet).

Q: What exactly is NGINX?
A: It is a widely used high-performance web server. https://nginx.org/en/

Q: What is the point of comparing Biscuit against Linux on user applications if Linux contains many more features that Biscuit doesn't have? The paper acknowledges this, but goes ahead to show that Linux is 10% faster than Biscuit. How should this be interpreted?
A: With a large grain of salt. We did our best to do a fair comparison, but it is inherently imprecise. We tried to check that the comparison is fair (for example, we looked at execution profiles to see if they lined up), but it is easy to imagine that some missing feature could change the results. The conclusion to draw is that Go likely isn't in the way of getting decent performance, and that the difference in performance isn't going to be several factors, but instead some small fraction. We were worried going into this work that Biscuit would be much slower.

Q: In chapter 7 the paper says "Goroutine scheduling decisions and the context switch implementation live in the runtime, not in Biscuit. One consequence is that Biscuit does not control scheduling policy; it inherits the runtime's policy". What exactly does that mean? Does Biscuit not have priorities for processes? What are the runtime's policies? Are they useful for general use? Can Biscuit be vulnerable to receive livelock?
A: Since Biscuit uses Go's runtime scheduler to schedule goroutines, it inherits the policy that Go uses; since the Go runtime was designed to run a single multi-threaded application, and wasn't designed to run a kernel, its policy is quite simple. We didn't want to modify the Go runtime, so Biscuit has a simple scheduler too. Biscuit can dynamically switch between interrupts and polling of the network to avoid receive livelock.

Q: The most confusing part about the paper was understanding reservations - specifically the last paragraph in section 6.3.2 about fork, exit, exec and close. I don't really understand why we only need enough heap space for one call to close.
A: Fork/exit/exec may call close many times (once for each open file descriptor). If each call to close() resulted in a bit more memory remaining alive, then the upper bound would grow with the number of calls to close(). What we did is make sure that when close() returns, there is no extra memory alive. As a result, it is safe to call close many times in fork/exit/exec.
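A sketch of the reservation idea behind that answer, with made-up names (reserve, release, heapFree, readMaxLive) rather than Biscuit's actual identifiers: each system call reserves its statically computed worst-case heap need before it starts, blocks if the heap cannot promise that much (rather than failing an allocation mid-call), and releases the reservation when it returns.

    package main

    import (
        "fmt"
        "sync"
    )

    var (
        heapMu   sync.Mutex
        heapCond = sync.NewCond(&heapMu)
        heapFree = int64(1 << 20) // bytes the kernel heap can still promise
    )

    // reserve blocks until the heap can cover a system call's worst-case
    // live memory; no allocation inside the call can then fail.
    func reserve(maxLive int64) {
        heapMu.Lock()
        for heapFree < maxLive {
            heapCond.Wait() // e.g., until the killer thread frees memory
        }
        heapFree -= maxLive
        heapMu.Unlock()
    }

    // release returns the reservation when the system call is done and
    // wakes any callers waiting for memory.
    func release(maxLive int64) {
        heapMu.Lock()
        heapFree += maxLive
        heapCond.Broadcast()
        heapMu.Unlock()
    }

    // sysRead shows the pattern: reserve up front, release on return.
    func sysRead(fd, n int) int {
        const readMaxLive = 4096 // the bound would come from the static analysis tool
        reserve(readMaxLive)
        defer release(readMaxLive)
        // ... perform the read; every allocation is covered by the reservation.
        return 0
    }

    func main() {
        fmt.Println(sysRead(0, 100))
    }

Because close() leaves no extra memory alive when it returns, fork/exit/exec can call it once per descriptor while only ever holding one close-sized reservation at a time.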
Q: How did Biscuit overcome the lack of memory fence functionality in Go?
A: It used primitives from the atomic package, which issue memory fences implicitly. (A short example appears at the end of these notes.)

Q: What is the point of using a garbage collector when malloc provides better control of performance?
A: The programmer doesn't have to free memory explicitly. For example, when the kernel cleans up a process that exited, the kernel programmer doesn't have to explicitly free every piece of memory that the process was using; the collector automatically frees all memory that isn't reachable anymore. This avoids classic C bugs such as double free, use after free, or forgetting to call free.

Q: How does the killer thread discern between good and bad citizens by assuming that the process with the most mapped memory regions, file descriptors, and threads after garbage collection is a genuine bad citizen?
A: The killer thread has no reliable mechanism to tell good from bad citizens; it uses a heuristic: count the amount of kernel heap that a process is using. The process with the most kernel objects is considered bad.

Q: Is it possible to reduce the performance cost of garbage collection by simply supplying the machine with a lot more RAM, to affect how often garbage collection is done?
A: Indeed, Figure 8 quantifies how much extra memory you need to keep the GC costs under control.

Q: Was type conversion a source of inefficiency because Go is strongly typed, or was it not a problem?
A: Biscuit uses few Go interfaces, which have a cost, but type conversion isn't the major HLL tax. The major one is the prologue cycles spent checking whether the stack must be expanded and whether the GC needs a stop-the-world pause.

Q: Can you elaborate on common strategies for dealing with kernel heap exhaustion? The paper mentions that the Linux kernel unwinds many function calls to hopefully get back to a state where the heap is not exhausted. How is this implemented? What does the kernel do if the unwind fails? Biscuit obviously has a different design -- was this design chosen because of limitations encountered by using the HLL, or was it chosen for other reasons?
A: Linux fails allocations and the caller must deal with it (check it and unwind further). This works, but it makes the life of kernel programmers harder. One reason Biscuit doesn't follow this strategy is that Go doesn't allow allocations to fail.

Q: How long and how many people did it take to implement Biscuit?
A: We don't have a precise number, because we tried different plans for writing Biscuit and we didn't keep track. Cody, as a PhD student, spent a few years doing most of the work.

Q: Out of curiosity, are there any labs that we've completed that would have been impossible with Biscuit? I get the sense that Biscuit may use more memory and have performance delays, but as far as correct implementation goes, Biscuit seems to have the necessary features of an OS kernel.
A: Nope. Biscuit has more advanced versions of the features you implemented in the labs. In fact, Biscuit was derived from xv6 but then substantially extended to achieve high performance and to run a few Linux applications unmodified.
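Finally, the example promised in the memory-fence answer above. This is a generic Go illustration, not Biscuit code: the atomic store/load pair provides the ordering that would otherwise need an explicit fence, so the consumer can never observe ready == 1 while still seeing a stale value of data.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var (
        data  int    // plain, non-atomic variable being published
        ready uint32 // publication flag, accessed only via sync/atomic
    )

    func producer() {
        data = 42
        atomic.StoreUint32(&ready, 1) // release: publishes data along with the flag
    }

    func consumer() int {
        for atomic.LoadUint32(&ready) == 0 { // acquire: pairs with the store above
        }
        return data // guaranteed to be 42 by the Go memory model
    }

    func main() {
        go producer()
        fmt.Println(consumer())
    }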