"The Interaction of Architecture and Operating System Design"
Anderson, Levy, Bershad, Lazowska
ASPLOS 1991

Discussion structure
  paper claims, problem statements
  Tables 1 and 2, since they are the motivation
  assumptions (u-kernel, risc)
  choose just one subsystem: rpc

What are we doing here?
  case studies
  examples of research papers, learn how to write them
  understand content
  dig deeper into details than any one of us could alone
  put work into larger context - does it matter? is it useful?
    is it the right way to go?

Discussion format
  I'll start out by asking questions
  Feel free to interrupt me or ask your own questions
    Or to answer each other's questions, or discuss each other's points
  If you don't understand, ask!
  Feel free to be critical (of me or the paper), or to defend

Why we're reading this paper:
  Digs into details of most important kernel abstractions.
  Talks about how choice/design/impl of abstraction affects performance.

When was the paper written? 1 year ago? 10? 20?
  1991 ASPLOS

Why does it matter when it was written?

What kind of people wrote it -- CPU or O/S designers?
  O/S.
  They have much more to say about what the CPU screws up than about what the O/S screws up.

What's the main *point* of this paper? What are the authors trying to
convince us of?
  O/S performance has improved less than app perf w/ faster CPUs
  They explain why
  They suggest what to do about it

What are the main reasons they claim?
  New RISC machines don't support [new] microkernels well.
  CPU designs that improve apps irrelevant to kernel speed.
    Because kernels are different...
    We're going to want to know how.

So what should we be looking for as we read the paper?
  1. evidence that newer CPUs are "slower" on OS than on apps
  2. reasons for this trend
  3. evidence that newer OSes are slower than older ones
  4. reasons for this trend
  5. evaluation of larger significance, if any
  6. solutions

You should figure out a list like this in your head as you read any
abstract/intro to help guide your attention in the paper. Papers are
long and hard to read, so it pays to have a plan for what to skip and
what to read carefully. And you don't want to let the abstract
lull you into thinking that a paper proves a point that it doesn't prove.

How are we going to decide if the authors' trends are important?
  Assuming that they prove all their factual points about trends up to 1991.
  First, need to believe that a big fraction of time is spent in the O/S, and will continue to be.
  Second, need to believe that trends will continue into the future (well, post-1991).

Quick RISC overview
  simpler instructions
    but thus more of them; why isn't that usually a problem?
  what's the point of simpler instructions?
    much easier to pipeline, huge improvement w/ same MHz, die size, &c
  don't implement in hardware if could be done in s/w
    particularly uncommon operations
    e.g. system calls
    use transistors to make ADD fast instead
  how do RISC cpu designers decide what's worthwhile to put in h/w?
    industry standard benchmarks
    usually *not* o/s-intensive...
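
One way to see why "more instructions" is usually not a problem is the
standard CPU-time identity: time = instruction count * cycles per
instruction / clock rate. A minimal sketch follows. All numbers are
hypothetical and chosen only for illustration; they are not measurements
of any real CISC or RISC machine.

```python
def cpu_time_seconds(instructions: float, cycles_per_instruction: float,
                     clock_hz: float) -> float:
    """Execution time = instruction count * CPI / clock rate."""
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical workload on two machines with the SAME clock rate.
CLOCK_HZ = 20e6

# "CISC-like": fewer instructions, but many cycles each (assumed values).
cisc_time = cpu_time_seconds(1_000_000, 8.0, CLOCK_HZ)

# "RISC-like": 50% more instructions, but pipelined to ~1.5 cycles each
# (assumed values).
risc_time = cpu_time_seconds(1_500_000, 1.5, CLOCK_HZ)

speedup = cisc_time / risc_time
print(f"CISC-like: {cisc_time:.4f} s, RISC-like: {risc_time:.4f} s, "
      f"speedup: {speedup:.2f}x")
```

Under these assumptions the drop in cycles per instruction outweighs the
growth in instruction count, which is the usual argument for pipelining.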

Quick microkernel overview
  tiny kernel, most stuff in servers
  many more system calls
  some tricks harder to play, since most o/s code isn't privileged
    e.g. shared address space

Let's make two tables:
  Problems with CPUs.
    large register sets (sparc windows)
    deep pipelines (88000 o/s must save 30 regs of pipeline state)
    no h/w vectoring (in MIPS)
    limited write buffers (R2000, fixed in R3000)
    cpu speed vs memory speed
    i860 page fault handler must interpret to find faulting addr
    caches that have to be flushed during address space switches
  Problems with O/S.
    none mentioned?

Now let's look at the evidence they present.

What does Table 1 tell us?
  Is the point trends or problems?
  w/ CPUs or OSes?

Where do the "Time" numbers come from?

Where do the "Relative Speed" numbers come from?

Why do these numbers look this way?
  Why do RISC machines beat CVAX for apps, at same-ish MHz?
    Why around 5x, not 2x or 20x?
  Are RISC machines literally slower than CVAX for O/S primitives?
    So what's the problem?
  High-level reason why RISC O/S performance not keeping up?
    they gain app performance w/ more state and parallelism: registers, cache, pipeline
    but that state must be shuffled during syscall

In an ideal world, what would Table 1 look like?

How can we assign blame based on Table 1?
  Are numbers fundamental to h/w?
  Or is the point that we could optimize s/w?
  (remember, they tuned the s/w, they think it's the best possible)

Let's focus on
  2.2: Local communication
  2.3 and 2.4: System calls (this is where the real meat is)

What are the steps required to send msg from P1 to P2?
  (Table 4...)
  P1 makes system call
  kernel copies data from P1?
  P1 sleeps in kernel
  kernel switches to (waiting?) P2 kernel half
  kernel copies data to P2?
  return from P2 system call into P2
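
The steps above can be sketched as a toy model that counts the expensive
events on one one-way send. This is a sketch only: `ToyKernel`,
`Counters`, and `send` are hypothetical names, not Mach or any real
kernel interface, and it assumes P2 is already blocked in a receive and
that the kernel copies the data twice (the notes leave both copies as
open questions).

```python
from dataclasses import dataclass, field


@dataclass
class Counters:
    kernel_entries: int = 0    # traps from user mode into the kernel
    kernel_exits: int = 0      # returns from the kernel to user mode
    copies: int = 0            # data copies performed by the kernel
    context_switches: int = 0  # switches between processes


@dataclass
class ToyKernel:
    counters: Counters = field(default_factory=Counters)

    def send(self, sender_buf: bytes) -> bytes:
        """Deliver one message from P1 to a P2 already blocked in receive."""
        self.counters.kernel_entries += 1     # P1 makes system call
        kernel_buf = bytearray(sender_buf)    # kernel copies data from P1
        self.counters.copies += 1
        self.counters.context_switches += 1   # P1 sleeps; switch to P2's kernel half
        receiver_buf = bytes(kernel_buf)      # kernel copies data to P2
        self.counters.copies += 1
        self.counters.kernel_exits += 1       # return from P2's system call into P2
        return receiver_buf


kernel = ToyKernel()
delivered = kernel.send(b"hello")
print(delivered, kernel.counters)
```

Each counter is a place where per-event hardware cost (register saves,
pipeline state, cache or TLB flushes) gets multiplied, and a full RPC
pays for the reply path as well.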

We're looking for
  Ways in which RISC supports this less well than CISC
  Ways in which microkernel implements this less well than monolithic

What are the problems they mention?
  (2.3, table 5, 2.4)
  large register sets (sparc windows)
  deep pipelines (88000 o/s must save 30 regs of pipeline state)
  no h/w vectoring (in MIPS)
  limited write buffers (R2000, fixed in R3000)
  cpu speed vs memory speed
  caches that have to be flushed during address space switches

How do they establish that the mentioned problems are actually responsible?
  For the most part they do not!
  Would this have been straightforward?

What are they trying to show in Section 5?
  1. That O/S primitives matter: big fraction of total time.
  2. That O/S trends are making things worse.

Do they show that it all matters?
  Doesn't matter if traps are slow if they are rare.
  Section 5 a little weak. Counts how frequent traps &c are.
    Extrapolates w/ Mach 2.5 -> Mach 3.0. Lame.
  Table 7 shows that maybe 20% of total CPU time in o/s primitives.
    Is this a lot? Maybe not.
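
Whether 20% is "a lot" can be checked with an Amdahl's-law style
calculation. A minimal sketch: the 20% figure is the rough number from
the notes above and the 5x application speedup is the "around 5x" from
the Table 1 discussion; the 2x speedup for O/S primitives is a
hypothetical value picked for illustration.

```python
def overall_speedup(os_fraction: float, os_speedup: float,
                    app_speedup: float) -> float:
    """Amdahl's law with two components: O/S primitives and everything else."""
    new_time = os_fraction / os_speedup + (1.0 - os_fraction) / app_speedup
    return 1.0 / new_time


def new_os_fraction(os_fraction: float, os_speedup: float,
                    app_speedup: float) -> float:
    """Share of the new, shorter run time that is spent in O/S primitives."""
    os_time = os_fraction / os_speedup
    app_time = (1.0 - os_fraction) / app_speedup
    return os_time / (os_time + app_time)


OS_FRACTION = 0.20   # rough figure from the notes (Table 7)
APP_SPEEDUP = 5.0    # "around 5x" from the notes
OS_SPEEDUP = 2.0     # hypothetical: primitives speed up less than apps

total = overall_speedup(OS_FRACTION, OS_SPEEDUP, APP_SPEEDUP)
share = new_os_fraction(OS_FRACTION, OS_SPEEDUP, APP_SPEEDUP)
print(f"overall speedup: {total:.2f}x (vs {APP_SPEEDUP:.0f}x for pure app code)")
print(f"O/S share of run time afterwards: {share:.0%}")
```

Under these assumptions the machine delivers noticeably less than the
application speedup overall, and the O/S share of run time grows, so a
modest fraction today can become a larger one if the gap persists.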

Do they show that O/S trends are reducing performance?
  O/S using traps &c more (microkernels).
  CPUs making traps &c relatively more expensive.
  Can't continue both trends indefinitely.
  That was 1991. Which trend won?

What did we learn from this paper?
  Lots of performance details.
  Choice of O/S and CPU abstractions matters.
  System-level view: combined CPU-O/S-application behavior

Was it a good paper?
  Clearly written?
  Clear statement of goals/problem/method/ideas?

Does performance matter?

When does performance matter?
  Google runs service on 10,000 PCs...