"The Interaction of Architecture and Operating System Design" Anderson, Levy, Bershad, Lazowska ASPLOS 1991 Discussion structure paper claims, problem statements Tables 1 and 2, since they are the motivation assumptions (u-kernel, risc) choose just one subsystem: rpc What are we doing here? case studies examples of research papers, learn how to write them understand content dig deeper into details than any one of us could alone put work into larger context - does it matter? is it useful? is it the right way to go? Discussion format I'll start out by asking questions Feel free to interrupt me or ask your own questions Or to answer each others questions, or discuss each others points If you don't understand, ask! Feel free to be critical (of me or paper), or to defend Why we're reading this paper: Digs into details of most important kernel abstractions. Talks about how choice/design/impl of abstraction affects performance. When was the paper written? 1 year ago? 10? 20? 1991 ASPLOS Why does it matter when it was written? What kind of people wrote it -- CPU or O/S designers? O/S. They have much more to say about what CPU screws up than O/S. What's the main *point* of this paper? What are the authors trying to convince us of? O/S performance has improved less than app perf w/ faster CPUs They explain why They suggest what to do about it What are the main reasons they claim? New RISC machines don't support [new] microkernels well. CPU designs that improve apps irrelevant to kernel speed. Because kernels are different... We're going to want to know how. So what should we be looking for as we read the paper? 1. evidence that newer CPUs are "slower" on OS than on apps 2. reasons for this trend 3. evidence that newer OSes are slower than older ones 4. reasons for this trend 5. evaluation of larger significance, if any 6. solutions You should figure out a list like this in your head as you read any abstract/intro to help guide your attention in the paper. 
Papers are long and hard to read; it pays to have a plan for what to
skip and what to read carefully. And you don't want to let the abstract
lull you into thinking that a paper proves a point that it doesn't prove.

How are we going to decide if the authors' trends are important?
  Assuming that they prove all their factual points about trends up to 1991.
  First, need to believe that a big fraction of time is spent in O/S,
    and will continue to be.
  Second, need to believe that trends will continue into the future
    (well, post-1991).

Quick RISC overview
  simpler instructions
    but thus more of them; why isn't that usually a problem?
  what's the point of simpler instructions?
    much easier to pipeline, huge improvement w/ same MHz, die size, &c
  don't implement in hardware if could be done in s/w
    particularly uncommon operations
    e.g. system calls
    use transistors to make ADD fast instead
  how do RISC cpu designers decide what's worthwhile to put in h/w?
    industry standard benchmarks
    usually *not* o/s-intensive...

Quick microkernel overview
  tiny kernel, most stuff in servers
  many more system calls
  some tricks harder to play, since most o/s code isn't privileged
    e.g. shared address space

Let's make two tables:
  Problems with CPUs.
    large register sets (sparc windows)
    deep pipelines (88000 o/s must save 30 regs of pipeline state)
    no h/w vectoring (in MIPS)
    limited write buffers (R2000, fixed in R3000)
    cpu speed vs memory speed
    i860 page fault handler must interpret the faulting instruction
      to find faulting addr
    caches that have to be flushed during address space switches
  Problems with O/S.
    none mentioned?

Now let's look at the evidence they present.

What does Table 1 tell us?
  Is the point trends or problems? w/ CPUs or OSes?
  Where do the "Time" numbers come from?
  Where do the "Relative Speed" numbers come from?
  Why do these numbers look this way?
  Why do RISC machines beat CVAX for apps, at same-ish MHz?
    Why around 5x, not 2x or 20x?
  Are RISC machines literally slower than CVAX for O/S primitives?
    So what's the problem?
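To make the "Relative Speed" questions concrete, here is a minimal sketch of one plausible way to compute such a ratio and compare it against application speedup. The timings are made-up placeholders, NOT numbers from Table 1, and the function names are hypothetical.

```python
def relative_speed(baseline_time_us, new_time_us):
    """Speedup of the new machine over the baseline (>1 means faster)."""
    return baseline_time_us / new_time_us


def share_of_app_speedup(primitive_speedup, app_speedup):
    """How much of the application speedup the primitive achieved.

    1.0 means the O/S primitive kept pace with applications;
    below 1.0 means it lagged behind.
    """
    return primitive_speedup / app_speedup


# Hypothetical inputs for illustration only (not taken from the paper).
cvax_syscall_us = 20.0   # assumed time on the older CISC machine
risc_syscall_us = 10.0   # assumed time on the newer RISC machine
app_speedup = 5.0        # assumed application speedup, RISC vs CVAX

syscall_speedup = relative_speed(cvax_syscall_us, risc_syscall_us)
print(syscall_speedup)                                     # 2.0
print(share_of_app_speedup(syscall_speedup, app_speedup))  # 0.4
```

With these placeholder inputs the primitive is faster in absolute terms (2x) but reaches only 40% of the application speedup, which is the shape of the complaint: not literally slower, just not keeping up.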
High-level reason why RISC O/S performance not keeping up?
  they gain app performance w/ more state and parallelism:
    registers, cache, pipeline
  but that state must be shuffled during syscall

In an ideal world, what would Table 1 look like?

How can we assign blame based on Table 1?
  Are numbers fundamental to h/w?
  Or is the point that we could optimize s/w?
    (remember, they tuned the s/w, they think it's the best possible)

Let's focus on
  2.2: Local communication
  2.3 and 2.4: System calls (this is where the real meat is)

What are the steps required to send msg from P1 to P2? (Table 4...)
  P1 makes system call
  kernel copies data from P1?
  P1 sleeps in kernel
  kernel switches to (waiting?) P2 kernel half
  kernel copies data to P2?
  return from P2 system call into P2

We're looking for
  Ways in which RISC supports this less well than CISC
  Ways in which microkernel implements this less well than monolithic

What are the problems they mention? (2.3, table 5, 2.4)
  large register sets (sparc windows)
  deep pipelines (88000 o/s must save 30 regs of pipeline state)
  no h/w vectoring (in MIPS)
  limited write buffers (R2000, fixed in R3000)
  cpu speed vs memory speed
  caches that have to be flushed during address space switches

How do they establish that the mentioned problems are actually responsible?
  For the most part they do not!
  Would this have been straightforward?

What are they trying to show in Section 5?
  1. That O/S primitives matter: big fraction of total time.
  2. That O/S trends are making things worse.

Do they show that it all matters?
  Doesn't matter if traps are slow if they are rare.
  Section 5 a little weak.
    Counts how frequent traps &c are.
    Extrapolates w/ Mach 2.5 -> Mach 3.0. Lame.
  Table 7 shows that maybe 20% of total CPU time in o/s primitives.
    Is this a lot? Maybe not.

Do they show that O/S trends are reducing performance?
  O/S using traps &c more (microkernels).
  CPUs making traps &c relatively more expensive.
  Can't continue both trends indefinitely.

That was 1991.
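One way to think about "is 20% a lot?" is an Amdahl's-law-style calculation: if application code speeds up and O/S primitives don't, the O/S share of run time grows. This is a sketch, not the paper's analysis. The 0.2 starting fraction is the rough figure quoted above; the speedup factors and function names are hypothetical.

```python
def os_fraction_after_speedup(os_fraction, app_speedup, os_speedup):
    """Fraction of run time spent in O/S primitives on the new machine.

    os_fraction: share of time in O/S primitives on the old machine (0..1)
    app_speedup: how much faster non-O/S code runs on the new machine
    os_speedup:  how much faster O/S primitives run on the new machine
    """
    app_time = (1.0 - os_fraction) / app_speedup
    os_time = os_fraction / os_speedup
    return os_time / (app_time + os_time)


def overall_speedup(os_fraction, app_speedup, os_speedup):
    """Whole-workload speedup given separate app and O/S speedups."""
    new_time = (1.0 - os_fraction) / app_speedup + os_fraction / os_speedup
    return 1.0 / new_time


# Assumed scenario: apps get 5x faster, O/S primitives get no faster.
print(round(os_fraction_after_speedup(0.2, 5.0, 1.0), 2))  # 0.56
print(round(overall_speedup(0.2, 5.0, 1.0), 2))            # 2.78
```

Under those assumed inputs, a 20% share becomes more than half of the run time and the machine delivers under 3x overall instead of 5x. Whether real workloads behave this way depends on how rare the traps actually are, which is the point the notes press on.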
Which trend won?

What did we learn from this paper?
  Lots of performance details.
  Choice of O/S and CPU abstractions matters.
  System-level view: combined CPU-O/S-application behavior

Was it a good paper?
  Clearly written?
  Clear statement of goals/problem/method/ideas?

Does performance matter?
  When does performance matter?
  Google runs service on 10,000 PCs...