6.1810 2024 Lecture 19: Kernels and HLL

Reading: "RedLeaf: Isolation and Communication in Safe Operating System" (2020)

Outline:
  Commodity kernels written in C
    developer has total control but hard to get right
  Attractive alternative: Rust (much control, no GC)
  This paper: use HLL to design OS wo. hardware isolation
    all code runs in supervisor mode but w. language isolation
    
Challenges with C
  Memory management left to programmer
  Serious problems for kernel developers:
    Concurrent data structures is challenging (RCU, next week)
    Memory safety bugs
      Use-after-free (notoriously difficult to debug)
      Buffer overflows (security vulnerabilities)
  Security impact and benefits of HLL:
    https://security.googleblog.com/2024/03/secure-by-design-googles-perspective-on.html
    https://security.googleblog.com/2019/05/queue-hardening-enhancements.html
    https://security.googleblog.com/2024/10/safer-with-google-advancing-memory.html

High-level languages (HLL)
  HLL: automatic memory management
  Avoid large classes of C bugs
    HLLs enforces types, check bounds
  Many HLL have garbage collector
    GC automates memory deallocation
    Nice for concurrent programing
  But GC has costs
    CPU cycles at runtime
    Delays execution
    Extra memory
    https://www.usenix.org/conference/osdi18/presentation/cutler

Rust: HLL without GC
  developer must follow ownership rule
    only one unique pointer to each live object
    https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
  allows static analysis of lifetimes of objects: no GC
  small runtime
    bounds checking for memory safety

Rust and Linux 
 replace C with Rust
 challenging:
   C/Rust interaction
     pointers into C data structures
   unsafe Rust
 https://www.usenix.org/publications/loginonline/empirical-study-rust-linux-success-dissatisfaction-and-compromise

Reminder: hardware isolation
  process: isolation enforced by page tables
  xv6/monolithic: address spaces for kernel and user processes
  l4/microkernel: splits kernel into processes

RedLeaf: use HLL for isolation
  domain is the isolation unit
  isolation is enforced by HLL
    light-weight (no system calls)
    one domain calls a function in another domain
  rely on language to enforce isolation between domains
    cannot jump to arbitrary locations
    cannot cast arbitrarily
  why useful?
    performance
      fast cross-domain calls (e.g., compared to L4)
      direct-access to hardware/drivers
    more isolation
      e.g., factor kernel into many domains
  RedLeaf applications must be written in Rust
    cannot run arbitrary binaries
  long history: Lisp OSes, .., Singularity,...
  
Challenge: domain cleanup
  domains crashes (panic, runtime violation)
  how to free its resources?
    another domain may have a pointer into crashed domain
    how do threads of another domain that called into the crashed domain return?
    how to free "kernel" resources associated with domain
  nice property of unix processes
    data is private to process
      no external pointers into the process
    killable
    exit frees up all resources
  
Strawman design: pass data by value
  no external pointers into the crashed domain
  But large objects are expensive to pass
  goal: zero-copy communication
    
RedLeaf ideas:
  heap isolation
    no outside pointers into private
  shared heap w. exchangeable types
    allows for zero-copy
  ownership tracking
    to deallaocate objects in shared heap
  interface validation
    enforce types are exchangeable
  cross-domain call proxying
    update ownership
    handle crashes
  git clone https://github.com/mars-research/redleaf.git

RedLeaf inter-domain communication (fig 2)
 shared heap contains RRef<T>
   domain owner, refcnt, T info
 domain's heap: two-level memory allocation
   trusted crate for Box<T> allocations
 exchangeable types
   RRef<T> can point to other RRef<T>
   IDL compiler checks interface definitions
 trusted proxy for isolation
   update RRef<T> ownership (only the root RRef<T>)
   returns error when callee crashes

Domain cleanup
 private heaps are private
   no other domain has a pointer to an object in private heap
 shared heap deallocation
   find domain's RRef<T> roots
     find drop() for T
     call drop(), which may deallocate children
 crashed callee may scribble on mutable reference
   proxy returns RpcResult<T>
 for transparent recovery: references must be immutable
   
RedLeaf implementation (fig 1)
  Everything runs in supervisor mode
  many domains
    user processes in their own domain
    rv6 split in several domains
      xv6 in Rust
      core: "syscall" dispatcher
    drivers in their own domain
  microkernel

Creating and downloading domains
  challenge: types across domains must be identical
  solution:
    trusted compilation signs domain "binary"
      IDL files and compiler flags
      no Rust unsafe
      generated domain entry point
    microkernel checks signature
      (and thus all types)

Performance
  microbenchmark (table 1)
    why is L4 slower?
    why is RedLeaf only 124 cycles?
  language tax (fig 5)
    high-order functions
    Option<T>
  device driver (fig 8)
    why is redleaf-driver < DPDK?
    wh is redleaf-domain < redleaf-driver?s
  
HLL vs. hardware isolation
  RedLeaf TCB: Rust compiler, Rust core libraries, microkernel, IDL compiler
  xv6 TCB: RISC-V CPU, xv6-kernel (but less on compiler)
  language-level isolation in browser (e.g., WASM)
  page tables are useful for other purposes than isolation