"Scheduler Activations...", Anderson, Bershad, Lazowska, Levy, SOSP 1991. User-level threads package obviously extremely fast and flexible. What's wrong with user-level threads? [picture: many uthreads and one vCPU] A blocking system call blocks all threads. Why not use select() to avoid blocking? disk read. open(). page fault. Hard to run as many threads as CPUs. What's wrong with kernel threads? [picture: many uthreads, one vCPU for each] Handles blocking system calls well. 10x-30x slower than user threads, due to kernel calls. Which operations have to call into the kernel? Thread creation? Thread context switch? Waiting for a held lock? Waiting for a free lock? (maybe not...) Releasing a lock? Why do they keep emphasizing "wrong abstraction?" "Kernel threads are the wrong abstraction on which to support user mgmt of parallelism." What is being abstracted? Why not just talk about making your threads faster? What about multiplexing user threads on kernel threads? This is what the paper mostly compares against. Viewing kthreads not as the feature, but as hidden machinery. Try to do most operations in user level: create, ctx switch, locks. Kernel can't know which is the right uthread (ie kthread) to run. Kernel may pre-empt during a critical section. Assuming user-level locks. Kernel may not understand priorities of uthreads. Bad to have fewer runnable kthreads than CPUs: Wasting CPU time. This will happen when kthreads block in the kernel (page fault). So spawn a few extra kthreads? Bad to have more runnable kthreads than CPUs: Scheduling/priority and pre-empt in critical section. Also caused by multiple unrelated jobs competing for CPUs. Summary: kthread *not* a virtual CPU! Don't know when you lose or gain one... Specific list of problems/goals? As fast as user-level for create, switch, lock. User-level scheduling decisions. Regain control when a thread blocks. Know when a thread could unblock. Know when a thread's CPU is taken away. Negotiate about more/fewer CPUs. Overview of scheduler activations solution? "Functionality of kernel threads with performance and flex of user-level." [picture: many uthreads, a few vCPUs] A few virtual CPUs. User scheduler takes care of most scheduling. How many vCPUs are reasonable? Real CPUs, blocked threads, other time-sharing jobs. Minimize u/k interaction -- what must they tell each other? Inter-job scheduling events. 1. New vCPU (real CPU yielded by another job). 2. Take away vCPU (need real CPU for a different job). 3. Thread blocked. 4. Thread un-blocked. What's a scheduler activation? A virtual CPU, created by kernel. Kernel "upcalls" to give an activation to an address space. Address space uses it to run a thread. Upcall may also carry event notifications. An activation can block in kernel -- has a kernel stack. One activation per real CPU, plus one per blocked thread. The address space can *always* keep the upcall activation. When does the kernel upcall? To give a physical CPU to an address space. Code: Call thread_schedule(). Need to index current_thread &c by CPU #. And spin-lock around all_thread[] accesses. To take away a physical CPU. I.E. an activation has been pre-empted by another program. Includes pre-empted thread's registers &c. This upcall pre-empts one of the address space's CPUs. So it really announces *two* pre-emptions. Code: Save thread's PC. Mark thread as RUNNABLE (but not RUNNING). Call thread_schedule(). To announce that a thread has blocked in the kernel. Due to system call or page fault. So program can get the CPU back. 
  To announce that a thread has blocked in the kernel.
    Due to a system call or page fault.
    So the program can get the CPU back.
    Code: Mark the old thread as WAITING.
      Call thread_schedule().
    The blocked thread retains its state in the kernel.
  To announce that a thread has unblocked in the kernel.
    I.e. the kernel has finished the system call or page fault.
    It upcalls (with the uthread's state) rather than resuming user code.
    (This upcall pre-empts one of the address space's CPUs.)
    Code: Save the unblocked thread's PC.
      Mark the unblocked thread as RUNNABLE.
      Call thread_schedule().
  How does the kernel notify a process when it takes away its last CPU?

What does the user tell the kernel?
  Please give me another CPU.
    May result in an upcall in a new activation.
  Please take this CPU away from me; I can't use it.
  Please pre-empt a particular CPU and give me another activation.
    Always results in an upcall.
    Why would you do this? A higher-priority thread becomes runnable.

What if a uthread page-faults or is pre-empted while holding a lock?
  The pre-emption notification upcall completes the critical section (see the sketch at the end of these notes).
  But page-faulted threads only complete after unblocking.
  See Bershad's "Fast Mutual Exclusion...".

Is this just performance tuning? I.e. kernel threads are fine, just too slow?
  No: we want user-level scheduling and user-level lock types.
Or is it a genuinely more useful O/S interface? I.e. better functionality?
Could scheduler activations help event-driven programming?
  I.e. how general is their new plan?
  System calls still block.
    So you still can't have non-blocking callbacks.

What are the evaluation results?
  Table 4: as fast as user threads!
    Figure 1: why is the # of CPUs the most relevant x-axis?
  Why isn't that the end of the story?
    What if kernel interaction *is* required?
    Then the right comparison is with kernel threads.
  What does Figure 2 show?
    What is "orig FastThreads"? Topaz threads?
    Scheduler activations are faster than kernel threads due to fewer kernel crossings.
    orig FastThreads slows down due to losing CPUs (kthreads) to page faults.
      Why aren't Topaz threads affected by this? Because there are lots of them?
  Why do scheduler activations look so good in Table 5?
    orig FastThreads: pre-empted lock holders.
    Topaz threads: operations are generally slower.
  Overall: how impressive are the performance improvements?

Would you want to use scheduler activations?

How's the paper?
  Statement of claims?
  Evaluation of functionality?
  Evaluation of performance?
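  A rough sketch of one way the pre-emption upcall could complete a pre-empted lock
  holder's critical section, as discussed above. It continues the earlier uthread
  sketch; the in_critical field, run_until_safe(), and struct uthread_lock are
  invented for illustration. The paper itself avoids per-acquire bookkeeping by
  instead checking the pre-empted PC against a specially marked copy of the
  critical-section code, and page-faulted holders are fixed up only once they unblock.

    /* Sketch only: assume struct thread from the earlier sketch gains an
     * in_critical field, set while a user-level spin lock is held. */
    struct uthread_lock { volatile int locked; };

    void run_until_safe(struct thread *t);  /* assumed: resume t until in_critical clears */

    void uthread_acquire(struct thread *self, struct uthread_lock *l) {
      self->in_critical = 1;                /* mark before spinning for the lock */
      while (__sync_lock_test_and_set(&l->locked, 1))
        ;                                   /* user-level spin lock: holders are brief */
    }

    void uthread_release(struct thread *self, struct uthread_lock *l) {
      __sync_lock_release(&l->locked);
      self->in_critical = 0;
    }

    /* Pre-emption upcall path: if the victim held a lock, run it just long
     * enough to release, so other threads don't spin on a holder with no CPU. */
    void handle_preempted(struct thread *victim) {
      if (victim->in_critical)
        run_until_safe(victim);
      victim->state = RUNNABLE;
      thread_schedule();
    }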