6.824 2002 Lecture 9: Scheduling What is scheduling? The way the O/S allocates resources to entities. Make sure most important activity gets the resources it needs. Very popular in the days of time sharing. Seemed irrelevant in era of PCs and workstations. Now back from the dead: Massive Internet servers with paying customers. Internet exposes them to international abuse and overload. Key problems: Gap between desired policy and available mechanism. Interaction between different schedulers. Resources you might want to schedule: CPU time. Physical memory. Disk and network I/O. I/O bus bandwidth. Entities that you might want to give resources to: Users. Processes. Threads. Web requests. MIT accounts. Many polices for resource -> entity allocation are possible: Strict priority. Divide equally. Shortest job first. Minimum guarantee + admission control. General plan for scheduling mechanisms 1. Understand where scheduling is occuring. 2. Expose scheduling decisions, allow control. 3. Account for resource consumption, to allow intelligent control. Simple example: Policy: equal CPU scheduling among processes. Switch in round robin every second? But this only works if processes are compute-bound. What if a process gives up some of its 10 ms to wait for input? Do we have to keep track of that and give it back? In practice you can't use simple round-robin. Track "recent" CPU use, e.g. over the last second. Always run process with least recent CPU use. Still, if you sleep long enough you lose. Pitfall: Priority Inversion Assume policy is strict priority. Thread T1: low priority. Thread T2: medium priority. Thread T3: high priority. T1: acquire(l) context switch to T3 T3: acquire(l)... must wait for T1 to release(l)... context switch to T2 T2 computes for a while T3 is indefinitely delayed despite high priority. Can solve if T3 lends its priority to holder of lock it is waiting for. So T1 runs, not T2. [this is really a multiple scheduler problem.] [since locks schedule access to locked resource.] Pitfall: Efficiency Efficiency often conflicts with fairness (or any other policy). Time quantum in CPU scheduling (vs low delay). Shortest seek vs FIFO disk scheduling. Contiguous read-ahead vs data needed now. Swap out my idle emacs to let gcc run faster w/ more phys mem. What happens when I type a key? These don't fit well into a "who gets to go next" scheduler framework. Inefficient scheduling may make *everybody* slower, including hi pri users. Pitfall: Multiple Interacting Schedulers Suppose you want your emacs to have priority over everything else. Give it high CPU priority. Does that mean nothing else will run if emacs wants to run? Disk scheduler might not know to favor emacs's disk I/Os. Typical UNIX disk scheduler favors disk efficiency, not process prio. Suppose emacs needs more memory. Other processes have dirty pages; emacs must wait. Does disk scheduler know these other processes' writes are high prio? Pitfall: Server Processes Suppose emacs uses X windows to display. The X server must serve requests from many clients. Does it know that emacs' requests should be given priority? Does the O/S know to raise X's priority when it is serving emacs? Similarly for DNS and NFS. Does the network know to give emacs' NFS requests priority? Pitfall: Hardware Schedulers The hardware decides which CPU gets to use memory next. The hardware decides which device gets to use I/O bus next. The hardware decides which interrupt is serviced next. But the hardware doesn't know about O/S scheduling policy. And the O/S may not realize that the hardware is scheduling. Scheduling is a system problem There are many schedulers; they interact. The CPU/thread/process scheduler is usually the easy part. The hardest part is system structure. I.e. the *existence* of interrupts is bad for scheduling. Conflicting goals may limit effectiveness. Case study: UNIX Goals: Simplicity (e.g. avoid complex locking regimes). Quick response to device interrupts. Favor interactive response. UNIX has a number of execution environments. We care about scheduling transitions among them. Some transitions aren't possible, some can'be be controlled. UNIX Execution Environments Process, user half Process, kernel half Soft interrupts: timer, network Device interrupts Hard timer interrupt [diagram] Rules: [lay out incrementally] User is pre-emptible. Kernel half is not pre-emptible. Effective priorities: intr > soft intr > kernel half > user UNIX: Process User Half Runs in process address space, on per-process stack. Interruptible. Pre-emptible: interrupt may cause context switch. We don't trust user processes to yield CPU. Voluntarily enters kernel half via system calls and faults. UNIX: Process Kernel Half Runs in kernel address space, on per-process kernel stack. Executes system calls and faults for its process. Interruptible (but can defer interrupts in critical sections). Not pre-emptible. Only yields voluntarily, when waiting for an event. E.g. disk I/O done. This simplifies concurrency control; locks often not required. No user process runs if any kernel half wants to run. Many process' kernel halfs may be sleeping in the kernel. UNIX: Device Interrupts Hardware asks CPU for an interrupt to ask for attention. Disk read/write completed, or network packet received. Runs in kernel space, on special interrupt stack. Interrupt routine cannot block; must return. Interrupts are interruptible. They nest on the one interrupt stack. Interrupts are not pre-emptible, and cannot really yield. The real-time clock is a device and interrupts every 10ms (or whatever). Process scheduling decisions can be made when interrupt returns. E.g. wake up the process waiting for this event. You want interrupt processing to be very fast, since it has priority. Don't do any more work than you have to. You're blocking processes and other interrupts. UNIX: Soft Interrupts Used when device handling is expensive. But no obvious process context in which to run. Examples: IP forwarding, TCP input processing. Runs in kernel space, on interrupt stack. Interruptable. Not pre-emptable, can't really yield. Triggered by hardware interrupt. Called when outermost hardware interrupt returns. Periodic scheduling decisions are made in timer s/w interrupt. Scheduled by hardware timer interrupt. I.e. current process has run long enough, switch. Is this good software structure? Let's talk about resource containers...