Memory Management

What are we talking about?
  A finite amount of fast RAM; a large, slow disk.
  View the memory as a cache for the disk.
  Or the disk as a place to store overflow from the memory.
  How to decide what to keep in memory and what to leave on disk?

What is the point?
  Programs tend to use the same files/memory over and over.
    Commonly used files: /etc/passwd, /etc/resolv.conf, &c
    Programs tend to loop over instruction memory.
  We want to keep commonly used data in memory, to avoid disk I/O.

Why not just buy lots of RAM and cache everything?
  Programmers know RAM is cheap, so apps are big.
  Desktop users like to run many apps.
  Some data sets are too huge.
  Memory for large machines (servers) is *not* cheap.
  And server load is unpredictable and can grow quickly.
  The penalty for error is enormous -- disk is 50,000 times slower than RAM.
  We want to build systems whose performance:
    1. Degrades gracefully if there isn't enough memory.
    2. Can take good advantage of any added memory.

Picture
  Map structure, shadow objects, file objects.
  Physical pages are attached to objects,
    arranged as a per-object cache:
    perhaps a list or hash table of pages,
    with object-relative addresses (e.g. page 7 of file "foo").
  Ordinary access:
    A page fault finds the map structure, the object, and the cached
      page within the object.
    pmap_enter puts the physical page into the TLB.
  read() and write() system calls copy between user space and the
    object's pages.
  Note that read/write and memory references access the same
    underlying objects.

What if the desired page isn't cached in the object?
  Find a free physical page.
  Read the data in from the disk.

How do you find a free physical page of memory?
  Maybe you are lucky and a process just exited, or a file was just
    removed.
  Usually 100% of pages are in use -- why let any lie idle?
  What you want is the page that's going to be referenced farthest in
    the future.
    This is the OPT (optimal) strategy.
    Hard to predict; badly defined if running concurrent applications.
  Good strategy from 6.033: re-use the least-recently-used (LRU) page.
  Let's assume for the moment that LRU is really what we want.

How can we implement LRU?
  LRU is easy: the O/S maintains a linked list of physical pages.
    Includes user program memory and the disk cache.
    Does not include "wired" kernel pages.
    When a page is referenced, move it to the tail of the list.
    When a page is needed, remove it from the head.
  Why is this awkward to implement?
    It works fine for read(), where the O/S knows about every access.
      Used in systems where the disk block cache is separate from the
      VM cache.
    But the O/S doesn't see every memory access.
    And the hardware doesn't maintain the LRU linked list for us.

What does the O/S know about memory references?
  The "referenced" bit in each page's PTE.
    Can simulate one by marking the PTE invalid and waiting for a fault.
  The O/S can't see references in real time.
    It can't even scan every PTE in a reasonable amount of time!
  It can scan PTEs incrementally,
    learning whether a page has or hasn't been referenced since the
    last scan.
  So it's easy to put pages in two buckets:
    used recently, and not used recently.
  This is pretty crude compared to a full LRU sort. Can we do better?

Pageout daemon lists
  Draw pictures...
  Wired: kernel pages.
  Active: pages known to have been used recently, in rough LRU order.
  Inactive: pages suspected of not being in use, in rough LRU order.
  Free: pages that certainly haven't been used for a while.
    These pages can be used immediately by vm_fault().
  The O/S makes sure there are always free pages:
    so that interrupts don't have to wait for memory,
    and to avoid deadlocks -- freeing a page may *require* memory if
    the page is dirty.
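To make the lists concrete, here is a minimal C sketch of the
bookkeeping involved. The names (vm_page, pagelist, the queue
constants) are invented for illustration and are not the real
4.4BSD/Mach declarations; a real kernel would also need locking and
per-object links.

  /* Hypothetical structures, for illustration only. */
  struct vm_object;                     /* object caching the page */

  enum pagequeue { Q_WIRED, Q_ACTIVE, Q_INACTIVE, Q_FREE };

  struct vm_page {
    struct vm_page   *prev, *next;      /* links in its current queue */
    enum pagequeue    queue;            /* which list this page is on */
    struct vm_object *object;           /* NULL once the page is freed */
    unsigned long     offset;           /* object-relative page number */
    int               referenced;       /* copied from the PTE's ref bit */
    int               dirty;            /* must be written before re-use */
  };

  struct pagelist {
    struct vm_page *head;               /* oldest; candidates come from here */
    struct vm_page *tail;               /* most recently added */
    int             count;              /* compared against a target length */
  };

  struct pagelist active, inactive, freelist;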
Pageout daemon algorithm:

  while(1){
    while(inactive list is too short){
      Pop a page from the head of the active list.
      If its reference bit is set, move it to the tail of the active list.
      Otherwise, move it to the tail of the inactive list.
    }
    while(free list is too short){
      Pop a page from the head of the inactive list.
      If its reference bit is set, move it to the tail of the active list.
      Otherwise, if it is dirty, queue a write and move it to the tail
        of the active list.
      Otherwise, move it to the tail of the free list,
        and remove it from the relevant vm object.
    }
    Sleep for a while, or until vm_page_alloc sees that the free list
      is too short.
  }

  vm_page_alloc(){
    if(free list is short)
      wake up the pageout daemon.
    take a page from the free list.
  }

Why is the pageout daemon idea good?
  We're only interested in pages that *haven't* been used lately,
    so we don't want to be driven by reference bits:
    don't want move-to-front on a single big list;
    don't want to have to scan all the PTEs.
  The inactive list lets us focus on pages that probably aren't being
    used.
    Though we do have to move some pages back to the active list.
  Much better than scanning the lists in vm_page_alloc():
    one scan of the inactive list finds many free pages;
    if done in vm_page_alloc(), it's one scan per freed page.
  Allows us to batch writes of dirty blocks.
    Hopefully we can write multiple blocks per disk seek.

What happens if we get the replacement policy wrong?
  That is, if we evict (re-use) pages that will be needed soon?
  The machine will spend all its time waiting for page-ins.
    Each page-in is likely to force a page-out of valuable data.
  Little time is spent actually computing,
    so programs will proceed at the speed of the disk, not of memory.

Think of a case in which LRU is the worst possible policy.
  Reading a file larger than memory.
  Or scanning through a virtual address range larger than physical
    memory.
    A garbage collector...
  It will evict all other pages -- your X server, for example.
  Re-reading the same file is just as bad:
    memory is full of that file's cached pages,
    but each will be evicted before it can be re-used.
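As a toy illustration of the sequential-scan case, consider a C
program like the following (BIG is an assumed value larger than
physical RAM). Under LRU, by the time each pass wraps around, the
pages at the start of the array are the least recently used and have
already been evicted, so every access on every pass misses:

  /* Repeatedly scan an array assumed to be larger than physical RAM.
     Under LRU each page is evicted just before the next pass needs it
     again, so the hit rate is 0% -- worse than random replacement. */
  #include <stdlib.h>

  #define BIG (4UL * 1024 * 1024 * 1024)    /* assumed > RAM size */

  int
  main(void)
  {
    volatile char *buf = malloc(BIG);
    unsigned long i, pass, sum = 0;

    if(buf == 0)
      return 1;
    for(pass = 0; pass < 10; pass++)
      for(i = 0; i < BIG; i += 4096)        /* touch one byte per page */
        sum += buf[i];
    return (int)(sum & 0xff);
  }

And since the scan constantly produces "recently used" pages, every
other process's pages sink to the bottom of the LRU order and get
evicted too.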