6.824 2001 Lecture 10: Memory Management

Many systems' performance depends on caching.
  Faster browsing with a web proxy cache.
  O/S disk block cache in main memory.
  Virtual memory paging -- main memory as a cache of pages on disk.
  In general: a small cache memory, a much larger supply of cacheable objects.

Which objects should we cache? More precisely:
  Typically we have little control over what must be put *in* the cache.
    Driven by user/application demand.
  But we have to make room for new objects.
    So we have to get rid of existing cached objects.
  We *do* have control over how we evict old objects.
    This is often called replacement, the subject of this lecture.

What is the point?
  Programs tend to use the same files/memory over and over.
    Commonly used files: /etc/passwd, /etc/resolv.conf, &c.
    Programs tend to loop over instruction memory.
  We want to keep commonly used data in memory, to avoid disk I/O.

Why not just buy lots of RAM and cache everything?
  Programmers know RAM is cheap, so apps are big.
  Desktop users like to run many apps.
  Some data sets are too huge.
  Memory for large machines (servers) is *not* cheap.
  And server load is unpredictable and can grow quickly.
  The penalty for error is enormous -- disk is 50,000 times slower than RAM.
  We want to build systems whose performance:
    1. Degrades gracefully if there isn't enough memory.
    2. Can take good advantage of any added memory.

Picture:
  map structure, shadow obj, file obj.
  Phys pages attached to objects, arranged as a per-object cache.
    Perhaps a list or hash table of pages with object-relative addresses
    (i.e. page 7 of file "foo").

Ordinary access:
  A page fault finds the map struct, the obj, and the cached page within the obj.
    pmap_enter() puts the phys page into the TLB.
  read() or write() system calls copy between user space and the obj's pages.
  Note that read/write and memory refs access the same underlying objects.

What if the desired page isn't cached in the object?
  Find a free physical page.
  Read the data in from disk.

How do you find a free physical page of memory?
  Maybe you are lucky and a process just exited, or a file was just removed.
  Usually 100% of pages are in use - why let any lie idle?
  What you want is the page that's going to be referenced farthest in the future.
    This is the OPT (optimal) strategy.
    Hard to predict.
    Badly defined if running concurrent applications.
  Good strategy from 6.033: re-use the least-recently-used (LRU) page.

Let's assume for the moment that LRU is really what we want.
How can we implement LRU?

LRU is easy:
  O/S maintains a linked list of physical pages.
    Includes user program memory and the disk cache.
    Does not include "wired" kernel pages.
  When a page is referenced, move it to the tail of the list.
  When a page is needed, remove one from the head.

Why is this awkward to implement?
  Works fine for read(), where the O/S sees every access.
    Used in systems where the disk block cache is separate from the VM cache.
  But the O/S doesn't see every memory access.
  And the hardware doesn't maintain the LRU linked list for us.

What does the O/S know about memory references?
  "Referenced" bit in each page's PTE.
    Can simulate one by marking the PTE invalid and waiting for the fault.
  O/S can't see references in real time.
    Can't even scan every PTE in a reasonable amount of time!
  Can scan PTEs incrementally.
    Learns whether a page has/hasn't been referenced since the last scan.
  So it's easy to put pages in two buckets: used recently, not used recently.
  This is pretty crude compared to a full LRU sort. Can we do better?

Pageout daemon lists:
  Draw pictures...
  Wired: kernel pages.
  Active: pages known to have been used recently, rough LRU order.
  Inactive: pages suspected of not being in use, rough LRU order.
  Free: pages that certainly haven't been used for a while.
    These pages can be used immediately by vm_fault().
    They are always clean (i.e. any changes have already been written to disk).

O/S makes sure there are always free pages.
  So that interrupts don't have to wait for memory.
  And to avoid deadlocks: freeing a page may *require* memory if it's dirty.

Pageout daemon algorithm: [l10.c]
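
The handout [l10.c] has the actual algorithm; below is a minimal sketch of
the same two-pass idea. All the names here (the vm_page struct, the queue
helpers, the counters, and the pmap routines) are assumptions made up for
illustration -- they loosely follow BSD conventions but are not its real API.

  /*
   * Illustrative sketch of a pageout daemon (NOT the l10.c handout).
   * Assumed: three page queues in rough LRU order, head = oldest.
   */
  struct vm_page {
      struct vm_page *next, *prev;
      int dirty;                       /* modified since last disk write? */
  };

  /* Assumed queues and tuning targets, maintained elsewhere. */
  extern struct vm_page active_q, inactive_q, free_q;
  extern int free_count, free_target;
  extern int inactive_count, inactive_target;

  /* Assumed helpers. */
  extern struct vm_page *queue_remove_head(struct vm_page *q); /* NULL if empty */
  extern void queue_insert_tail(struct vm_page *q, struct vm_page *p);
  extern int  pmap_is_referenced(struct vm_page *p);  /* PTE ref bit set? */
  extern void pmap_clear_reference(struct vm_page *p);
  extern void pmap_remove_all(struct vm_page *p);     /* invalidate all PTEs */
  extern void start_page_write(struct vm_page *p);    /* async write to disk */

  void
  pageout_daemon(void)
  {
      struct vm_page *p;

      for (;;) {
          /* Pass 1: refill the inactive queue from the LRU end of the
             active queue, clearing reference bits as pages move over. */
          while (inactive_count < inactive_target &&
                 (p = queue_remove_head(&active_q)) != NULL) {
              pmap_clear_reference(p);
              queue_insert_tail(&inactive_q, p);
              inactive_count++;
          }

          /* Pass 2: scan the inactive queue until enough pages are free. */
          while (free_count < free_target &&
                 (p = queue_remove_head(&inactive_q)) != NULL) {
              inactive_count--;
              if (pmap_is_referenced(p)) {
                  /* Referenced while inactive: give it another chance. */
                  pmap_clear_reference(p);
                  queue_insert_tail(&active_q, p);
              } else if (p->dirty) {
                  /* Can't free a dirty page.  Queue a (hopefully batched)
                     write and re-examine the page on a later scan; a real
                     kernel would also mark it busy until the write is done. */
                  start_page_write(p);
                  p->dirty = 0;
                  queue_insert_tail(&inactive_q, p);
                  inactive_count++;
              } else {
                  /* Clean and unreferenced: reclaim it now. */
                  pmap_remove_all(p);
                  queue_insert_tail(&free_q, p);
                  free_count++;
              }
          }
          /* Assumed: block here until vm_page_alloc() notices free_count
             is below free_target and wakes us up. */
      }
  }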
Why is the pageout daemon idea good?
  We're only interested in pages that *haven't* been used lately.
    So we don't want to be driven directly by page references.
  We don't want a single big list:
    alloc would need to skip over dirty and active pages.
  We don't want to have to scan all PTEs:
    they aren't in any useful order.
  The inactive list lets us focus on pages likely not to have been used recently.
    Though we do have to move some pages back to the active list.
  Much better than scanning lists in vm_page_alloc():
    one scan of the inactive list finds many free pages;
    if done in vm_page_alloc(), it's one scan per freed page.
  Allows us to batch writes of dirty blocks.
    Hopefully we can write multiple blocks per disk seek.

What happens if we get the replacement policy wrong?
  That is, what if we evict (re-use) pages that will be needed soon?
  The machine will spend all its time waiting for page-ins.
    Each page-in is likely to force a page-out of valuable data.
  Little time is spent actually computing.
  So programs will proceed at the speed of the disk, not of memory.

Think of a case in which LRU is the worst possible policy.
  Reading a file larger than memory; see the sketch below.
    Or scanning through a virtual address range larger than phys mem.
    Garbage collector...
  Will evict all other pages -- your X server, for example.
  Re-reading the same file is just as bad.
    Memory is full of that file's cached pages,
    but each will be evicted before it can be re-used.
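
To make the pathology concrete, here is a small illustrative program (not
from the lecture handouts). Point it at a file larger than physical memory:
under pure LRU, the second pass gets no hits from the disk block cache,
because every page of the file was evicted before it could be re-read.

  /* lru-worst.c: sequentially read a big file twice. */
  #include <stdio.h>

  int
  main(int argc, char **argv)
  {
      char buf[8192];
      int pass;

      if (argc != 2) {
          fprintf(stderr, "usage: %s bigfile\n", argv[0]);
          return 1;
      }
      for (pass = 0; pass < 2; pass++) {
          FILE *f = fopen(argv[1], "r");
          if (f == NULL) {
              perror(argv[1]);
              return 1;
          }
          /* Touch every block.  If the file is bigger than RAM, LRU
             evicts the file's own oldest pages (and everyone else's),
             so none of pass 1's pages survive to help pass 2. */
          while (fread(buf, 1, sizeof buf, f) > 0)
              ;
          fclose(f);
      }
      return 0;
  }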