6.1810 2023 Lecture 4: Virtual Memory / Page Tables

* plan:
  address spaces
  paging hardware
  xv6 VM code

## Virtual memory overview

* today's problem:
  [user/kernel diagram]
  suppose the shell has a bug: sometimes it writes to a random memory address
  [physical memory, 0..2^64: apps and kernel in same memory]
  how can we keep it from wrecking the kernel?
  and from wrecking other processes?

* we want isolated address spaces
  each process has its own memory
  it can read and write its own memory
  it cannot read or write anything else
  challenge: how to multiplex several address spaces onto one physical memory?
    while maintaining isolation

* xv6 uses RISC-V's paging hardware to implement address spaces
  ask questions! this material is important but complex
  topic of Thursday's lab (and shows up in several other labs)

* page tables provide a level of indirection for addressing
  CPU -> MMU -> RAM
      VA      PA
  s/w can only ld/st to virtual addresses, not physical
  the kernel tells the MMU how to map each virtual address to a physical address
  the MMU essentially has a table, indexed by va, yielding pa
    called a "page table"
      va | pa
      -------
       x | y
  code can only use addresses that have mappings in the table

* we want a different address space for each process
  so we need more than one page table -- and a way to switch
  the MMU has a register (satp) that the kernel writes to change page tables

* where does the page table live? in memory
  satp holds the (physical) address of the current page table
  the MMU loads page table entries from memory
  the kernel can modify a page table by writing it in memory

* how big is a page table?
  there are 2^64 distinct virtual addresses possible
  not practical to have a table with 2^64 entries!
  many of the details are about reducing size

* RISC-V maps 4-KB "pages"
  so the page table only needs an entry per page
  4 KB = 12 bits of offset
  RISC-V has 64-bit addresses
  thus the page table index is the top 64-12 = 52 bits of the VA
    except that the top 25 of those 52 are unused
      no RISC-V has that much memory now
      can grow in the future
    so the index is 27 bits

* Figure 3.1 -- simplified view
  the MMU uses the index bits of the VA to find a page table entry (PTE)
  the MMU constructs the physical address from the PPN in the PTE + the offset of the VA

* what is in a PTE (page table entry)? pte.pdf
  [10 reserved | 44 PPN | 10 flags]
  each PTE is 64 bits, but only 54 are used
  the 44-bit PPN (physical page number) is the top bits of the 56-bit phys addr
  the low 10 bits of the PTE are flags
    Valid, Writeable, &c
  again, the low 12 bits of the physical address are copied from the virtual address

* would it be reasonable for the page table to just be an array of PTEs?
  as in Figure 3.1, directly indexed by the 27 index bits of virtual addresses?
  how big would the page table be?
    2^27 is roughly 134 million entries
    64 bits (8 bytes) per entry
    134 million * 8 bytes for a full page table
    roughly 1 GB per page table
    one page table per address space -- per process
  would waste lots of memory for small programs!
    you might only need mappings for a small fraction of the possible pages
    so the rest of the entries would consume RAM but never be needed

* RISC-V 64 uses a "three-level page table" to save space
  Figure 3.2
  the high 9 bits of the va index into the level-one page directory
  the PTE from level one holds the phys addr of a level-two page directory
  the 2nd 9 bits index the level-two directory
  same for the 3rd 9 bits
  now we have the PTE for the page with the desired memory
  it's really a tree:
    [diagram]
  descending 9 bits at a time

* why does a tree-shaped page table save space?
  a small program needs only a few directory pages: an invalid PTE in an
  upper level means the whole subtree below it is simply never allocated
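* aside: a minimal C sketch of the three-level lookup the MMU performs
  (an illustration, not xv6 source; the macro names follow xv6's
  kernel/riscv.h, and it assumes xv6's convention that leaf PTEs occur
  only at the bottom level, i.e. no superpages)

```c
#include <stdint.h>

#define PGSIZE 4096
#define PTE_V  (1L << 0)                         // valid bit
#define PX(level, va) ((((uint64_t)(va)) >> (12 + 9*(level))) & 0x1FF)
#define PTE2PA(pte)   ((((uint64_t)(pte)) >> 10) << 12)

typedef uint64_t pte_t;
typedef pte_t *pagetable_t;      // 512 PTEs = one 4096-byte page

// Translate va the way the MMU would; 0 means "would page-fault".
uint64_t
translate(pagetable_t pagetable, uint64_t va)
{
  for(int level = 2; level >= 0; level--){
    pte_t pte = pagetable[PX(level, va)];
    if((pte & PTE_V) == 0)
      return 0;                                  // invalid: page fault
    if(level == 0)                               // leaf: PPN + 12-bit offset
      return PTE2PA(pte) | (va & (PGSIZE - 1));
    // descend; the pointer cast works here because this model (like the
    // xv6 kernel) assumes memory is direct-mapped
    pagetable = (pagetable_t)PTE2PA(pte);
  }
  return 0; // not reached
}
```

  walk() in vm.c (discussed below) descends the tree the same way in
  software, which is why the kernel can read the tables the hardware uses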
* why 9 bits?
  9 bits determines the size of a page directory
  9 bits -> 512 PTEs -> 64 bits per PTE -> 4096 bytes, i.e. one page
  that is, 9 bits means a directory fits on a single page

* flags in a PTE
  V, R, W, X, U

* what if the V bit is not set? or it's a store and the W bit is not set?
  a "page fault" forces a transfer to the kernel
    trap.c in the xv6 source
  the xv6 kernel just prints an error and kills the process
    "usertrap(): unexpected scause ... pid=... sepc=... stval=..."
  the kernel could instead install a PTE and resume the process
    e.g. after loading the page of memory from disk
  lots of tricks possible here; we'll see some

## Virtual memory in xv6

* kernel page table
  Figure 3.3
  left side is virtual, right side is physical

* what is a physical address layout?
  usually defined by the hardware -- the board
  RAM and memory-mapped device registers

* for us, qemu simulates the board and thus the physical address layout
  https://github.com/qemu/qemu/blob/master/hw/riscv/virt.c
  vi +60 virt.c
  MROM, UART, VIRTIO, DRAM
  same as the right-hand side of Figure 3.3

* the left side of Figure 3.3 is defined by the kernel's page table
  which the kernel sets up while booting
  mostly a "direct mapping"
    allows the kernel to use a physical address as a virtual address
    very convenient!
  note no W bit for kernel text
  note no X bit for kernel data &c
  xv6 assumes 128 MB of RAM -- PHYSTOP = 0x88000000
    ought to find the RAM size dynamically!
  way up at the top: trampoline, kernel stacks
    note the high pages have *two* virtual mappings!
    the kernel executes in the trampoline when switching page tables
    it creates user page tables with an identical trampoline at the same va

* could we run the kernel without paging? turn off the MMU?
  it's often possible to do that (depends on the CPU design)
  why page the kernel?
    put RAM where it's expected
    double mappings (e.g. the trampoline)
    forbid some accesses, to catch bugs

* each process has its own address space
  the kernel makes a separate page table per process
  Figure 3.4
  the kernel switches page tables (i.e. sets satp) when switching processes
  different processes have similar virtual address layouts
    but their page tables map to different physical addresses in RAM

* why this user address space arrangement?
  user virtual addresses start at zero
    predictable, easier for the compiler to generate code
  contiguous addresses -- good for e.g. big arrays
    but needn't have contiguous phys mem -- no fragmentation problem
  lots of address range in which to grow
  both kernel and user map the trampoline page
    eases the transition user -> kernel and back
    but the U bit is not set
  how does the kernel use user virtual addresses, e.g. a buffer passed to read()?
    kernel software must translate to a kernel virtual address
    by consulting that process's page table

## Code walk-through

* setup of the kernel address space
  paging is not enabled when the kernel starts, so addresses are physical
  the kernel is compiled/linked to start at 0x80000000, where there's RAM
  the kernel must first create its own page table
  kvmmake() in vm.c
    building Figure 3.3
    UART0 at pa=0x10000000; we want to direct-map it at va=0x10000000
    kvmmap() adds PTEs to a page table under construction
    we're not using that page table yet; it's just data in memory

* let's vmprint() the resulting page table (you'll write vmprint())
  [draw tree, note first PTE's PPN indicates the 2nd-level location, &c]
  the page directory pages came from kalloc()
    sequential due to the way kinit() worked
  does the 0..128..0 correspond to va=0x10000000?
  what VA would use that page table entry?
    [ L2=0 | L1=128 | L0=0 | offset=0 ]
    (gdb) print/x 128 << (9+12)
    $3 = 0x10000000
  so the va is 128 << (9+12) = 0x10000000, as expected
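* aside: before checking the UART's leaf PTE below, it may help to see
  how a PTE packs a PPN and flags; a small hosted-C sketch (not xv6
  kernel code) using the PA2PTE/PTE2PA and flag definitions from xv6's
  kernel/riscv.h

```c
#include <stdint.h>
#include <stdio.h>

#define PTE_V (1L << 0)   // valid
#define PTE_R (1L << 1)   // readable
#define PTE_W (1L << 2)   // writable

#define PA2PTE(pa)  ((((uint64_t)(pa)) >> 12) << 10)  // PPN into bits 10..53
#define PTE2PA(pte) ((((uint64_t)(pte)) >> 10) << 12)

int
main(void)
{
  // the UART's direct mapping: pa 0x10000000, readable and writable
  uint64_t pte = PA2PTE(0x10000000L) | PTE_R | PTE_W | PTE_V;
  printf("pte   = 0x%lx\n", (unsigned long)pte);           // 0x4000007
  printf("pa    = 0x%lx\n", (unsigned long)PTE2PA(pte));   // 0x10000000
  printf("flags = 0x%lx\n", (unsigned long)(pte & 0x3ff)); // 0x7 = V|R|W
  return 0;
}
```

  so 0x4000000 is the PPN field, and the low bits are the flags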
* does the UART's last-level PTE refer to the expected physical address?
    (gdb) print/x (0x10000000 >> 12) << 10
    $2 = 0x4000000
  what's the 7 in the low bits of the PTE?

* what about with two pages mapped?
  move the vmprint() call to after VIRTIO0 is mapped
  [add to tree]

* the full kernel page table?
  it's too big to print this way
  you can ask qemu for the page table that's in satp
    ^a c
    info mem
    note UART, RAM, trampoline at the very top
    ^a c (to resume)

* kvmmap() calls mappages() in vm.c
  arguments are the root PD, va, size, pa, perm
  adds mappings from a range of va's to the corresponding pa's
  for each page in the range
    calls walk() to find the address of the PTE
      we need the PTE's address (not just its content) b/c we want to modify it
      walk() will create page directory pages if it needs to
    puts the desired pa into the PTE
    marks the PTE as valid w/ PTE_V

* walk() in vm.c
  walk() mimics how the paging h/w finds the PTE for an address
  it descends the three levels, through three directory pages
  PX(level, va) extracts the 9 index bits for the given level
  the C type of a pagetable is a 512-entry array of 64-bit integers (PTEs)
  so &pagetable[PX(level, va)] is the address of the PTE we want
    (at the last level; at the upper levels it's a directory PTE)
  if PTE_V
    the relevant page-table page already exists
    PTE2PA extracts the PPN from the PTE as a PA
    which the kernel can use (as a va) to read the next level
  if not PTE_V
    kalloc() a fresh page-table page
    fill in the pte with the PPN (using PA2PTE) and mark it PTE_V
  now the PTE we want is in the page-table page

* user page table for sh
  built in exec.c's exec()
  note it's different from the kernel's -- e.g. no UART &c
  text, data, guard, stack
  tree view

* how does an application allocate more memory for its heap?
  sbrk(n) -- a system call
  sysproc.c sys_sbrk()
  proc.c growproc()
  vm.c uvmalloc() -- for each new page:
    kalloc() to get the pa of RAM for the user heap
    mappages()
  it's modifying the process's user page table
    which is not active at the moment, since we're in the kernel
  (a simplified sketch of this path appears at the end of these notes)

* next lecture: TAs will give advice about C and debugging
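* appendix: the heap-growth sketch referenced from the sbrk() discussion
  above, in the spirit of xv6's uvmalloc() in kernel/vm.c (heap_grow is a
  hypothetical name; the real code also undoes any partial allocation with
  uvmdealloc() on failure)

```c
#include "types.h"
#include "riscv.h"
#include "defs.h"

// Grow a process's heap from oldsz to newsz: for each new page, grab
// physical RAM with kalloc() and map it into the user page table with
// mappages(). Returns the new size, or 0 on failure.
uint64
heap_grow(pagetable_t pagetable, uint64 oldsz, uint64 newsz)
{
  char *mem;
  uint64 a;

  if(newsz < oldsz)
    return oldsz;

  for(a = PGROUNDUP(oldsz); a < newsz; a += PGSIZE){
    mem = kalloc();                // physical page to back the heap
    if(mem == 0)
      return 0;                    // out of physical memory
    memset(mem, 0, PGSIZE);        // don't leak old contents to user space
    if(mappages(pagetable, a, PGSIZE, (uint64)mem,
                PTE_W | PTE_R | PTE_U) != 0){   // U bit: user-accessible
      kfree(mem);
      return 0;
    }
  }
  return newsz;
}
```

  note that pagetable here is the process's user page table, not the one
  in satp: the kernel edits it as plain memory while running on its own
  (kernel) page table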