6.828 2011 Lecture 4: Process Creation big picture getting xv6 as far as the first process load kernel temporary page table real page table create process switch to process exec /init why do we care about VM? implements address spaces: force each process to only r/w its own memory (bugs, security) user code at predictable addresses big contiguous user address space where we were on Wednesday setting up a page table for xv6 kernel diagram of virtual address space 0x80000000 0x00000000 each process has its own page table plus one for when not running a process (e.g. early in boot) quick review of x86 page directory / page table [diagram: cr3, 1024 PDEs, 1024 page table pages] see last week's handout PTE: 22 bits phys addr, 12 flag bits translation: 10, 10, 12 we were early in main(), in kvmalloc(), after setupkvm() sheet 17 let's look at the page table (kpgdir) that setupkvm produces (gdb) break kvmalloc (gdb) next (gdb) print/x kpgdir[0] why is is zero? let's look up a virtual address how about the first instruction of kvmalloc (gdb) x/i kvmalloc 0x80107990 : push %ebp how would we translate 0x80107990 to a physical address? (gdb) print/x 0x80107990 >> 22 $4 = 0x200 (gdb) print/x kpgdir[0x200] $6 = 0x114007 Q: what is this? Q: what is the PPN? Q: what does the 7 mean? (gdb) print/x (0x80107990 >> 12) & 0xfff $6 = 0x107 (gdb) print/x ((int*)0x114000)[0x107] $12 = 0x107001 Q: what is this? Q: why 1 in the low bits? (gdb) print/x 0x107000 + 0x990 $13 = 0x107990 (gdb) x/i 0x107990 wait!!! why did the physical address work in gdb? back to kvmalloc it called setupkvm to create a page table now it calls switchkvm to start using it switchkvm loads kpgdir into %cr3 new page table much like the previous one maps more phys mem above the kernel does not have temporary mapping for low 4 MB and now 0x170990 won't work: (gdb) x/i 0x107990 0x107990: Cannot access memory at address 0x107990 next topic: physical memory allocation remember that setupkvm allocated phys mem for page directory how does memory allocation work? sheet 27 physical memory allocator interface allocates a page at a time va = kalloc() kfree(va) always allocates from phys mem above where kernel was loaded so pa is va - KERNBASE what does kernel use phys mem allocator for? page table pages kernel data structures (pipe buffers, stacks, &c) user memory how does the allocator work? data structure: a list of free physical pages allocation = remove first entry from list free = add page to head of list list's "next" pointers stored in first 4 bytes of each free page that memory is available since they are free allocator depends on phys pages having virtual addresses! since it must write them to manipulate free list just like VM code needed to write page table pages thus xv6 maps all phys pages into kernel address space which burns up a lot of virtual address space, limits max user mem other arrangements are possible where does allocator get initial pool of free physical memory? kinit called by main newend is first address beyond the end of the kernel as a virtual address memory beyond that is unused PGROUNDUP since newend may not be page-aligned Q: why must allocated pages have page-aligned addresses? assume phys mem goes up to PHYSTOP (lame) kfree each page usually called on previously-allocated memory kinit is abusing kfree a little bit kfree linked list note cast at 2767 2768 is where we depend on phys mem being mapped at a virt addr kalloc takes the first element of the free list Q: how to allocate mem for a data structure (e.g. array) > 4096 bytes? **** now let's talk about creating first process and its address space process execution states: diagram: user/kernel, process mem, kernel thread, kern stack, pagetable process might be executing in user space with cr3 pointing to its page table user mode, so can use PTE_U PTEs, < 0x80000000 or might be in a system call in the kernel e.g. open() finding a file on disk process's "kernel thread" kernel mode, so can use non-PTE_U PTEs using kernel stack or not currently executing xv6 has two kinds of transitions trap + return: user->kernel, kernel->user system calls, interrupts, divide-by-zero, &c hw+sw saves user registers on process's kernel stack save user process state ... run in kernel ... restore state process switch: between kernel threads one process is waiting for input, run another or time-slicing between compute-bound processes save p1's kernel-thread state ... restore p2's kernel-thread state Q: why per-process kernel stack? what would go wrong if syscall used a single global stack? how does xv6 store process state? struct proc sheet 20 kernel proc[] table has an entry for each process each field ... Q: you'd expect there to be an array or something of pointers to the process's user memory. where is it? how does xv6 know what memory a process is using? ordinarily p->pgdir and phys mem contents created by fork() we will fake them for first process ordinarily p->tf and p->context created by syscall/switch we will fake them for first process main calls userinit userinit sheet 22 only called for first process other processes created by fork mimics fork+exec create a normal-looking process ordinary scheduler will run it needs to fill in all struct proc entries allocproc sheet 21 used by both fork and userinit kernel stack setup: trapframe w/ "saved user registers" for us, initial user registers eax, eip, esp, &c trapret !!!!!!!!!!!!!!! context w/ "saved kernel thread registers" for us, initial kernel thread registers eip Q: where will new kernel thread start executing? doesn't set up trapframe b/c ordinarily copied from parent by fork, which calls allocproc but fork and userinit both always start thread in forkret trapframe sheet 06 context sheet 20 kernel stack diagram: top -> esp, ss eip, cs ... gs fs es ds p->tf -> edi & 7 other registers --- trapret --- eip = forkret ebp ebx esi p->context -> edi --- p->kstack -> ... Q: any guesses why there are *two* saved EIPs? back to userinit we know setupkvm -- only fills in kernel mappings this is a new page table for the new process not using it yet, will switch when new kernel thread starts call inituvm w/ ptr to new process's user instructions initcode sheet 75 user program exec("/init", args) inituvm sheet 17 we know kalloc and mappages(pgdir, va, sz, pa) initcode is tiny, fits in one page diagram: new mapping Q: new page is mapped at va=0; could inituvm call memmove(0, init, sz)? back to userinit sheet 22 tf->esp -- user stack at top of page tf->esp=0 -- first instruction at bottom of page main calls scheduler() scheduler sheet 24 no longer initialization: kernel now fully running whenever process gives up CPU -> scheduler so kernel runs scheduler a lot look for a process that wants to run, run it p->state: SLEEPING, RUNNABLE, RUNNING scheduler looks for RUNNABLE switchuvm sheet 17 tell h/w to use p->stack if re-enters kernel sys call or interrupt load %cr3 let's watch switch to new process's page table: (gdb) break switchuvm (gdb) x/5i 0 0x0: Cannot access memory at address 0x0 next past load %cr3 (gdb) x/5i 0 same as initcode sheet 75 but we are still in the kernel, in scheduler back to scheduler sheet 24 mark RUNNING so no other CPU runs it now switch to new process's kernel stack, registers, EIP swtch(place to save current ESP, previously saved ESP to switch to) swtch sheet 26 save current thread's registers and stack load new threads stack and registers saves current registers on current stack, struct context, sheet 20 expects new thread's stack to have registers in that format stack diagram: eip ***** ebp ebx esi edi Q: why these registers? callee saved, might have caller's live variables same format as struct context let's watch: (gdb) break swtch si until esp switch... (gdb) x/6x $esp si past esp switch (gdb) x/6x $esp after: 4 regs, forkret, trapret step into forkret sheet 24 just returns allocproc set up stack to have it return to trapret watch out: release and initlog cause interrupts so hack source to set first=0 and pushcli si for iret &c -- si leaves interrupts off next into trapret look at trapframe sheet 06 (gdb) x/19x $esp 0x0 0x23 are eip:cs 0x1000 0x2b are esp:ss trapret sheet 29 pops trapret registers from stack, mostly zero popal pops 8 general-purpose registers iret pops ESP, EIP, clears supervisor flag x/5x $esp now we are executing at address 0x0 in initcode sheet 75 what does initcode do? traps back into the kernel to make exec() system call