6.828 2011 Lecture 4: Process Creation

big picture
  getting xv6 as far as the first process
  load kernel
  temporary page table
  real page table
  create process
  switch to process
  exec /init

why do we care about VM?
  implements address spaces:
    force each process to only r/w its own memory (bugs, security)
    user code at predictable addresses
    big contiguous user address space

where we were on Wednesday
  setting up a page table for xv6 kernel
  diagram of virtual address space
     0x80000000
     0x00000000   
  each process has its own page table
  plus one for when not running a process (e.g. early in boot)

quick review of x86 page directory / page table
  [diagram: cr3, 1024 PDEs, 1024 page table pages]
  see last week's handout
  PTE: 22 bits phys addr, 12 flag bits
  translation: 10, 10, 12

we were early in main(), in kvmalloc(), after setupkvm()
  sheet 17

let's look at the page table (kpgdir) that setupkvm produces
(gdb) break kvmalloc
(gdb) next
(gdb) print/x kpgdir[0]
why is is zero?

let's look up a virtual address
  how about the first instruction of kvmalloc
  (gdb) x/i kvmalloc
  0x80107990 <kvmalloc>:  push   %ebp
  how would we translate 0x80107990 to a physical address?

(gdb) print/x 0x80107990 >> 22
$4 = 0x200
(gdb) print/x kpgdir[0x200]
$6 = 0x114007
  Q: what is this?
  Q: what is the PPN?
  Q: what does the 7 mean?
(gdb) print/x (0x80107990 >> 12) & 0xfff
$6 = 0x107
(gdb) print/x ((int*)0x114000)[0x107]
$12 = 0x107001
  Q: what is this?
  Q: why 1 in the low bits?
(gdb) print/x 0x107000 + 0x990
$13 = 0x107990
(gdb) x/i 0x107990

wait!!! why did the physical address work in gdb?

back to kvmalloc
  it called setupkvm to create a page table
  now it calls switchkvm to start using it
  switchkvm loads kpgdir into %cr3

new page table
  much like the previous one
  maps more phys mem above the kernel
  does not have temporary mapping for low 4 MB
  and now 0x170990 won't work:
    (gdb) x/i 0x107990
    0x107990:       Cannot access memory at address 0x107990

next topic: physical memory allocation
  remember that setupkvm allocated phys mem for page directory
  how does memory allocation work?
  sheet 27

physical memory allocator interface
  allocates a page at a time
  va = kalloc()
  kfree(va)
  always allocates from phys mem above where kernel was loaded
  so pa is va - KERNBASE

what does kernel use phys mem allocator for?
  page table pages
  kernel data structures (pipe buffers, stacks, &c)
  user memory

how does the allocator work?
  data structure: a list of free physical pages
  allocation = remove first entry from list
  free = add page to head of list
  list's "next" pointers stored in first 4 bytes of each free page
    that memory is available since they are free

allocator depends on phys pages having virtual addresses!
  since it must write them to manipulate free list
  just like VM code needed to write page table pages
  thus xv6 maps all phys pages into kernel address space
    which burns up a lot of virtual address space, limits max user mem
  other arrangements are possible

where does allocator get initial pool of free physical memory?

kinit
  called by main
  newend is first address beyond the end of the kernel
    as a virtual address
    memory beyond that is unused
  PGROUNDUP since newend may not be page-aligned
  Q: why must allocated pages have page-aligned addresses?
  assume phys mem goes up to PHYSTOP (lame)
  kfree each page
    usually called on previously-allocated memory
    kinit is abusing kfree a little bit

kfree
  linked list
  note cast at 2767
  2768 is where we depend on phys mem being mapped at a virt addr

kalloc
  takes the first element of the free list

Q: how to allocate mem for a data structure (e.g. array) > 4096 bytes?

****

now let's talk about creating first process and its address space

process execution states:
  diagram: user/kernel, process mem, kernel thread, kern stack, pagetable
  process might be executing in user space
    with cr3 pointing to its page table
    user mode, so can use PTE_U PTEs, < 0x80000000
  or might be in a system call in the kernel
    e.g. open() finding a file on disk
    process's "kernel thread"
    kernel mode, so can use non-PTE_U PTEs
    using kernel stack
  or not currently executing

xv6 has two kinds of transitions
  trap + return: user->kernel, kernel->user
    system calls, interrupts, divide-by-zero, &c
    hw+sw saves user registers on process's kernel stack
    save user process state ... run in kernel ... restore state
  process switch: between kernel threads
    one process is waiting for input, run another
      or time-slicing between compute-bound processes
    save p1's kernel-thread state ... restore p2's kernel-thread state

Q: why per-process kernel stack?
   what would go wrong if syscall used a single global stack?

how does xv6 store process state?
  struct proc sheet 20
  kernel proc[] table has an entry for each process
  each field ...

Q: you'd expect there to be an array or something of pointers
   to the process's user memory. where is it? how does xv6
   know what memory a process is using?

ordinarily p->pgdir and phys mem contents created by fork()
  we will fake them for first process

ordinarily p->tf and p->context created by syscall/switch
  we will fake them for first process

main calls userinit

userinit sheet 22
  only called for first process
    other processes created by fork
  mimics fork+exec
    create a normal-looking process
    ordinary scheduler will run it
  needs to fill in all struct proc entries
  
allocproc sheet 21
  used by both fork and userinit
  kernel stack setup:
    trapframe w/ "saved user registers"
      for us, initial user registers
      eax, eip, esp, &c
    trapret !!!!!!!!!!!!!!!
    context w/ "saved kernel thread registers"
      for us, initial kernel thread registers
      eip
  Q: where will new kernel thread start executing?
  doesn't set up trapframe b/c ordinarily copied
    from parent by fork, which calls allocproc
  but fork and userinit both always start thread in forkret

trapframe sheet 06
context sheet 20

kernel stack diagram:
  top ->
                esp, ss
                eip, cs
                ...
                gs fs es ds
  p->tf ->      edi & 7 other registers
                ---
                trapret
                ---
                eip = forkret
                ebp
                ebx
                esi
  p->context -> edi
                ---
  p->kstack ->  ...

Q: any guesses why there are *two* saved EIPs?

back to userinit
  we know setupkvm -- only fills in kernel mappings
  this is a new page table for the new process
    not using it yet, will switch when new kernel thread starts
  call inituvm w/ ptr to new process's user instructions

initcode sheet 75
  user program
  exec("/init", args)

inituvm sheet 17
  we know kalloc and mappages(pgdir, va, sz, pa)
  initcode is tiny, fits in one page
  diagram: new mapping

Q: new page is mapped at va=0; could inituvm call memmove(0, init, sz)?

back to userinit sheet 22
  tf->esp -- user stack at top of page
  tf->esp=0 -- first instruction at bottom of page

main calls scheduler()

scheduler sheet 24
  no longer initialization: kernel now fully running
  whenever process gives up CPU -> scheduler
  so kernel runs scheduler a lot
  look for a process that wants to run, run it
  p->state: SLEEPING, RUNNABLE, RUNNING
  scheduler looks for RUNNABLE

switchuvm sheet 17
  tell h/w to use p->stack if re-enters kernel
    sys call or interrupt
  load %cr3

let's watch switch to new process's page table:
  (gdb) break switchuvm
  (gdb) x/5i 0
  0x0:    Cannot access memory at address 0x0
  next past load %cr3
  (gdb) x/5i 0
  same as initcode sheet 75
  but we are still in the kernel, in scheduler

back to scheduler sheet 24
  mark RUNNING so no other CPU runs it
  now switch to new process's kernel stack, registers, EIP
  swtch(place to save current ESP, previously saved ESP to switch to)

swtch sheet 26
  save current thread's registers and stack
  load new threads stack and registers
  saves current registers on current stack, struct context, sheet 20
  expects new thread's stack to have registers in that format
  stack diagram:
    eip *****
    ebp
    ebx
    esi
    edi
  Q: why these registers?
    callee saved, might have caller's live variables
  same format as struct context

let's watch:
  (gdb) break swtch
  si until esp switch...
  (gdb) x/6x $esp
  si past esp switch
  (gdb) x/6x $esp
  after: 4 regs, forkret, trapret

step into forkret sheet 24
  just returns
  allocproc set up stack to have it return to trapret
  watch out:
    release and initlog cause interrupts
    so hack source to set first=0 and pushcli
    si for iret &c -- si leaves interrupts off
  next into trapret

look at trapframe sheet 06
  (gdb) x/19x $esp
  0x0 0x23 are eip:cs
  0x1000 0x2b are esp:ss

trapret sheet 29
  pops trapret registers from stack, mostly zero
  popal pops 8 general-purpose registers
  iret pops ESP, EIP, clears supervisor flag
  x/5x $esp
  now we are executing at address 0x0 in initcode sheet 75

what does initcode do?
  traps back into the kernel to make exec() system call