Address spaces using segments

This lecture is about virtual memory, focusing on address spaces. It is the first in a series of lectures that use xv6 as a case study.

Address spaces

Recall the goals of the address spaces abstraction:
- Give each process a private memory area for code, data, stack.
- Prevent one process from reading/writing outside its address space.
- Allow sharing when needed.
Usually the implementation is split between the O/S and the hardware.
The O/S manages address spaces:
- Allocate physical memory for them (for creation, growth, deletion).
- Keep track of them when they are not executing.
- Switch between them (to switch processes).
- Configure the hardware.
The hardware performs address translation and protection:
- Translate user addresses to physical addresses.
- Detect and prevent attempts to use memory outside the address space.
- Allow cross-space transfers (system calls, interrupts).
Also:
- O/S has its own address space.
- O/S must be able to conveniently read/write user memory.
Hardware support may or may not correspond well to what the O/S wants.
Two main approaches: segments and page tables. Paging has won: most O/S are designed for paging, most modern CPU designs support only paging. BUT x86 provides many features only via segmentation h/w (interrupts, protection), so we have to learn about x86 segments. Also xv6 uses segments, not paging.

Example hardware for address spaces: x86 segments

PC block diagram without virtual memory support:

CPU
physical memory (DRAM)
Physical address == what is on CPU's address pins

The x86 starts out in real mode and translation is as follows:

programs use 16-bit virtual (or logical) addresses
seg_reg*16 + va ==> physical address
physical addresses are 20 bits, so 1 MB RAM
no protection: program can load anything into seg reg
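
The real-mode rule above is simple enough to check by hand. A minimal sketch in C follows; the function name real_mode_phys is made up for illustration, and masking the result to 20 bits is an assumption that models a machine with exactly 20 address lines.

```c
#include <stdint.h>

/* Real-mode translation as described above: physical = seg_reg * 16 + va.
 * The 20-bit mask is an assumption of this sketch (20 address lines),
 * so sums past 1 MB wrap around to low memory. */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}
```

For example, both 0x0000:0x7c00 and 0x07c0:0x0000 name physical address 0x7c00, so many selector:offset pairs alias the same byte.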

The operating system can switch the x86 to protected mode, which supports 32-bit virtual and physical addresses, and allows the O/S to set up address spaces so that user processes can't change them. Translation in protected mode is as follows:

selector:offset (virtual / logical addr)
==SEGMENTATION==>
linear address
==PAGING ==>
physical address

Next lecture covers paging; now we focus on segmentation.

Protected-mode segmentation works as follows (see handout):

segment register holds segment selector
selector: 13 bits of index, local vs global flag, 2-bit RPL
selector indexes into global descriptor table (GDT)
segment descriptor holds 32-bit base, limit, type, protection
la = va + base ; assert(va < limit);
choice of seg register usually implicit in instruction
- ESP uses SS, EIP uses CS, others (mostly) use DS
- some instructions can take far addresses:
  - ljmp $selector, $offset
GDT lives in memory, CPU's GDTR register points to base of GDT
LGDT instruction loads GDTR
you turn on protected mode by setting PE bit in CR0 register
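
The translation rule "la = va + base ; assert(va < limit)" can be sketched in C as below. This is an illustration only: struct segment and seg_translate are hypothetical names, and the struct keeps just the two fields the rule uses rather than the packed hardware descriptor format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a descriptor: only the fields
 * needed for translation. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* bound used by the check below */
};

/* Mirrors "la = va + base; assert(va < limit)" from the notes.
 * Returns false where the hardware would raise a fault instead. */
bool seg_translate(const struct segment *s, uint32_t va, uint32_t *la)
{
    if (va >= s->limit)
        return false;
    *la = s->base + va;
    return true;
}
```

With a made-up segment of base 0x200000 and limit 0x1000, virtual address 0x10 becomes linear address 0x200010, and virtual address 0x1000 is rejected.
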
What about protection?
- instructions can only r/w/x memory reachable through seg regs
- not before base, not after limit
- can my program change a segment register? yes, but...
- can my program re-load GDTR? no!
- how does h/w know if user or kernel?
- Current privilege level (CPL) is in the low 2 bits of CS
- CPL=0 is privileged O/S, CPL=3 is user
- why can't app modify the descriptors in the GDT? it's in memory...
- what about system calls? how do they transfer to kernel?
- app cannot just lower the CPL
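
Since the CPL lives in the low 2 bits of CS, it helps to be able to take a selector apart. The helpers below are a small sketch with hypothetical names; they only encode the field layout given earlier (13-bit index, local vs global flag, 2-bit RPL).

```c
#include <stdint.h>

/* Field layout of a 16-bit selector, per the notes:
 * bits 15..3 index, bit 2 local-vs-global flag, bits 1..0 RPL. */
unsigned sel_index(uint16_t sel)    { return sel >> 3; }
unsigned sel_is_local(uint16_t sel) { return (sel >> 2) & 1; }
unsigned sel_rpl(uint16_t sel)      { return sel & 3; }

/* Build a selector from its three fields. */
uint16_t make_selector(unsigned index, unsigned local, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((local & 1) << 2) | (rpl & 3));
}
```

A selector value of 8 therefore means GDT entry 1 at privilege level 0, while 0x1b means GDT entry 3 at privilege level 3.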

Case study (xv6)

xv6 is a reimplementation of Unix 6th edition.

v6 is an early Unix operating system for DEC PDP11
- Thompson and Ritchie, 1976
- PDP11: 16 bit data and addresses, 18 bit physical addresses
- ancestor of Linux &c but much smaller
- recognizable: shell, multi-user, directories
- written in C
- 6.828 used to use it instead of xv6
- Unix papers.
xv6 written for 6.828:
- even smaller than v6, maybe not useable as is
- preserves basic structure (processes, files, pipes, &c)
- you don't have to learn both PDP11 and x86
- runs on multi-processor PCs.

Newer Unixes have inherited many of the conceptual ideas, even though they have added paging, networking, and graphics, improved performance, etc.

You will need to read most of the source code multiple times. Your goal is to explain every line to yourself.

Overview of address spaces in xv6

In today's lecture we see how xv6 creates the kernel address space and the first user address space, and how it switches to the latter. To understand how this happens, we also need to understand in detail the state on the stack. This may be surprising, but a thread of control and an address space are tightly bundled in xv6, in a concept called a process. The kernel address space is the only address space with multiple threads of control. We will study context switching and process management in detail in the coming weeks; creation of the first user process (init) will give you a first taste.

xv6 uses only the segmentation hardware on the x86; it doesn't use paging. (In JOS you will use page-table hardware too, which we cover in the next lecture.)

The kernel address space:

  the code segment runs from 0 to 2^32 and is mapped X and R
  the data segment runs from 0 to 2^32 but is mapped W (read and write).

Each process has an address space, laid out as follows starting at virtual address zero:
```
  text
  original data and bss
  fixed-size stack
  expandable heap
```
A process's code, data, and stack segments all map this virtual address space to the same range of linear addresses. That is, all three segments are the same.

The x86 designers probably had in mind more interesting uses of segments. What might they have been?

xv6 process structure

we're about to look at how the first xv6 process starts up
  it will run initcode.S, which does exec("/init")
  /init is a program that starts up a shell we can type to

what's the important state of an xv6 process?
  kernel proc[] table has an entry for each process
    p->mem points to user mem phys address
    p->kstack points to kern stack phys address
    struct context holds saved kernel registers
      EIP, ESP, EAX, &c
      for when a system call is waiting for input
  user half: user memory
    user process sees memory as starting at zero
    instructions, data, stack, expandable heap
  kernel half: kernel executing a system call for a process
    on the process's kernel stack

xv6 has two kinds of transitions
  trap + return: user->kernel, kernel->user
    system calls, interrupts, divide-by-zero, &c
    save user process state ... run in kernel ... restore state
  process switch: between kernel halves
    one process is waiting for input, run another
      or time-slicing between compute-bound processes
    save p1's kernel-half state ... restore p2's kernel-half state
  setting up first process involves manually initializing this state

saved state for trap
  during trap, the CPU:
    switches to process's kernel stack
    pushes SS, ESP, EFLAGS, CS, EIP onto kernel stack
    jumps into kernel
  kernel then pushes the other user registers
  this is struct trapframe
  trap return reverses this, resuming at saved user EIP
  for first process:
    manually set up these "saved" registers on the kernel stack
    EIP 0, ESP top of user memory, &c
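
To make the "manually set up" step concrete, here is a sketch of just the five words the CPU pushes, filled in for a first process. It is not the xv6 trapframe definition: struct hw_frame, init_first_frame, and the GDT slot numbers 3 and 4 are assumptions for illustration. DPL_USER and FL_IF are names the notes use later; 0x200 is the interrupt-enable bit in EFLAGS.

```c
#include <stdint.h>

/* Assumed constants for this sketch; the index values 3 and 4 are
 * hypothetical GDT slots for the user code and data segments. */
enum { USER_CODE_INDEX = 3, USER_DATA_INDEX = 4, DPL_USER = 3 };
#define FL_IF 0x200u  /* interrupt-enable bit in EFLAGS */

/* The five words the CPU pushes on a trap from user mode.
 * EIP is pushed last, so it sits at the lowest address. */
struct hw_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

/* Fill in the "saved" state so that a trap return starts user code
 * at address 0 with the stack at the top of user memory. */
void init_first_frame(struct hw_frame *f, uint32_t user_size)
{
    f->eip = 0;
    f->cs = (USER_CODE_INDEX << 3) | DPL_USER;
    f->eflags = FL_IF;
    f->esp = user_size;
    f->ss = (USER_DATA_INDEX << 3) | DPL_USER;
}
```

Putting DPL_USER in the low bits of the CS and SS selectors is what makes the trap return land at user privilege rather than kernel privilege.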

saved state for process switch
  save registers (EIP, ESP, EAX, &c) in oldp->context
  restore registers from newp->context
  now we are at the EIP of newp, and using its kernel stack
  this is the only way xv6 switches among processes
    there is no direct user->user process switch
    instead, user TRAP kernel PROCESS-SWITCH kernel TRAP-RETURN user
  for first process:
    manually set up EIP and ESP to run forkret, which returns from trap

Since an xv6 process's address space is essentially a single segment, a process's physical memory must be contiguous. So xv6 may run into fragmentation if process sizes are a significant fraction of physical memory.

xv6 kernel address space

Let's see how xv6 creates the kernel address space by tracing xv6 from when it boots, focusing on address space management. You might want to turn off the second CPU for this sequence, by changing the line in .bochsrc that starts with cpu: count=2.

Start with bootasm.S, which the BIOS loads at 0x7c00 ( b 0x7c00, c, r )
0912: what segment is being used here?
0919: does the code really need to set up %ds etc?
0954: what values are loaded in the GDT?
- 0988: gdtr points to gdt
- 0984: entry 0 unused
- 0985: entry 1 (X + R, base = 0, limit = 0xffffffff, DPL = 0)
- 0986: entry 2 (W, base = 0, limit = 0xffffffff, DPL = 0)
- look at GDT after lgdt with info gdt
- how does address translation change right after lgdt completes?
0957: what is the immediate effect of setting CR0_PE_ON?
0961: far jump, load 8 in CS. why?
0967-0971: set up other segment registers
0974: where is the stack?
1116: bootmain in the bootloader (see lab 1), which calls main
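
When inspecting the GDT entries above, it can help to know how base and limit are scattered through the 8-byte descriptor. The sketch below packs one by hand following the standard x86 layout; make_descriptor is a hypothetical helper, and the access and flag values mentioned afterwards are the usual flat-segment settings, assumed here rather than read from bootasm.S.

```c
#include <stdint.h>

/* Pack base, limit, and attribute bits into the 8-byte x86 segment
 * descriptor format. `limit` is the 20-bit limit field; with the
 * granularity flag set it counts 4 KB units. `access` is the access
 * byte (present, DPL, type); `flags` is the 4-bit flags field. */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);             /* limit 15..0  -> bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;      /* base 23..0   -> bits 16..39 */
    d |= (uint64_t)access << 40;                 /* access byte  -> bits 40..47 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;  /* limit 19..16 -> bits 48..51 */
    d |= (uint64_t)(flags & 0xF) << 52;          /* flags        -> bits 52..55 */
    d |= (uint64_t)(base >> 24) << 56;           /* base 31..24  -> bits 56..63 */
    return d;
}
```

With base 0, limit field 0xFFFFF, flags 0xC (4 KB granularity, 32-bit), and access byte 0x9A (present, DPL 0, executable and readable), this yields 0x00CF9A000000FFFF; access byte 0x92 (writable data) yields 0x00CF92000000FFFF.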

1212: main ( b 0x102628 )

main prepares hardware and kernel data structures to run o/s.
where is the stack? (sp = 0x7bec)

what is on it? ( print-stack 6 )

   00007bec [00007bec]  7d7d  // return address in bootmain
   00007bf0 [00007bf0]  0080  // callee-saved ebx
   00007bf4 [00007bf4]  7369  // callee-saved esi
   00007bf8 [00007bf8]  0000  // callee-saved ebp
   00007bfc [00007bfc]  7c4a  // return address for bootmain: spin
   00007c00 [00007c00]  c031fcfa  // first instruction at 7c00 (start)

1235: main() calls userinit() to create first process
1263: then scheduler() to start running processes

creating the first process's address space

1762: userinit() calls copyproc(0) to set up kernel part of a new process
1762: copyproc() usually copies, for fork(), but not this time
1627: every process has a proc[] table entry, allocproc() finds free slot
1719: allocate stack to use while in kernel (sys call or interrupt)
1723: space on stack for trap frame later needed to get into user space
1725: (if fork(), would copy parent's user memory &c)
1746: where will new process run in kernel? where is its kernel stack?
1763: allocate memory to hold new process's user code, data, stack
1767: trap frame holds initial user register contents
1767: what does the DPL_USER do?
1771: why set FL_IF?
1772: where will the user stack be?
1776: where in memory is this writing?
1779: where will the user program start executing?
1780: what is being copied here? to what address? (sheet 66)
about to run scheduler(). the new process has:
- entry in proc[] array
- p->context indicates where to start in kernel
- p->kstack points to phys mem for kernel stack
- trapframe on kernel stack holds initial user registers
- p->mem points to phys mem with user code/data/stack
- we are missing user segment descriptors...

running the first process

remember that main calls scheduler() after userinit()
we'll see how scheduler() works in a few lectures, overview now
1815: look for a RUNNABLE process, finds ours immediately
1830: setupsegs(p) sets up the segment descriptors
1680: make interrupts use process's kernel stack
1684-1694: set up gdt
1685: why phys 0? why size 0x100000 + 64*1024?
1685: why does the process need kernel segment descriptors at all?
1690: is p->mem logical or physical?
1690: what will the user program's address 0 refer to?
1690: why DPL_USER (= 3) instead of previous 0?
1700: use this new gdt now! why does this work?
1700: what's in the segment table? ( b 0x103507 2nd time, info gdt )
1700: what's in the segment? ( u/8 0x20b000 )
back to scheduler()...
1832: switches to eip/esp in p->context (forkret, p->kstack)

1886: let's look at the stack in forkret() ( b 0x102f80, print-stack 17 )

   0020cfbc [0020cfbc]  0000
   0020cfc0 [0020cfc0]  0000
   0020cfc4 [0020cfc4]  0000
   0020cfc8 [0020cfc8]  0000
   0020cfcc [0020cfcc]  0000
   0020cfd0 [0020cfd0]  0000
   0020cfd4 [0020cfd4]  0000
   0020cfd8 [0020cfd8]  0000
   0020cfdc [0020cfdc]  0023
   0020cfe0 [0020cfe0]  0023
   0020cfe4 [0020cfe4]  0000
   0020cfe8 [0020cfe8]  0000
   0020cfec [0020cfec]  0000
   0020cff0 [0020cff0]  001b
   0020cff4 [0020cff4]  0200
   0020cff8 [0020cff8]  0ffc
   0020cffc [0020cffc]  0023

what are we looking at? what are 23, 1b, 200, ffc, and 23?
2484: let's look at forkret1 ( b 0x104ac8 )
2485: esp now points to trapframe ( print-stack 17 )
2475: "restore" user registers, es, ds
2479: what does the iret do? what's on the stack? ( print-stack 5 )
now where are we?

Managing physical memory

To create an address space we must allocate physical memory, which will be freed when an address space is deleted (e.g., when a user program terminates). xv6 implements a first-fit memory allocator (see kalloc.c, sheet 22).

kalloc() maintains a list of ranges of free memory. The allocator finds the first range that is at least as large as the amount of requested memory. If the range is larger than needed, it splits that range in two: one range of the size requested and one of the remainder. It returns the range of the requested size. When memory is freed, kfree() merges ranges that are adjacent in memory.
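
The following is a minimal first-fit sketch written from the description above, not a copy of kalloc.c. The names struct run, freelist, ff_alloc, and ff_free are hypothetical. It assumes every size is a nonzero multiple of sizeof(struct run) and that all memory handed to ff_free comes from one suitably aligned arena. It returns the front part of a split range; returning the tail would work equally well.

```c
#include <stddef.h>

/* Header stored at the start of every free range. */
struct run {
    struct run *next;
    size_t len;          /* bytes in this range, header included */
};

struct run *freelist;    /* free ranges, kept sorted by address */

/* Return n bytes at v to the free list, merging with neighbors. */
void ff_free(void *v, size_t n)
{
    struct run *p = v, *prev = NULL, *r = freelist;

    /* Find the spot that keeps the list in address order. */
    while (r != NULL && (char *)r < (char *)p) {
        prev = r;
        r = r->next;
    }
    p->len = n;
    p->next = r;
    if (prev != NULL)
        prev->next = p;
    else
        freelist = p;

    /* Merge with the following range if the two touch. */
    if (r != NULL && (char *)p + p->len == (char *)r) {
        p->len += r->len;
        p->next = r->next;
    }
    /* Merge with the preceding range if the two touch. */
    if (prev != NULL && (char *)prev + prev->len == (char *)p) {
        prev->len += p->len;
        prev->next = p->next;
    }
}

/* First fit: take the first range that is large enough. */
void *ff_alloc(size_t n)
{
    struct run **rp = &freelist, *r;

    for (; (r = *rp) != NULL; rp = &r->next) {
        if (r->len == n) {           /* exact fit: unlink the range */
            *rp = r->next;
            return r;
        }
        if (r->len > n) {            /* split: keep the remainder free */
            struct run *rest = (struct run *)((char *)r + n);
            rest->len = r->len - n;
            rest->next = r->next;
            *rp = rest;
            return r;
        }
    }
    return NULL;                     /* no single range is big enough */
}
```

Note that ff_alloc can fail even when the total free memory exceeds the request, because the free bytes may be split across ranges that are not adjacent.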

Under what scenarios is a first-fit memory allocator undesirable?

Growing an address space

How can a user process grow its address space? growproc.

1657: allocate a new segment of old size plus n
1660: copy the old segment into the new (ouch!)
1661: why bother zeroing the rest?
1662: free the old physical memory
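
The four steps above can be sketched in user-level C as follows. This is an analogy using malloc and free in place of the kernel allocator; grow_segment is a hypothetical name, and the real growproc also has to update the process's bookkeeping and segment descriptors.

```c
#include <stdlib.h>
#include <string.h>

/* Grow a contiguous region from old_size to old_size + n bytes by
 * allocate, copy, zero, free. Returns the new region, or NULL
 * (leaving the old region untouched) if allocation fails. */
char *grow_segment(char *old_mem, size_t old_size, size_t n)
{
    char *new_mem = malloc(old_size + n);
    if (new_mem == NULL)
        return NULL;
    memcpy(new_mem, old_mem, old_size);   /* copy the old contents */
    memset(new_mem + old_size, 0, n);     /* zero the newly added part */
    free(old_mem);                        /* release the old region */
    return new_mem;
}
```

The cost of the copy grows with the size of the whole process, not with n, which is why growing a contiguous segment is expensive.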

We could do a lot better if segments didn't have to be contiguous in physical memory. How could we arrange that? Using page tables, which is our next topic. This is one place where page tables would be useful, but there are others too (e.g., in fork).