(Protected-Mode) Address Translation on the x86
Reading: 80386 chapters 5 and 6
Handout: x86 address translation diagram (PS, EPS, xfig)
Why do we care about x86 address translation?
- It can simplify s/w structure by placing data at fixed known addresses.
- It can implement tricks like demand paging and copy-on-write.
- It can isolate programs to contain bugs.
- It can isolate programs to increase security.
- Labs use paging a lot, and segments more than you might think.
PC block diagram
- physical address
- base, IO hole, extended memory
- Physical address == what is on CPU's address pins
Translation
- real mode
- segment*16+offset ==> physical address
- no protection: program can load anything into seg reg
- protected mode
- selector:offset (logical addr)
==SEGMENTATION==>
- linear address
==PAGING==>
- physical address
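The real-mode rule above is simple enough to check by hand. A minimal sketch in C, where the function name real_mode_phys is hypothetical and not part of any lab code:

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: physical = segment * 16 + offset.
 * Both inputs are 16 bits wide, so the sum can need 21 bits. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    uint32_t pa = ((uint32_t)segment << 4) + offset;
    /* Largest possible result is 0xFFFF:0xFFFF = 0x10FFEF. */
    assert(pa <= 0x10FFEFu);
    return pa;
}
```

Note that many segment:offset pairs name the same byte: 0x07C0:0x0000 and 0x0000:0x7C00 both give 0x7C00. Nothing in the arithmetic stops a program from reaching any address, which is the "no protection" point above.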
Protected-mode Segmentation
- protected-mode segments add 32-bit addresses and protection
- wait: what's the point? original point of segments was bigger addresses, but 32-bit mode fixes that!
- segment register holds segment selector
- selector indexes into global descriptor table (GDT)
- segment descriptor holds 32-bit base, limit, type, protection
- la = va + base ; assert(va <= limit); // limit is the last valid offset
- seg register usually implicit in instruction
- DS:REG
- SS:ESP, SS:EBP
- pushl %ecx, pushl $_i
- popl %ecx
- movl 4(%ebp),%eax
- CS:EIP
- String instructions: read from DS:ESI, write to ES:EDI
- Exception: far addresses
- LGDT instruction loads CPU's GDT register
- you turn on protected mode by setting PE bit in CR0 register
- what happens with the next instruction? CS now has different meaning...
- what about protection?
- can o/s limit what memory an application can read or write?
- app can load any selector into a seg reg...
- but can only mention indices into GDT
- app can't change GDT register (requires privilege)
- why can't app write the descriptors in the GDT?
- what about system calls?
- need a way for app to jump into a segment and acquire privs
- but app can't be allowed to create such a segment, or write it
- current privilege level (CPL) is in the low 2 bits of CS
- CPL=0 is privileged O/S, CPL=3 is user
- CPL must be <= descriptor's DPL in order to read or write segment
- call gates and interrupts can change CPL and switch CS/SS segment
- but app cannot just lower the CPL
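The selector lookup, limit check, and CPL/DPL rule above can be modeled in a few lines of C. This is a sketch under simplifying assumptions, not the hardware's exact behavior: struct seg_desc, sel_index, sel_rpl, and seg_translate are hypothetical names; a real descriptor packs its fields into 8 bytes with type and granularity bits; the selector's table-indicator bit and RPL are ignored in the check; and the limit is treated as the last valid offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unpacked descriptor (real ones are 8 packed bytes). */
struct seg_desc {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* last valid offset within the segment */
    unsigned dpl;    /* descriptor privilege level, 0..3 */
};

/* A 16-bit selector: bits 15..3 index, bit 2 table indicator, bits 1..0 RPL. */
unsigned sel_index(uint16_t sel) { return sel >> 3; }
unsigned sel_rpl(uint16_t sel)   { return sel & 3u; }

/* Model of a data access through selector `sel` at offset `va`.
 * Returns true and stores the linear address on success,
 * false where the CPU would raise a fault. */
bool seg_translate(const struct seg_desc *gdt, size_t gdt_len,
                   uint16_t sel, unsigned cpl, uint32_t va, uint32_t *la)
{
    assert(cpl <= 3);
    size_t i = sel_index(sel);
    if (i == 0 || i >= gdt_len)
        return false;            /* null descriptor or outside the table */
    const struct seg_desc *d = &gdt[i];
    if (cpl > d->dpl)
        return false;            /* need CPL <= DPL */
    if (va > d->limit)
        return false;            /* offset past the segment limit */
    *la = d->base + va;
    return true;
}
```

With a table holding a DPL=0 segment at index 1 and a DPL=3 segment at index 2, code running at CPL=3 can use selector 0x10 but not 0x08, even though it is free to load either value into a segment register.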
Why aren't protected-mode segments enough?
- Why did the 386 add paging as well?
- Isn't it enough to give each process its own segments?
Paging
- paging hardware maps linear address (la) to physical address (pa)
- (we will often interchange "linear" and "virtual")
- page size is 4096 bytes, so there are 1,048,576 pages in 2^32
- why not just have a big array with each page #'s translation?
- table[20-bit linear page #] => 20-bit phys page #
- 386 uses 2-level mapping structure
- one page directory page, with 1024 page directory entries (PDEs)
- up to 1024 page table pages, each with 1024 page table entries (PTEs)
- so la has 10 bits of directory index, 10 bits table index, 12 bits offset
- What's in a PDE or PTE?
- 20-bit phys page number, present, read/write, user/supervisor
- cr3 register holds physical address of current page directory
- puzzle: what do PDE read/write and user/supervisor flags mean?
- puzzle: can supervisor read/write user pages?
- Here's how the MMU translates an la to a pa:
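uint
translate (uint la, bool user, bool write)
{
  uint pde, pte;

  // level 1: top 10 bits of la index the page directory
  pde = read_mem (%CR3 + 4*(la >> 22));
  access (pde, user, write);
  // level 2: middle 10 bits index the page table the PDE points to
  pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
  access (pte, user, write);
  // low 12 bits pass through as the offset within the page
  return (pte & 0xfffff000) + (la & 0xfff);
}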
// check protection. pxe is a pte or pde.
// user is true if CPL==3
void
access (uint pxe, bool user, bool write)
{
  if (!(pxe & PG_P))
    => page fault -- page not present
  if (!(pxe & PG_U) && user)
    => page fault -- no access for user
  if (write && !(pxe & PG_W)) {
    if (user)
      => page fault -- not writable
    else if (%CR0 & CR0_WP)
      => page fault -- not writable (supervisor honors PG_W only when WP is set)
  }
}
- CPU's TLB caches vpn => ppn mappings
- if you change a PDE or PTE, you must flush the TLB!
- turn on paging by setting CR0_PG bit of %cr0
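The translate/access pseudocode above can be exercised as a small software model. The sketch below is an illustration, not MMU source: phys, write_mem, and access_ok are hypothetical names, physical memory is a 16 KB array, cr3 and the CR0.WP bit are passed as parameters, and a page fault is reported by returning false. The write rule assumed here is that a user write always needs PG_W, while a supervisor write needs it only when CR0.WP is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PG_P 0x001u   /* present */
#define PG_W 0x002u   /* writable */
#define PG_U 0x004u   /* user-accessible */

/* Hypothetical simulated physical memory: 4 pages of 4096 bytes. */
#define PHYS_WORDS (4 * 1024)
uint32_t phys[PHYS_WORDS];

uint32_t read_mem(uint32_t pa)
{
    assert(pa % 4 == 0 && pa / 4 < PHYS_WORDS);
    return phys[pa / 4];
}

void write_mem(uint32_t pa, uint32_t val)
{
    assert(pa % 4 == 0 && pa / 4 < PHYS_WORDS);
    phys[pa / 4] = val;
}

/* True if the PDE or PTE `pxe` permits the access. */
bool access_ok(uint32_t pxe, bool user, bool write, bool cr0_wp)
{
    if (!(pxe & PG_P))
        return false;                       /* not present */
    if (user && !(pxe & PG_U))
        return false;                       /* supervisor-only page */
    if (write && !(pxe & PG_W) && (user || cr0_wp))
        return false;                       /* read-only */
    return true;
}

/* Two-level walk: 10 bits directory index, 10 bits table index,
 * 12 bits offset. Returns false where the MMU would page fault. */
bool translate(uint32_t cr3, uint32_t la, bool user, bool write,
               bool cr0_wp, uint32_t *pa)
{
    uint32_t pde = read_mem(cr3 + 4 * (la >> 22));
    if (!access_ok(pde, user, write, cr0_wp))
        return false;
    uint32_t pte = read_mem((pde & 0xfffff000u) + 4 * ((la >> 12) & 0x3ffu));
    if (!access_ok(pte, user, write, cr0_wp))
        return false;
    *pa = (pte & 0xfffff000u) + (la & 0xfffu);
    return true;
}
```

To try it, place a page directory at physical 0x0000 and a page table at 0x1000 with write_mem, then call translate with cr3 = 0. Both levels are checked, so a page is user-writable only if PG_U and PG_W are set in the PDE and in the PTE, which bears on the PDE-flags puzzle above.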
Can we use paging to limit what memory an app can read/write?
- user can't modify cr3 (requires privilege)
- is that enough?
- could user modify page tables? after all, they are in memory.
How we will use paging (and segments)
- use segments only to switch privilege level into/out of kernel
- use paging to structure process address space
- use paging to limit process memory access to its own address space
- below is the JOS virtual memory map
- why map both kernel and current process? why not 4GB for each?
- why is the kernel at the top?
- why map all of phys mem at the top? i.e. why multiple mappings?
- why map page table a second time at VPT?
- why map page table a third time at UVPT?
- how do we switch mappings for a different process?
    4 Gig -------->  +------------------------------+
                     |                              | RW/--
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     :              .               :
                     :              .               :
                     :              .               :
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                     |                              | RW/--
                     |   Remapped Physical Memory   | RW/--
                     |                              | RW/--
    KERNBASE ----->  +------------------------------+ 0xf0000000
                     |  Cur. Page Table (Kern. RW)  | RW/--  PTSIZE
    VPT,KSTACKTOP--> +------------------------------+ 0xefc00000      --+
                     |         Kernel Stack         | RW/--  KSTKSIZE   |
                     | - - - - - - - - - - - - - - -|                 PTSIZE
                     |      Invalid Memory          | --/--             |
    ULIM     ------> +------------------------------+ 0xef800000      --+
                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
    UVPT      ---->  +------------------------------+ 0xef400000
                     |          RO PAGES            | R-/R-  PTSIZE
    UPAGES    ---->  +------------------------------+ 0xef000000
                     |           RO ENVS            | R-/R-  PTSIZE
 UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
                     +------------------------------+ 0xeebff000
                     |       Empty Memory           | --/--  PGSIZE
    USTACKTOP  --->  +------------------------------+ 0xeebfe000
                     |      Normal User Stack       | RW/RW  PGSIZE
                     +------------------------------+ 0xeebfd000
                     |                              |
                     |                              |
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     .                              .
                     .                              .
                     .                              .
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                     |     Program Data & Heap      |
    UTEXT -------->  +------------------------------+ 0x00800000
    PFTEMP ------->  |       Empty Memory           |        PTSIZE
                     |                              |
    UTEMP -------->  +------------------------------+ 0x00400000
                     |       Empty Memory           |        PTSIZE
    0 ------------>  +------------------------------+
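One way to get the "Cur. Page Table" windows in the map is a self-map: point the page directory slot that covers UVPT back at the page directory itself, so the two-level walk lands on page-table pages instead of data pages. The sketch below works out the resulting addresses. It assumes that self-map setup, which the map alone does not confirm; the helper names uvpt_pte_addr and uvpt_pde_addr are hypothetical, and only the UVPT and ULIM values come from the map.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE  4096u
#define NPTENT  1024u               /* entries per page table or directory */
#define PTSIZE  (PGSIZE * NPTENT)   /* bytes mapped by one PDE: 4 MB */
#define UVPT    0xef400000u         /* from the memory map */
#define ULIM    0xef800000u         /* from the memory map */

/* Linear address at which the PTE for `la` is visible, assuming the
 * directory slot for UVPT points back at the page directory. */
uint32_t uvpt_pte_addr(uint32_t la)
{
    return UVPT + (la >> 12) * 4;
}

/* Linear address at which the PDE for `la` is visible. Inside the window
 * the page directory shows up as the "page table" for UVPT itself. */
uint32_t uvpt_pde_addr(uint32_t la)
{
    assert(UVPT % PTSIZE == 0);     /* window must be 4 MB aligned */
    return uvpt_pte_addr(UVPT) + (la >> 22) * 4;
}
```

The window is exactly PTSIZE bytes (ULIM - UVPT = 4 MB) because 2^20 PTEs of 4 bytes each fill 4 MB. Under this assumption the page directory would appear at 0xef7bd000, and the PTE for UTEXT (0x00800000) at 0xef402000.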