Lecture 5

Address translation and sharing using page tables

Reading: 80386 chapters 5 and 6

Handout: x86 address translation diagram (PDF, PS, EPS, xfig)

Why do we care about x86 address translation?

It can simplify s/w structure: addresses in one process not constrained by what other processes might be running.
It can implement tricks like demand paging and copy-on-write.
It can isolate programs to contain bugs or increase security.
It can provide efficient sharing between processes.
JOS uses paging a lot, and segments more than you might think.

Why aren't protected-mode segments enough?

Why did the 386 add translation using page tables as well?
Isn't it enough to give each process its own segments?
Programming model, fragmentation
In practice, segments are little-used

Translation using page tables (on x86):

segmentation hardware first computes the linear address
in practice, most segments (e.g. in JOS, Linux) have base 0 and max limit, making the segmentation step a no-op.
paging hardware then maps linear address (la) to physical address (pa)
(we will often interchange "linear" and "virtual")
when paging is enabled, every instruction that accesses memory is subject to translation by paging
paging idea: break up memory into 4096-byte chunks called pages
independently control mapping for each page of linear address space
compare with segmentation (single base + limit): many more degrees of freedom
4096-byte pages means there are 2^20 = 1,048,576 pages in 2^32 bytes
conceptual model: array of 2^20 entries, called a page table, specifying the mapping for each linear page number
table[20-bit linear page #] => 20-bit phys page #
PTE entries: bottom of handout
20-bit phys page number, present, read/write, user/supervisor, etc
puzzle: can supervisor read/write user pages?
can use paging hardware for many purposes
- (seen some of this two lectures ago)
- flat memory
- segment-like protection: contiguous mappings
- solve fragmentation problems when allocating more memory (xv6-like process memory layout)
- demand-paging (%cr2 stores faulting address)
- copy-on-write
- sharing, direct access to devices (e.g. /dev/fb on linux)
- switching between processes
where is this table stored? back in memory.
in our conceptual model, CPU holds the physical address of the base of this table.
%cr3 serves this purpose on the x86 (with one more detail below)
for each memory access, access memory again to look up in table
why not just have a big array with each page #'s translation?
same problems that we were trying to solve with paging! (demand-paging, fragmentation)
so, apply the same trick
- we broke up our 2^32-byte memory into 4096-byte chunks and represented them in a 2^22-byte (2^20-entry) table
- now break up the 2^22-byte table into 4096-byte chunks too, and represent them in another 2^12-byte (2^10-entry) table
- just another level of indirection
- now all data structures are page-sized
386 uses 2-level mapping structure
one page directory page, with 1024 page directory entries (PDEs)
up to 1024 page table pages, each with 1024 page table entries (PTEs)
so la has 10 bits of directory index, 10 bits table index, 12 bits offset
%cr3 register holds physical address of current page directory
puzzle: what do PDE read/write and user/supervisor flags mean?
now, access memory twice more for every memory access: really expensive!
optimization: CPU's TLB caches vpn => ppn mappings
if you change any part of the page table, you must flush the TLB!
- by re-loading %cr3 (flushes everything)
- by executing invlpg va
turn on paging by setting CR0_PG bit of %cr0 (CR0_PE is the separate bit that enables protected mode)
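The 10/10/12 split of a linear address described in these notes can be sketched in C as follows. The helper names (pdx, ptx, pgoff, pgaddr) are made up for this illustration; they are not taken from the handout or the 386 manual.

```c
#include <stdint.h>

/* Directory index: the top 10 bits of the linear address. */
static inline uint32_t pdx(uint32_t la) { return (la >> 22) & 0x3ff; }

/* Table index: the middle 10 bits. */
static inline uint32_t ptx(uint32_t la) { return (la >> 12) & 0x3ff; }

/* Offset within the 4096-byte page: the low 12 bits. */
static inline uint32_t pgoff(uint32_t la) { return la & 0xfff; }

/* Reassemble a linear address from its three fields. */
static inline uint32_t pgaddr(uint32_t d, uint32_t t, uint32_t o)
{
    return (d << 22) | (t << 12) | o;
}
```

For example, the linear address 0x00801234 has directory index 2, table index 1, and offset 0x234.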

Here's how the MMU translates an la to a pa:

   uint
   translate (uint la, bool user, bool write)
   {
     uint pde, pte;
     pde = read_mem (%CR3 + 4*(la >> 22));
     access (pde, user, write);
     pte = read_mem ( (pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
     access (pte, user, write);
     return (pte & 0xfffff000) + (la & 0xfff);
   }

   // check protection. pxe is a pte or pde.
   // user is true if CPL==3
   void
   access (uint pxe, bool user, bool write)
   {
     if (!(pxe & PG_P))
        => page fault -- page not present
     if (!(pxe & PG_U) && user)
        => page fault -- no access for user
   
     if (write && !(pxe & PG_W)) {
       if (user)   
          => page fault -- not writable
       if (%CR0 & CR0_WP) 
          => page fault -- not writable
     }
   }
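
To experiment with the walk above outside real hardware, here is a minimal sketch that runs it against a toy "physical memory" array. Everything beyond the logic of translate and access is an assumption for illustration: the phys array, the cr3 and cr0_wp variables standing in for the real registers, the map_page helper, and the choice to return false instead of raising a page fault. The flag values match the low three bits of an x86 PTE.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_P 0x001u   /* present */
#define PG_W 0x002u   /* writable */
#define PG_U 0x004u   /* user-accessible */

#define NPAGES 8                       /* toy physical memory: 8 pages */
static uint32_t phys[NPAGES * 1024];   /* one uint32_t per 4 bytes */
static uint32_t cr3;                   /* stands in for %cr3 */
static bool cr0_wp;                    /* stands in for the WP bit of %cr0 */

static uint32_t read_mem(uint32_t pa) { return phys[pa / 4]; }
static void write_mem(uint32_t pa, uint32_t v) { phys[pa / 4] = v; }

/* Same checks as access() above, but returns false instead of faulting. */
static bool access_ok(uint32_t pxe, bool user, bool write)
{
    if (!(pxe & PG_P))
        return false;                  /* not present */
    if (user && !(pxe & PG_U))
        return false;                  /* no access for user */
    if (write && !(pxe & PG_W) && (user || cr0_wp))
        return false;                  /* not writable */
    return true;
}

/* Two-level walk; stores the physical address in *pa on success. */
static bool translate(uint32_t la, bool user, bool write, uint32_t *pa)
{
    uint32_t pde = read_mem(cr3 + 4 * (la >> 22));
    if (!access_ok(pde, user, write))
        return false;
    uint32_t pte = read_mem((pde & 0xfffff000u) + 4 * ((la >> 12) & 0x3ff));
    if (!access_ok(pte, user, write))
        return false;
    *pa = (pte & 0xfffff000u) + (la & 0xfff);
    return true;
}

/* Hypothetical helper: map linear page la to physical page pa, using the
 * page-table page at pt_pa. Overwrites the PDE that covers la. */
static void map_page(uint32_t pt_pa, uint32_t la, uint32_t pa, uint32_t perm)
{
    write_mem(cr3 + 4 * (la >> 22), pt_pa | perm | PG_P);
    write_mem(pt_pa + 4 * ((la >> 12) & 0x3ff),
              (pa & 0xfffff000u) | perm | PG_P);
}
```

With cr3 set to 0x1000, calling map_page(0x2000, 0x00801000, 0x5000, PG_W | PG_U) makes translate(0x00801234, ...) produce 0x5234. A supervisor write to a page mapped without PG_W succeeds while cr0_wp is false and fails once it is set, which is one way to approach the earlier puzzle about supervisor access.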

Can we use paging to limit what memory an app can read/write?

user can't modify cr3 (requires privilege)
is that enough?
could user modify page tables? after all, they are in memory.

How we will use paging (and segments) in JOS:

use segments only to switch privilege level into/out of kernel
use paging to structure process address space
use paging to limit process memory access to its own address space
below is the JOS virtual memory map
why map both kernel and current process? why not 4GB for each? how does this compare with xv6?
why is the kernel at the top?
why map all of phys mem at the top? i.e. why multiple mappings?
(will discuss UVPT in a moment...)
how do we switch mappings for a different process?

    4 Gig -------->  +------------------------------+
                     |                              | RW/--
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     :              .               :
                     :              .               :
                     :              .               :
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~| RW/--
                     |                              | RW/--
                     |   Remapped Physical Memory   | RW/--
                     |                              | RW/--
    KERNBASE ----->  +------------------------------+ 0xf0000000
                     |  Cur. Page Table (Kern. RW)  | RW/--  PTSIZE
    VPT,KSTACKTOP--> +------------------------------+ 0xefc00000      --+
                     |         Kernel Stack         | RW/--  KSTKSIZE   |
                     | - - - - - - - - - - - - - - -|                 PTSIZE
                     |      Invalid Memory          | --/--             |
    ULIM     ------> +------------------------------+ 0xef800000      --+
                     |  Cur. Page Table (User R-)   | R-/R-  PTSIZE
    UVPT      ---->  +------------------------------+ 0xef400000
                     |          RO PAGES            | R-/R-  PTSIZE
    UPAGES    ---->  +------------------------------+ 0xef000000
                     |           RO ENVS            | R-/R-  PTSIZE
 UTOP,UENVS ------>  +------------------------------+ 0xeec00000
 UXSTACKTOP -/       |     User Exception Stack     | RW/RW  PGSIZE
                     +------------------------------+ 0xeebff000
                     |       Empty Memory           | --/--  PGSIZE
    USTACKTOP  --->  +------------------------------+ 0xeebfe000
                     |      Normal User Stack       | RW/RW  PGSIZE
                     +------------------------------+ 0xeebfd000
                     |                              |
                     |                              |
                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                     .                              .
                     .                              .
                     .                              .
                     |~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
                     |     Program Data & Heap      |
    UTEXT -------->  +------------------------------+ 0x00800000
    PFTEMP ------->  |       Empty Memory           |        PTSIZE
                     |                              |
    UTEMP -------->  +------------------------------+ 0x00400000
                     |       Empty Memory           |        PTSIZE
    0 ------------>  +------------------------------+
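
The addresses in the map above all follow from KERNBASE, PGSIZE and PTSIZE. The sketch below derives them in C so the arithmetic can be checked. The macro names mirror the labels in the diagram; the kaddr and paddr helpers are hypothetical names showing how the "Remapped Physical Memory" region relates a physical address to its kernel virtual address (assuming physical memory is mapped starting at KERNBASE, as the diagram indicates).

```c
#include <stdint.h>

#define PGSIZE     4096u
#define PTSIZE     (PGSIZE * 1024u)      /* bytes covered by one PDE: 4MB */

#define KERNBASE   0xf0000000u
#define VPT        (KERNBASE - PTSIZE)   /* 0xefc00000 */
#define KSTACKTOP  VPT
#define ULIM       (KSTACKTOP - PTSIZE)  /* 0xef800000 */
#define UVPT       (ULIM - PTSIZE)       /* 0xef400000 */
#define UPAGES     (UVPT - PTSIZE)       /* 0xef000000 */
#define UENVS      (UPAGES - PTSIZE)     /* 0xeec00000 */
#define UTOP       UENVS
#define UXSTACKTOP UTOP
#define USTACKTOP  (UTOP - 2 * PGSIZE)   /* 0xeebfe000 */
#define UTEXT      (2 * PTSIZE)          /* 0x00800000 */
#define UTEMP      PTSIZE                /* 0x00400000 */

/* Hypothetical helper: kernel virtual address of physical address pa. */
static inline uint32_t kaddr(uint32_t pa) { return pa + KERNBASE; }

/* Hypothetical helper: physical address behind a kernel virtual address. */
static inline uint32_t paddr(uint32_t kva) { return kva - KERNBASE; }
```

Because the remapped region is a fixed offset from physical memory, the kernel can reach any physical page (for example a page table it just allocated) with simple arithmetic, which bears on the question of why physical memory is mapped a second time at the top.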

The UVPT

We had a nice conceptual model of the page table as a 2^20-entry array that we could index with a linear page number. The x86 2-level paging scheme broke that, by fragmenting the giant page table into many page tables and one page directory. We'd like to get the giant conceptual page table back in some way -- processes in JOS are going to look at it to figure out what's going on in their address space. But how?

Luckily, the paging hardware is great for precisely this -- putting together a set of fragmented pages into a contiguous address space. And it turns out we already have a table with pointers to all of our fragmented page tables: it's the page directory!

So, we can use the page directory as a page table to map our conceptual giant 2^22-byte page table (represented by 1024 pages) at some contiguous 2^22-byte range in the virtual address space. And we can ensure user processes can't modify their page tables by marking that PDE as read-only.
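
Once the giant table is visible at a contiguous range, finding the PTE for any linear address is plain array indexing. A minimal sketch, assuming the range starts at the UVPT address from the memory map (0xef400000) and using a made-up helper name pte_va:

```c
#include <stdint.h>

#define UVPT 0xef400000u   /* start of the 4MB page-table window */

/* Virtual address at which the PTE for linear address la appears:
 * entry number (la >> 12) of a table of 4-byte entries based at UVPT. */
static inline uint32_t pte_va(uint32_t la)
{
    return UVPT + 4 * (la >> 12);
}
```

In code running with this mapping in place, reading ((uint32_t *)UVPT)[la >> 12] would then fetch the PTE for la, provided the page table covering la is present; otherwise the read itself faults.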

Puzzle: do we need to create a separate UVPD mapping too?

A more detailed way of understanding this configuration:

Remember how the X86 translates virtual addresses into physical ones:

CR3 points at the page directory. The PDX part of the address indexes into the page directory to give you a page table. The PTX part indexes into the page table to give you a page, and then you add the low bits in.

But the processor has no concept of page directories, page tables, and pages being anything other than plain memory. So there's nothing that says a particular page in memory can't serve as two or three of these at once. The processor just follows pointers: pd = rcr3(); pt = *(pd+4*PDX); page = *(pt+4*PTX);

Diagrammatically, it starts at CR3, follows three arrows, and then stops.

If we put a pointer into the page directory that points back to itself at index V, then when we try to translate a virtual address with PDX and PTX equal to V, following three arrows leaves us at the page directory. So that virtual page translates to the page holding the page directory. In JOS, V is 0x3BD, so the virtual address of the UVPD is (0x3BD<<22)|(0x3BD<<12).

Now, if we try to translate a virtual address with PDX = V but an arbitrary PTX != V, then following three arrows from CR3 ends one level up from usual (instead of two levels up, as in the last case), which is to say in the page tables. So the set of virtual pages with PDX=V forms a 4MB region whose page contents, as far as the processor is concerned, are the page tables themselves. In JOS, V is 0x3BD, so the virtual address of the UVPT is (0x3BD<<22).

So because of the "no-op" arrow we've cleverly inserted into the page directory, we've mapped the pages being used as the page directory and page table (which are normally virtually invisible) into the virtual address space.
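
The self-referencing entry can be checked with a small simulation. This is a sketch under stated assumptions: phys is a toy 4-page physical memory, walk performs only the two table lookups (no permission checks), and it returns 0 for "not present", which works here only because physical page 0 is left unused. The names walk, set_pde and install_selfmap are hypothetical.

```c
#include <stdint.h>

#define PG_P 0x001u
#define V    0x3bdu                 /* self-map slot used in the notes */
#define UVPT (V << 22)              /* 0xef400000 */
#define UVPD (UVPT | (V << 12))     /* 0xef7bd000 */

static uint32_t phys[4 * 1024];     /* toy physical memory: 4 pages */

/* Follow the PDE and PTE lookups starting from cr3. Returns the physical
 * page base the walk ends at, or 0 if an entry is not present. */
static uint32_t walk(uint32_t cr3, uint32_t la)
{
    uint32_t pde = phys[(cr3 + 4 * (la >> 22)) / 4];
    if (!(pde & PG_P))
        return 0;
    uint32_t pte = phys[((pde & 0xfffff000u) + 4 * ((la >> 12) & 0x3ff)) / 4];
    if (!(pte & PG_P))
        return 0;
    return pte & 0xfffff000u;
}

/* Point directory slot idx at the page-table page at pt_pa. */
static void set_pde(uint32_t cr3, uint32_t idx, uint32_t pt_pa)
{
    phys[(cr3 + 4 * idx) / 4] = pt_pa | PG_P;
}

/* The "no-op arrow": slot V of the directory points at the directory. */
static void install_selfmap(uint32_t cr3)
{
    set_pde(cr3, V, cr3);
}
```

With the page directory at physical 0x1000 and a page table at 0x2000 installed in directory slot 2, walk(0x1000, UVPD) ends at 0x1000 (the directory itself), and walk(0x1000, UVPT + (2 << 12)) ends at 0x2000 (the page table for slot 2), matching the two cases described above.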