Virtual memory

Required reading: Mach external pager

Overview

Virtual memory is at minimum the translation of virtual addresses to physical addresses. This translation involves page tables, their entries (PTEs), the OS setting up the tables, the VM hardware using them, and the page-fault handler when the hardware cannot find a PTE for a virtual address.
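
As a minimal sketch of the translation step, assuming a single-level page table, 4 KiB pages, and a made-up PTE layout (struct pte, translate, and map_page are hypothetical names, not from any real kernel), the hardware lookup and the OS side that fills in the table might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPTES      16                    /* tiny address space, for illustration */

#define PTE_VALID  0x1u
#define PTE_WRITE  0x2u

/* Hypothetical PTE layout: physical frame number plus flag bits. */
struct pte {
    uint32_t pfn;
    uint32_t flags;
};

static struct pte page_table[NPTES];     /* set up by the OS */

/* What the VM hardware does on each reference. Returns true and stores
 * the physical address on success; returns false where real hardware
 * would trap to the page-fault handler. */
bool translate(uint32_t vaddr, bool is_write, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;        /* virtual page number */
    uint32_t off = vaddr & (PAGE_SIZE - 1);    /* offset within the page */

    if (vpn >= NPTES)
        return false;                    /* outside the table */
    struct pte *p = &page_table[vpn];
    if (!(p->flags & PTE_VALID))
        return false;                    /* no PTE: page fault */
    if (is_write && !(p->flags & PTE_WRITE))
        return false;                    /* protection fault */
    *paddr = (p->pfn << PAGE_SHIFT) | off;
    return true;
}

/* What the OS does to install a translation. */
void map_page(uint32_t vpn, uint32_t pfn, uint32_t flags)
{
    assert(vpn < NPTES);
    page_table[vpn].pfn = pfn;
    page_table[vpn].flags = flags | PTE_VALID;
}
```

With virtual page 1 mapped to frame 7, address 0x1234 translates to 0x7234; a write to the same address fails until the mapping carries PTE_WRITE.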

More generally, we can view virtual memory as a layer of indirection between virtual addresses and physical addresses that the OS can use for many different purposes.

The operating system's job of managing memory goes beyond just setting up translations between virtual and physical addresses. It also includes:
  1. Mapping memory-like objects: for example, files, zero-fill pages, and copy-on-write of not-really-there pages.
  2. Managing paging and swapping space.
  3. Controlled sharing of objects mapped by many processes, possibly at different virtual addresses.

These functions are hardware-independent, but they require an interface to the translation module. The OS cannot rely on the translation module to do much: it might be just a TLB, and hardware page tables are unlikely to be able to express everything the OS wants.

To cope with the complexity of a virtual-memory system, it is useful to split the design into multiple mostly independent modules:

  1. Pmap layer: manages the state of the VM hardware. Treats main memory as a cache. Machine-dependent layer.
  2. VM layer: Mapping addresses to objects. Machine-independent.
  3. Pagers: locating the object (the data for the object is often not in physical memory).
  4. Global management of limited number of physical pages (e.g., tracking working sets, page replacement, etc.).

Mach, SunOS, 4.4BSD, NT, and OS X use this split.
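
To make the fourth module concrete, here is a minimal sketch of one common page-replacement policy, the clock (second-chance) algorithm. The notes do not say which policy any of these systems uses, so this is a stand-in chosen for illustration; struct frame, choose_frame, load_page, and touch are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NFRAMES 4                 /* tiny pool of physical pages, for illustration */

struct frame {
    bool in_use;
    bool referenced;              /* set on access, cleared as the hand sweeps past */
    int  owner_page;              /* hypothetical id of the page held in this frame */
};

static struct frame frames[NFRAMES];
static int hand;                  /* the clock hand: next frame to examine */

/* Record an access to the page held in frame f. */
void touch(int f)
{
    assert(f >= 0 && f < NFRAMES);
    frames[f].referenced = true;
}

/* Pick a frame to (re)use: a free frame, or the first frame the hand
 * reaches whose referenced bit is clear. Referenced frames get a second
 * chance: their bit is cleared and the hand moves on. */
int choose_frame(void)
{
    for (;;) {
        int idx = hand;
        struct frame *f = &frames[idx];
        hand = (hand + 1) % NFRAMES;
        if (!f->in_use || !f->referenced)
            return idx;
        f->referenced = false;
    }
}

/* Bring a page into memory, evicting whatever choose_frame() selects. */
int load_page(int page)
{
    int idx = choose_frame();
    frames[idx].in_use = true;
    frames[idx].referenced = true;
    frames[idx].owner_page = page;
    return idx;
}
```

The loop always terminates: after one full sweep every referenced bit has been cleared, so the next frame examined is a victim.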

Don't view VM as a thin layer just above the memory system. It is actually an important program/OS interface: it lets the OS control what memory references refer to. Most of the implementation is in the OS, not the hardware. The OS uses this flexible control to improve performance, and applications can do the same.

Mach VM

Mach is a microkernel operating system developed at CMU in the 1980s. At the time there was a lot of activity around Mach. One of its lasting impacts is its virtual memory design, which has been adopted by 4.4BSD and OS X.

The kernel implementation picture is as follows (in 4.4BSD terminology):

  1. An address space consists of a list of vm_map_entries
  2. A vm_map_entry contains vstart, vend, protection, offset, and an object pointer. A vm_map_entry is not shared.
  3. Objects (files, zero-fill memory) know about object-relative "physical" addresses. They may have a cache of vm_pages, may be shared by multiple vm_map_entries, and implement read/write sharing of the object.
  4. Shadow objects implement copy-on-write. When a VM object is duplicated (e.g., at fork), a shadow object is created. A shadow object is initially empty and points to the underlying object. When a page of the shadow object is modified, the page is copied and inserted in the list of pages for the shadow object. A series of shadow objects pointing to shadow objects or original objects is a shadow chain.
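
The shadow-chain idea can be sketched in a few lines. This is a much-simplified model, not the 4.4BSD data structures: the struct vm_object below holds only a small array of resident pages and a shadow pointer, and object_new, object_read, and object_write are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NPAGES    4               /* pages per object, tiny for illustration */
#define PAGE_SIZE 16              /* bytes per page, tiny for illustration */

/* Hypothetical, simplified VM object. */
struct vm_object {
    char *pages[NPAGES];          /* NULL means "not held by this object" */
    struct vm_object *shadow;     /* object underneath in the chain, or NULL */
};

/* Create an empty object on top of backing (NULL for an original object). */
struct vm_object *object_new(struct vm_object *backing)
{
    struct vm_object *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->shadow = backing;
    return o;
}

/* Read path: walk down the chain until some object holds the page. */
const char *object_read(struct vm_object *o, int pageno)
{
    for (; o != NULL; o = o->shadow)
        if (o->pages[pageno] != NULL)
            return o->pages[pageno];
    return NULL;                  /* a real system would zero-fill here */
}

/* Write path: make sure the top object has a private copy, then return it. */
char *object_write(struct vm_object *top, int pageno)
{
    if (top->pages[pageno] == NULL) {
        char *copy = calloc(1, PAGE_SIZE);      /* zero-filled by default */
        assert(copy != NULL);
        const char *src = object_read(top->shadow, pageno);
        if (src != NULL)
            memcpy(copy, src, PAGE_SIZE);       /* the copy in copy-on-write */
        top->pages[pageno] = copy;
    }
    return top->pages[pageno];
}
```

Modeling fork as giving the parent and the child each an empty shadow over the same original object, both initially read the same page; a write through the child's shadow copies the page into that shadow and leaves the parent's view and the original untouched.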

Pagers (or data managers) provide the data for objects. Example data managers: file system server and network shared memory.
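
One way to picture a pager is as a small table of operations that the VM layer calls when it needs data for an object. The sketch below is an assumption made for illustration: page_in and page_out are hypothetical names, not Mach's actual external pager messages, and the "file" is just a buffer in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 16              /* tiny pages, for illustration */

/* Hypothetical pager interface. */
struct pager {
    /* Fill buf with the page at offset; 0 on success, -1 if there is no data. */
    int (*page_in)(struct pager *self, size_t offset, char *buf);
    /* Write the page at offset back to the store; 0 on success, -1 on failure. */
    int (*page_out)(struct pager *self, size_t offset, const char *buf);
    void *state;                  /* pager-private data */
};

/* Zero-fill pager: every page reads as zeros, and nothing can be written back. */
static int zero_in(struct pager *self, size_t offset, char *buf)
{
    (void)self; (void)offset;
    memset(buf, 0, PAGE_SIZE);
    return 0;
}

static int zero_out(struct pager *self, size_t offset, const char *buf)
{
    (void)self; (void)offset; (void)buf;
    return -1;
}

/* "File" pager backed by a buffer in memory, standing in for a file server. */
struct memfile { char *data; size_t size; };

static int file_in(struct pager *self, size_t offset, char *buf)
{
    struct memfile *f = self->state;
    if (offset + PAGE_SIZE > f->size)
        return -1;                /* past the end of the file */
    memcpy(buf, f->data + offset, PAGE_SIZE);
    return 0;
}

static int file_out(struct pager *self, size_t offset, const char *buf)
{
    struct memfile *f = self->state;
    if (offset + PAGE_SIZE > f->size)
        return -1;
    memcpy(f->data + offset, buf, PAGE_SIZE);
    return 0;
}

struct pager zero_pager(void)
{
    struct pager p = { zero_in, zero_out, NULL };
    return p;
}

struct pager file_pager(struct memfile *f)
{
    assert(f != NULL);
    struct pager p = { file_in, file_out, f };
    return p;
}
```

The VM layer only sees struct pager, so a network shared memory manager could plug in behind the same two calls without the fault handler changing.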

Example: fork

Points:

At a page fault:

/*
 * O/S page fault handling code, from 4.4BSD, from Mach.
 *
 * map is the process' list of vm_map_entries.
 * addr is the virtual address that caused the fault.
 * fault_type is read or write.
 */

vm_fault(map, addr, fault_type) {

  /* find addr in the vm_map_entry list */
  for(m = map; m != NULL; m = m->next){
    if(addr >= m->start && addr < m->end){
      object = m->object;
      /* offset of the faulting address within the object */
      offset = m->offset + (addr - m->start);
      protection = m->protection;
      break;
    }
  }

  /* signal a fault to the user? */
  if(m == NULL){
    raise an unmapped page fault signal; return;
  }
  if((fault_type == WRITE && (protection & WRITE) == 0) ||
     (fault_type == READ && (protection & READ) == 0)){
    raise a protection fault signal; return;
  }

  first_object = object;

  /* walk down the chain of shadow objects */
  while(1) {
    page = find cached physical page at offset in object;
    if(page found)
      break;

    if(object has file/disk storage) {
      /* might be a file or a shadow object with disk backing store */
      page = read page from disk or file at offset;
      if(page found)
        break;
    }

    if(object->next)
      object = object->next;
    else
      break;
  }

  if(page == NULL){
    page = allocate a physical page of memory;
    add page to first_object;
    zero-fill page;
  } else if(object != first_object){
    if(fault_type == WRITE){
      /* copy-on-write */
      page = copy(page);
      add page to first_object;
    } else {
      /* prepare for future copy-on-write */
      change page protection to read-only;
    }
  }

  /* update VM hardware */
  pmap_enter(addr, physical address of page, protection);
}

/*
 * Interface to machine-dependent "pmap" VM layer.
 * Manages h/w PTE arrays and/or TLB entries.
 * Operates on the current process's map.
 * Not intended to keep much state beyond h/w PTE tables or TLB.
 */

pmap_enter(vaddr, paddr, protection)
{
  /*
   * vaddr is a virtual address (in the current process).
   * paddr is the physical address of a real page in real memory.
   *
   * 1. Enter new mapping into current PTE array, or TLB.
   *    Or change existing mapping.
   * 2. On machines with virtually-indexed caches, see if paddr
   *    is mapped anywhere else, and delete all other mappings.
   * 3. Flush stale entries from virtually-indexed caches.
   * 4. Flush stale entry from TLB.
   *
   * This may require allocating physical memory for a new PTE.
   *
   */
}

pmap_remove(vaddr) { /* delete a mapping */ }

pmap_protect(vaddr, mode) { /* change a mapping's protection mode */ }

pmap_page_protect(paddr, mode) {
  /* change protection of all mappings that point to paddr */
}

pmap_is_dirty(vaddr) { /* has h/w marked the PTE as dirty? */ }

pmap_clear_dirty(vaddr) { /* clear dirty bit */ }

  1. CPU saves user program state on stack, jumps to ... vm_fault().
  2. See the vm_fault() pseudo-code above:
    1. Find the vm_map_entry, if any.
    2. Might be an unmapped address, or a protection violation. Kill process -- or notify it with a signal!
    3. Follow the object chain.
    4. Might be a resident page, just not in hardware map. Or a write access to a r/o (maybe copy on write) page. Might be found in a shadow object, or underlying object.
    5. Might be non-resident, read from pager.
    6. If copy on write, and writing, and not in first shadow: Make a copy of the page. Install it in the first shadow.
  3. We end up with a vaddr and a physical page; now install translation in what Mach calls the pmap layer (the machine-dependent part).
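
The last step can be illustrated with a toy pmap layer over a flat PTE array. This is a sketch under stated assumptions, not the Mach or 4.4BSD pmap: the toy_ functions, VM_R/VM_W flags, and toy_access (which stands in for the hardware) are all hypothetical, and there is no TLB or cache to flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12             /* assumed 4 KiB pages */
#define NPTES      8              /* tiny address space, for illustration */
#define VM_R       0x1u
#define VM_W       0x2u

struct pte { bool valid; bool dirty; uint32_t pfn; unsigned prot; };
static struct pte ptes[NPTES];

static struct pte *lookup(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    assert(vpn < NPTES);
    return &ptes[vpn];
}

/* Enter a new mapping, or replace an existing one. */
void toy_pmap_enter(uint32_t vaddr, uint32_t paddr, unsigned prot)
{
    struct pte *p = lookup(vaddr);
    p->valid = true;
    p->dirty = false;
    p->pfn = paddr >> PAGE_SHIFT;
    p->prot = prot;
}

/* Delete a mapping. */
void toy_pmap_remove(uint32_t vaddr) { lookup(vaddr)->valid = false; }

/* Change the protection of every mapping that points at paddr. */
void toy_pmap_page_protect(uint32_t paddr, unsigned prot)
{
    for (int i = 0; i < NPTES; i++)
        if (ptes[i].valid && ptes[i].pfn == paddr >> PAGE_SHIFT)
            ptes[i].prot = prot;
}

/* Has a write gone through this mapping since it was entered? */
bool toy_pmap_is_dirty(uint32_t vaddr)
{
    struct pte *p = lookup(vaddr);
    return p->valid && p->dirty;
}

/* Stand-in for the hardware: returns false where a fault would be raised. */
bool toy_access(uint32_t vaddr, bool is_write)
{
    struct pte *p = lookup(vaddr);
    if (!p->valid || !(p->prot & (is_write ? VM_W : VM_R)))
        return false;
    if (is_write)
        p->dirty = true;
    return true;
}
```

Mapping one physical page at two virtual addresses and then calling toy_pmap_page_protect with VM_R makes writes through either mapping fail, which is the kind of downgrade a kernel needs when it prepares a shared page for copy-on-write.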

Paper discussion