6.1810 2022 Lecture 8: Q&A

Plan: answering your questions
  Approach:
    walk through staff solutions
    start with pgtbl lab because it was the hardest
    your questions are at the bottom of this file

Pgtbl lab comments
  few lines of code, but difficult-to-debug bugs
  - worst case: qemu/xv6 stops running
  - "best" case: kernel panic
  hard to debug for staff too
  - there are so many possible reasons why

Part 1 of pgtbl lab
  Avoid kernel transitions
  Expose kernel state to user space
    augment user space diagram with USYSCALL
  Which system calls can be sped up?  (see kernel/syscall.h)
    can we move system calls that write kernel state to user space?
    can we move system calls that read kernel state of other processes to user space?
  Candidates are:
    getpid()
    uptime()
    fstat()?
      maybe possible, but too much state?
  Demo
    pgtbltest.c
    ulib.c
    kernel/memlayout.h
    kernel/proc.c:
      proc_pagetable()
      freeproc()
  Linux has vDSO (https://lwn.net/Articles/615809/)
    vDSO: virtual dynamic shared object
    cat /proc/<pid>/maps
    read-only, shared memory region
    vdso library mapped into each user program
      library interprets data in the shared region
    example timer measurements
      kernel posts time to shared region
      vDSO code adds the TSC cycles elapsed since then to the latest time
    calls implemented using vDSO (virtual system calls)
      clock_gettime(), getcpu(), getpid(), getppid(), gettimeofday(), set_tid_address()
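A minimal sketch of the Part 1 idea, simulated in ordinary C rather than xv6: one page that only the "kernel" writes and "user" code reads with plain loads, with no trap. The lab's ugetpid() in ulib.c does the same thing by casting USYSCALL to a struct usyscall * and reading pid. The time fields and the names kernel_post(), ugetpid_sim(), and utime_ns_sim() are hypothetical, added to mimic the vDSO timer trick of adding elapsed cycles to the kernel's last posted time. A real vDSO also uses a sequence counter so a reader never sees a half-updated record; this sketch omits it.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical contents of the shared page. The lab's struct usyscall
// holds only `int pid`; the three time fields are assumptions that mimic
// the vDSO timer idea (real vDSO code uses a mult/shift pair rather than a
// plain multiplier, plus a sequence counter; both are omitted here).
struct shared_page {
  int pid;
  uint64_t base_ns;       // time the kernel last posted
  uint64_t base_cycles;   // cycle counter at that moment
  uint64_t ns_per_cycle;  // conversion factor
};

// Stand-in for the physical page that xv6 maps at USYSCALL with
// PTE_R | PTE_U (readable by user code, not writable).
static struct shared_page page;

// "Kernel" side: the only writer of the page.
void kernel_post(int pid, uint64_t now_ns, uint64_t now_cycles,
                 uint64_t ns_per_cycle) {
  assert(ns_per_cycle > 0);
  page.pid = pid;
  page.base_ns = now_ns;
  page.base_cycles = now_cycles;
  page.ns_per_cycle = ns_per_cycle;
}

// "User" side: ordinary loads, no ecall and no trap into the kernel.
int ugetpid_sim(void) {
  const volatile struct shared_page *u = &page;
  return u->pid;
}

// Current time = last posted time + cycles elapsed since then.
uint64_t utime_ns_sim(uint64_t cycles_now) {
  const volatile struct shared_page *u = &page;
  return u->base_ns + (cycles_now - u->base_cycles) * u->ns_per_cycle;
}
```

For example, after kernel_post(42, 1000, 500, 2), ugetpid_sim() returns 42 and utime_ns_sim(510) returns 1000 + 10 * 2 = 1020, all without entering the kernel.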
      
Part 2 of pgtbl lab
  Explain vm output in terms of fig 3-4
       L0, L1, L2/PTE
       page table 0x0000000087f6b000
       ..0: pte 0x0000000021fd9c01 pa 0x0000000087f67000
       .. ..0: pte 0x0000000021fd9801 pa 0x0000000087f66000
       .. .. ..0: pte 0x0000000021fda01b pa 0x0000000087f68000  (text)
       .. .. ..1: pte 0x0000000021fd9417 pa 0x0000000087f65000  (data)
       .. .. ..2: pte 0x0000000021fd9007 pa 0x0000000087f64000  (guard)
       .. .. ..3: pte 0x0000000021fd8c17 pa 0x0000000087f63000  (stack)
       ..255: pte 0x0000000021fda801 pa 0x0000000087f6a000
       .. ..511: pte 0x0000000021fda401 pa 0x0000000087f69000
       .. .. ..509: pte 0x0000000021fdcc13 pa 0x0000000087f73000  (usyscall)
       .. .. ..510: pte 0x0000000021fdd007 pa 0x0000000087f74000  (trapframe)
       .. .. ..511: pte 0x0000000020001c0b pa 0x0000000080007000  (trampoline)
       init: starting sh
    what are entries 0, 1, 2, and 3?
    what are 509, 510, and 511?
    - 511: trampoline
    - 510: trapframe
    - 509: USYSCALL
    why are the protection bits as they are?
      bottom 2 hex digits in PTE/L1/L0
    - 509: URV
    - 510: WRV (no U bit)
    - 511: XRV (no U bit)
    why are XRW = 0 in L0 and L1 ptes?
      indicates that it is not a leaf PTE
    are the physical addresses contiguous?
  Demo
    structure follows walk()
    vm.c:vmprint()
    vm.c:print_level
      PTE2PA()
      why can we use the output of PTE2PA as a virtual address?
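The decoding questions above can be checked mechanically. This sketch copies the flag bit positions and the PTE2PA and PX computations from xv6's kernel/riscv.h (written as functions here, with MAXVA = 1 << 38 as in xv6); pte_flags(), pte_is_leaf(), and the letter order U X W R V are hypothetical helpers chosen to match the notation in these notes. Note that xv6 numbers levels the opposite way from the labels above: px(2, va) selects the root index (L0 in the printout).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Flag bits as in xv6's kernel/riscv.h (RISC-V Sv39).
#define PTE_V ((uint64_t)1 << 0)
#define PTE_R ((uint64_t)1 << 1)
#define PTE_W ((uint64_t)1 << 2)
#define PTE_X ((uint64_t)1 << 3)
#define PTE_U ((uint64_t)1 << 4)

#define PGSIZE ((uint64_t)4096)
#define MAXVA ((uint64_t)1 << 38)  // xv6: 1L << (9 + 9 + 9 + 12 - 1)

// Same computation as xv6's PTE2PA: drop the 10 flag bits to get the
// physical page number, then shift by 12 to get a byte address.
uint64_t pte2pa(uint64_t pte) { return (pte >> 10) << 12; }

// Same computation as xv6's PX: the 9-bit index used at `level`
// (2 = root, 0 = leaf in xv6's numbering).
int px(int level, uint64_t va) {
  assert(level >= 0 && level <= 2);
  return (int)((va >> (12 + 9 * level)) & 0x1FF);
}

// Writes the set flags as letters, e.g. 0x13 -> "URV".
void pte_flags(uint64_t pte, char buf[6]) {
  memset(buf, 0, 6);
  int i = 0;
  if (pte & PTE_U) buf[i++] = 'U';
  if (pte & PTE_X) buf[i++] = 'X';
  if (pte & PTE_W) buf[i++] = 'W';
  if (pte & PTE_R) buf[i++] = 'R';
  if (pte & PTE_V) buf[i++] = 'V';
}

// A valid PTE with R, W, and X all clear points to a next-level table.
int pte_is_leaf(uint64_t pte) {
  return (pte & PTE_V) && (pte & (PTE_R | PTE_W | PTE_X)) != 0;
}
```

Checked against the printout: pte2pa(0x21fdcc13) is 0x87f73000 with flags URV, pte 0x21fd9c01 is valid but not a leaf, and px() on MAXVA - 3*PGSIZE (USYSCALL, just below TRAPFRAME and TRAMPOLINE) gives 255, 511, 509 from root to leaf, which is why entries 509 to 511 hang off root entry 255.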
    
Part 3 of pgtbl lab
  Goal: efficiently detect which pages have been accessed
  Idea: exploit hardware page-table walk:
    it sets PTE_A when page is accessed (read or write)
    it sets PTE_D when page is written ("made dirty")
  Approach: syscall w. bitmask indicating which pages are accessed
  Demo
    pgtbltest.c
    sys_pgaccess
      draw: 1 PGSIZE buffer with 1 bit per page
    pgaccess
      argaddr(2, &ubitmask)
        can it fail?
      copyout
        the physical page for a user page is also in kernel AS
  How does Linux use access bits?
    Used for implementing LRU
      Move least-recently accessed pages to swap
    Use PTE_D to detect if copy on disk is stale
    Scanning all pages is expensive, so the real algorithms are more clever
      Hard problem; many changes/tweaks over time
      https://www.usenix.org/legacy/publications/library/proceedings/usenix05/tech/general/full_papers/jiang/jiang_html/html.html
      https://lwn.net/Articles/495543/
      https://linux-mm.org/PageReplacementDesign
      https://www.kernel.org/doc/gorman/html/understand/understand013.html
      mm/swap.c
  Linux doesn't expose access bits information to user space!
    How could you detect page access *without* PTE_A?
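A simulated sketch of the core loop of pgaccess. An array stands in for the leaf PTEs that the real syscall would look up with walk(), and hw_access() stands in for the MMU setting PTE_A (bit 6 in RISC-V Sv39); hw_access(), map_all(), and pgaccess_sim() are hypothetical names. Clearing PTE_A after reporting means each call answers "accessed since the previous call", so the syscall can be used repeatedly. On real hardware the kernel may also need to flush the TLB (sfence.vma on RISC-V) after clearing the bit, since an access that hits a cached translation may not set the bit again.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_V ((uint64_t)1 << 0)
#define PTE_R ((uint64_t)1 << 1)
#define PTE_A ((uint64_t)1 << 6)  // accessed bit in RISC-V Sv39

#define NPAGES 32

// Simulated leaf PTEs for NPAGES consecutive user pages (hypothetical;
// the real code calls walk(p->pagetable, va, 0) for each page).
static uint64_t ptes[NPAGES];

// Map every simulated page as valid and readable.
void map_all(void) {
  for (int i = 0; i < NPAGES; i++)
    ptes[i] = PTE_V | PTE_R;
}

// Stand-in for the MMU: a load or store through the PTE sets PTE_A.
void hw_access(int page) {
  ptes[page] |= PTE_A;
}

// Build a bitmask with bit i set if page start+i was accessed, clearing
// PTE_A as we go. The real syscall would copyout() the mask to the
// user's buffer instead of returning it.
uint64_t pgaccess_sim(int start, int npages) {
  assert(start >= 0 && npages >= 0 && npages <= 64);
  assert(start + npages <= NPAGES);
  uint64_t mask = 0;
  for (int i = 0; i < npages; i++) {
    uint64_t *pte = &ptes[start + i];
    if ((*pte & PTE_V) && (*pte & PTE_A)) {
      mask |= (uint64_t)1 << i;
      *pte &= ~PTE_A;  // so the next call reports only new accesses
    }
  }
  return mask;
}
```

For example, after map_all() and accesses to pages 1, 2, and 30, the first pgaccess_sim(0, 32) returns a mask with bits 1, 2, and 30 set, and an immediate second call returns 0.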

Syscall lab
  system calls look like function calls
  BUT they are not function calls
  trace
    proc.h
    fork()
  sysinfo
    copyout()
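A simulated sketch of why a system call is not a function call from the kernel's side. xv6's syscall() in kernel/syscall.c reads the call number from the saved a7, calls the handler through a table, and stores the handler's return value into the saved a0, which is restored into the a0 register before returning to user space. The structs here are cut down to the fields used, and syscall_sim(), cur, and the value 7 returned by sys_uptime() are hypothetical. This is also why writing trapframe->a0 = 0 inside a sys_XXX function has no lasting effect: syscall() overwrites a0 with the return value. The trace part of the lab adds a mask to struct proc (which fork() must copy to the child) and prints a line such as "3: syscall read -> 1023" when the call's bit is set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// Cut-down stand-ins for xv6's structures (hypothetical; the real ones
// have many more fields).
struct trapframe {
  uint64_t a0;  // first argument on entry, return value on exit
  uint64_t a7;  // system call number, set by the usys.S stub
};
struct proc {
  int pid;
  int trace_mask;  // added by the trace lab; fork() copies it
  struct trapframe *trapframe;
};

#define SYS_getpid 11  // values from xv6's kernel/syscall.h
#define SYS_uptime 14

static struct proc *cur;  // stand-in for myproc()

static uint64_t sys_getpid(void) { return cur->pid; }

static uint64_t sys_uptime(void) {
  cur->trapframe->a0 = 0;  // pointless: syscall_sim() overwrites a0
  return 7;                // hypothetical tick count
}

static uint64_t (*syscalls[])(void) = {
  [SYS_getpid] = sys_getpid,
  [SYS_uptime] = sys_uptime,
};
static const char *syscall_names[] = {
  [SYS_getpid] = "getpid",
  [SYS_uptime] = "uptime",
};

// Modeled on xv6's syscall(): dispatch on a7, store the result in a0.
void syscall_sim(struct proc *p) {
  assert(p != NULL && p->trapframe != NULL);
  cur = p;
  int num = (int)p->trapframe->a7;
  int n = (int)(sizeof syscalls / sizeof syscalls[0]);
  if (num > 0 && num < n && syscalls[num] != NULL) {
    p->trapframe->a0 = syscalls[num]();
    if (p->trace_mask & (1 << num))
      printf("%d: syscall %s -> %d\n", p->pid, syscall_names[num],
             (int)p->trapframe->a0);
  } else {
    p->trapframe->a0 = (uint64_t)-1;  // unknown syscall: return -1
  }
}
```

With a7 = SYS_uptime, the saved a0 ends up as 7 even though sys_uptime() wrote 0 into it first.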

Util lab
  primes
    close fd's to terminate correctly
  xargs
    setting up argv 
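A sketch of the concurrent prime sieve, written for POSIX (Linux or macOS) rather than xv6 so it can run on a host machine; on xv6 you would use wait(0) in place of waitpid() and exit() in place of _exit(). The names stage() and run_primes(), and the extra report pipe that carries primes back to the parent, are hypothetical. The comments mark the close() calls that the util questions are about: a read() on a pipe returns 0 (EOF) only after every copy of its write end, in every process, has been closed, so each process closes the ends it does not use right after fork().

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// One pipeline stage. The first number read from `in` is prime: report
// it, then forward every number it does not divide to a new child stage.
static void stage(int in, int report) {
  int p, n;
  if (read(in, &p, sizeof p) != sizeof p) {  // EOF: no numbers left
    close(in);
    close(report);
    _exit(0);
  }
  write(report, &p, sizeof p);

  int fds[2];
  if (pipe(fds) < 0) _exit(1);
  pid_t pid = fork();
  if (pid < 0) _exit(1);
  if (pid == 0) {
    close(fds[1]);          // the child only reads the new pipe...
    close(in);              // ...and never reads its parent's input
    stage(fds[0], report);  // does not return
  }
  close(fds[0]);  // this stage only writes to the new pipe
  while (read(in, &n, sizeof n) == sizeof n)
    if (n % p != 0)
      write(fds[1], &n, sizeof n);
  close(in);
  close(fds[1]);  // last write end closed: the child's read() returns 0
  close(report);
  waitpid(pid, NULL, 0);
  _exit(0);
}

// Runs the sieve on 2..limit and stores up to maxout primes in out.
// Returns the number of primes found, or -1 on error.
int run_primes(int limit, int *out, int maxout) {
  // The parent writes every number before reading any report, so keep
  // limit small enough that no pipe buffer can fill up.
  assert(limit >= 2 && limit < 1000);
  int nums[2], rep[2];
  if (pipe(nums) < 0) return -1;
  if (pipe(rep) < 0) {
    close(nums[0]);
    close(nums[1]);
    return -1;
  }
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    close(nums[1]);
    close(rep[0]);
    stage(nums[0], rep[1]);  // does not return
  }
  close(nums[0]);
  close(rep[1]);  // without this, the read loop below never sees EOF
  for (int i = 2; i <= limit; i++)
    write(nums[1], &i, sizeof i);
  close(nums[1]);

  int count = 0, p;
  while (read(rep[0], &p, sizeof p) == sizeof p)
    if (count < maxout)
      out[count++] = p;
  close(rep[0]);
  waitpid(pid, NULL, 0);
  return count;
}
```

For example, run_primes(35, out, 20) should return 11 with out holding 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31. The primes arrive in increasing order because each stage reports its prime before it forks the next stage.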

=== Questions ===

=== util

I was a little confused during the lab-util lab about pipes -- do we
have to close all the write ends of the pipe in order for the program
to terminate (if it was just continuously doing if (read(...) > 0))?

===

2) For the very first lab, I had a hard time closing the pipes
correctly. Is there some convention for when to close pipes?

===

I was also wondering if the prime sieve question from lab 1 could be
explained, particularly the concurrent sieve/pipeline part.

=== syscall

When a trap saves all the existing registers, is there any limit on
which of those registers the kernel will use? Or is it possible it
will use all of them?

====

I was wondering why there was no error handling for argparsing in
syscalls. The function signature for argint is `void argint(int n, int
*ip)` and the internals of the function appear to panic if a bad input
was supplied. Is there a more graceful way to handle this?

====

When a system call occurs, what's the difference between
"trapframe->a0 = 0;" and "return 0;" in a sys_XXX function? In my
opinion both put 0 in the a0 register (when we are back in user
mode).

===

In the syscall labs, why did we need to use a perl script to generate
usys.S? Why don't we just write all the user syscall stubs in C code
instead of generating all of the code using assembly? Is it just more
compact and efficient to use a script to create assembly code?

=== pgtabl

Also, I got a little confused about page tables.  Are PTEs and virtual
addresses equivalent or related? Is there one PTE per 4096-byte page
(if so, are there multiple VAs per page or just one)?

===

In the pagetable lab, the following was mentioned: "Some garbage
collectors (a form of automatic memory management) can benefit from
information about which pages have been accessed (read or write)."

Assuming that the garbage collector for some program is running in
userland, is there a syscall exposed in Linux that achieves this?

===

I didn't understand the last question on lab-pgtbl: What does the
third to last page contain?

From inspecting the PTE, it looked like the PTE_R and PTE_U bits were
set (and not PTE_W) which didn't seem to correspond to any part of the
process's user address space from Figure 3.4?

===

I would like to discuss how copyout works, I mostly understand the
function/code, but it would be good to go over when exactly copyout
should happen and what actually happens on a high level.

===

Where are page tables themselves stored in memory?

===

How would we implement a LRU paging out system? How do we keep track
of which pages have been most recently used?
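One classic way to approximate LRU with nothing but an accessed bit is the clock (second-chance) algorithm, sketched here over a hypothetical frame table that points at each resident page's leaf PTE. It is a textbook technique, not what xv6 or Linux implements (the links under Part 3 describe Linux's approach); struct clock and clock_pick_victim() are illustrative names, and PTE_A is bit 6 as in RISC-V Sv39.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_A ((uint64_t)1 << 6)  // RISC-V accessed bit

// Hypothetical frame table: for each resident page, its leaf PTE.
struct clock {
  uint64_t **ptes;  // ptes[i] points at the PTE mapping frame i
  int nframes;
  int hand;         // next frame to examine
};

// Second-chance victim selection: a page with PTE_A set was used since
// the hand last passed it, so clear the bit and move on; the first page
// found with PTE_A clear is the victim. Terminates within two sweeps,
// because the first sweep clears every bit it passes.
int clock_pick_victim(struct clock *c) {
  assert(c->nframes > 0);
  for (;;) {
    int idx = c->hand;
    uint64_t *pte = c->ptes[idx];
    c->hand = (c->hand + 1) % c->nframes;
    if (*pte & PTE_A)
      *pte &= ~PTE_A;  // recently used: give it a second chance
    else
      return idx;      // not used since the last sweep: evict
  }
}
```

A page that keeps getting touched has its bit set again before the hand comes back, so it survives; pages untouched for a full sweep are evicted first, which approximates "least recently used".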

===

In the last part of the page table lab, I only mark a page as
"accessed" if the user requesting access information has read
permissions for the given page.  However, I can't think of anything
harmful that a user could do with access information for a page they
don't have permission to read. Was this permission check totally
unnecessary or am I overlooking something? Would the access bit still
be set by the hardware even if the physical page is accessed via a
different page table?

===

1) for the lab pgtbl, the second question's last part was the
following: What does the third to last page contain?
My answer was this:
    If I understand this correctly, the third to last page is the 509.
The PTE ends in 13, which is 00010011, which means the page is
accessible to user mode, is readable and valid. I suspect that it is
either trampoline or trapframe because vmprint is printing all the
pages and trampoline and trapframe should have been printed. I think
it's trapframe since it's below trampoline.

I am a little confused about the bits though. I have a feeling I'm
encoding this wrong because 00010011 matches neither the
RX--- nor the R-W-- that's written next to the trampoline and
trapframe pages in figure 3.4.

Could you please explain the answer to this question?

===

Why do we design page tables with multiple levels?
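A worked size comparison, using the Sv39 parameters from the xv6 book (39-bit virtual addresses, 4096-byte pages, 8-byte PTEs) and the five page-table pages visible in the Part 2 vmprint output (the root, plus one middle and one leaf table under each of root entries 0 and 255):

```latex
\begin{aligned}
\text{flat table: } & 2^{39-12}\ \text{PTEs} \times 2^{3}\ \text{bytes} = 2^{30}\ \text{bytes} = 1\ \text{GiB per process} \\
\text{three-level (init above): } & 5\ \text{pages} \times 4\ \text{KiB} = 20\ \text{KiB}
\end{aligned}
```

A flat table must cover the whole address space up front, while a multi-level table allocates lower-level pages only for regions that are actually mapped.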

===

I was a bit unsure about the discussion questions for lab 3 (pgtbl).
In particular, I was wondering about what other system calls could be
optimized via shared memory in user space, and also about the output
of vmprint.

===

How can we use gdb to debug page tables? Is it possible to print out the
contents of a physical address?

===

How does the hardware set the access bit in a page-table entry?

===

For lab pgtbl, we printed the initial page table, I am wondering
why there is little relationship between the PAs.  For example, 509
0x0000000087f76000 for USYSCALL is close to 510 0x0000000087f77000 for
TRAPFRAME, while 511 0x0000000080007000 for TRAMPOLINE is far from
them.  How are they allocated?
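A simplified model of xv6's kernel/kalloc.c (no lock, no junk fill) that explains the pattern. freerange() frees pages in ascending order onto a LIFO free list, so kalloc() hands out the highest free page first and consecutive allocations tend to get adjacent, descending addresses. In the Part 2 printout the trapframe (0x87f74000) sits one page above the USYSCALL page (0x87f73000), consistent with the trapframe being allocated just before it; the exact order depends on how allocproc() was written. The trampoline is not allocated at all: it is a page of kernel text (trampoline.S) mapped at the same physical address in every process, hence 0x80007000. The _sim names are hypothetical stand-ins for kfree(), kalloc(), and freerange().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096

// Free pages form a singly linked list threaded through the pages.
struct run { struct run *next; };
static struct run *freelist;

// Push a page on the head of the list (like xv6's kfree, which panics
// on a misaligned address).
void kfree_sim(void *pa) {
  assert((uintptr_t)pa % PGSIZE == 0);
  struct run *r = pa;
  r->next = freelist;
  freelist = r;
}

// Pop the head of the list, or return NULL when memory is exhausted.
void *kalloc_sim(void) {
  struct run *r = freelist;
  if (r)
    freelist = r->next;
  return r;
}

// Like xv6's freerange(): free pages in ascending address order, so the
// highest page ends up at the head and is allocated first.
void freerange_sim(void *start, void *end) {
  for (char *p = start; (char *)end - p >= PGSIZE; p += PGSIZE)
    kfree_sim(p);
}
```

After freeing four pages this way, the first kalloc_sim() returns the top page and the second returns the page just below it.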

===

In the page table lab there was an optional challenge exercise for
using super-pages to reduce the number of PTEs. How do we determine
when the page walk should end early (to obtain a super-page)? I assume
that this is chosen by the OS upon a call to sbrk() as a function of
the parameter. How do we then combine ideas for lazy memory allocation
and super-pages? (lazy allocation allocates on a page by page basis,
but how do we know at which granularity to allocate?)

===

In the page table lab, it is written that pgaccess is used by garbage
collectors. I'd like to know in what way this information is used.
Also, doesn't using pgaccess unset the access bits of all pages, thus
causing it to only be usable once before it wipes the slate?

===

I got the logic for PTE_A but I'm not sure about the process, is this
a CPU thing that the page gets marked when it's accessed or it's
somewhere in the kernel?

=== Other

With regards to multiple of the labs, how come we need to hold a lock
when accessing some fields of the proc struct, but not others? What
does it mean for the ones we don't need a lock for to be "private to
the process" (i.e. what exactly is the difference between these two
sets of fields)?

====

Why does xv6 yield the CPU on every single timer interrupt? Isn't this
extremely performance-inefficient? How do real operating systems
handle the yielding of processes?

===

I'm a bit interested in how gdb actually works.

===

This is a more general question, but can we go over visually (maybe
through a diagram or something) all the important registers for page
table/ trapframe sorts of things, along with important memory
addresses, and how those are different between physical and virtual
memory?  I know there's a diagram 3.3 that shows exactly that, but
it's still confusing to me. We've mentioned a lot of register names/
memory locs and they're all starting to blur together.