| 6.097: OPERATING SYSTEM ENGINEERING |
| Fall 2002 |
| Lab 4 |
| Hand out date: Wednesday October 16th |
| Due date: Monday, October 28th (Exercises 1-4) |
| Due date: Thursday, October 31st (Exercise 5) |
| Due date: Thursday, November 7th (Exercise 6) |
The source tarball may not compile correctly when you first untar it and copy over your lab 3 source files. Nevertheless, the compilation proceeds far enough so that you can answer the questions. By the time you finish exercise 1, you should have fixed the problems.
Be certain to note the "0x" in "0x30". This number is in hex.
exercise4_test1()
Perhaps contrary to your intuition, this test case should not generate a fault. Do you see why?
In addition to running exercise4_test1() as is, edit it
and run it as shown below.
void
exercise4_test1 (void)
{
sys_cputs ((char *)1);
}
sys_mem_remap and
sys_mod_perm
The instructions suggested ordering the sys_mod_perm
before sys_mem_remap. In fact, the parent should
remap before changing its own permissions. (Why?)
In this lab, you will fill out the interface that the kernel exports
to the user process. The first major addition will be the system call
interface. Then you will implement a sophisticated page fault
handling mechanism, which will allow a user process to handle its own
page faults. Finally, the user process will use these new interfaces
to implement fork() and IPC.
The lab consists of a series of exercises which build on top of each
other, culminating in the final exercise: implementing
fork().
Note! In order to encourage you to pace yourself
appropriately, we have broken up the lab into three parts.
fork(). This part involves getting many details right
so be sure to budget in debugging time for this exercise.
ossrc/lab4_answers.html (do not link to any external
files).
athena% cd ~/6.097/ossrc athena% gmake handin . . . athena% ls -l handin.tgz
A fool-proof way to accomplish this is to use the following command:
athena% mhmail 6.097-handin@pdos.lcs.mit.edu -subject http://web.mit.edu/PATH/TO/handin.tgz -body empty
lab4_answers.html. This will allow us to see
your progress and give you incremental feedback.
You should now download the code for the lab. Some files are absent from this tarball and must be copied form your lab 3 solutions.
Be careful not to overwrite your lab 3 solutions.
athena% add gnu 6.097 sipb athena% cd ~/6.097 athena% mv ossrc lab3-solutions athena% wget http://pdos.lcs.mit.edu/6.097/labs/lab4.tar.gz athena% gtar -zvxf lab4.tar.gz drwxr-xr-x cates/wheel 0 Sep 15 19:34 2002 ossrc/ drwxr-xr-x cates/wheel 0 Sep 15 19:34 2002 ossrc/kern/ drwxr-xr-x cates/wheel 0 Sep 15 19:34 2002 ossrc/kern/inc/ -rw-r--r-- cates/wheel 2528 Sep 14 17:00 2002 ossrc/kern/inc/asm.h . . . athena% cp lab3-solutions/kern/locore.S ossrc/kern athena% cp lab3-solutions/kern/trap.c ossrc/kern athena% cp lab3-solutions/kern/pmap.c ossrc/kern athena% cp lab3-solutions/kern/env.c ossrc/kern athena% cp lab3-solutions/kern/sched.c ossrc/kern athena% cp lab3-solutions/kern/init.c ossrc/kern
The tar ball contains new code (in ossrc/user/) and new
makefiles for building user environments in a clean manner. They
supplant the crude way of writing/building user environment in lab 3.
In that lab, machine code snippets from lab3.S were
copied into the user's address space at UTEXT. The
problem is that the code snippets were compiled as part of the kernel,
and so they were linked at the kernel's link address. Thus, the code
in lab3.S was constrained to being position independent.
To do this, run these commands and examine the output of "gmake all".
athena% cd ossrc athena% gmake clean athena% gmake allIn your answer, list the steps that the source files in the
ossrc/user/ undergo. How are they incorporated into the
kernel image? Why is the ossrc/user/ code permitted to be position
dependent?
ossrc/user/kenv0 is known as the idle process and it
serves a function integral to the correct operation of the kernel: it
spins in a loop. This turns to be highly constructive because while
it's spinning, interrupts are enabled. The idle process gives the
kernel something to do when all other processes are not ready to be run
(perhaps waiting on I/O) without blocking device interrupts.
ossrc/user/simple is called the init process. The kernel
makes no assumptions on the actions of this process.
Modify the code in i386_init() to create the idle process
and the init process. Make sure the idle process is created as
__envs[0].
trap()
trap() calls dispatch_syscall() when
tf_trapno == 0x30.
dispatch_syscall() dispatches to the correct C
system call function, based on the system call number.
Skeletal code implementing these actions exists in
inc/syscall.h and syscall.c. You must
finish the coding.
The system calls sys_getenvid(), sys_cputu()
and sys_cputs have been implemented for you. Once you
have finished coding the kernel system call path, you are ready to run
your system. You do not have to implement any additional system calls
at this point. The init process will print a banner to the screen.
Do not continue until the init process runs correctly. The test code
in the init process is located in __main() in the file
user/simple/libos.c.
In this exercise you will implement a simple IPC mechanism which will transfer a single 4-byte word of data from one environment to another. A key feature (which your implementation be required to fulfill) is that every IPC send corresponds to exactly one IPC receive: no IPCs are ever lost or received more than once.
This mechanism is implemented with three fields in the
Env structure: env_ipc_value,
env_ipc_from, and env_ipc_blocked; and two
system calls: sys_ipc_send and
sys_ipc_unblock. The comments in the source code
explain these fields and functions.
sys_ipc_unblock() and
sys_ipc_send().
ipc_send()
and ipc_read() (in user/simple/libos.c).
i386_init() to create two
instances of the init process. Then edit the code for the init
process, and change it to run the exercise3() test. The
two init processes should continually ping pong an increasing counter
back and forth over IPC.
Before continuing, revert i386_init() so that it creates
the init process only once.
page_fault_handler() routine.
void
page_fault_handler (struct Trapframe *tf)
{
u_int va = rcr2 ();
#if 0
u_int env_id = curenv ? curenv->env_id : -1;
printf ("%%%% [0x%x] Page fault (code 0x%x) for VA 0x%x (env 0x%x)\n"
" eip = 0x%x, cs = 0x%x, eflags = 0x%x\n",
tf->tf_trapno, 0xffff & tf->tf_err, va, env_id,
tf->tf_eip, 0xffff & tf->tf_cs, tf->tf_eflags);
/* Only traps from user mode push %ss:%esp */
if (tf->tf_err & FEC_U)
printf (" esp = 0x%x, ss = 0x%x\n", tf->tf_esp, 0xffff & tf->tf_ss);
#endif
}
As you know from lab 3, in response to an exception, the processor pushes parameters on to the kernel stack.
In particular, a page fault causes an error code to be pushed. The
FEC_U bit can be tested. If it's set, the fault was
generated while the processor was in user mode (i.e., the bottom 2
bits of the %cs are cleared). Otherwise, it was the
kernel that faulted.
Only faults that transfer control from the user to the kernel cause
the %esp and %ss to be pushed at the time of
the exception. When the kernel faults, the trapframe fields
tf_esp and tf_ss will contain random garbage.
Additionally, the processor puts the faulting address in the
%cr2 register.
In this exercise, you will write the code that handles kernel page faults. The description of what you must implement follows.
The kernel global variable page_fault_mode controls the
response of page_fault_handler() to a kernel page fault.
It takes on two values: PFM_NONE and
PFM_KILL.
PFM_NONE --
Normally, the kernel is not expected to generate a page fault, and if
it does, the kernel must have a bug. This state is represented by
PFM_NONE. If the page_fault_handler fields
a kernel page fault and page_fault_handler is set to this
value, it should panic. Essentially, this is your "blue screen of
death".
PFM_KILL --
In specific places in the kernel, such as within the
sys_cputs function, the page_fault_mode is
set to PFM_KILL while the user pointer is accessed. If a
fault occurs the page_fault_handler() checks the
page_fault_mode, and if it is set to
PFM_KILL it "blames" the user process for the fault,
kills it and continues. Technically the kernel faulted, but it
faulted on behalf the user process, so the user process is killed.
You should now implement this page fault handling strategy.
sys_cputs() continued
There is another type of malicious user behavior that the kernel must
prevent. The user process is not permitted to read any memory
location greater than or equal to VA ULIM. So,
logically, if the user passes an address past ULIM to
sys_cputs(), the kernel should not print the memory at
that address.
You should now modify sys_cputs() again to protect
against this malicious user behavior. Hint: use the trup
macro in mmu.h. That macro is part of the kernel's
strategy to protect itself against pointers passed to it by the user
process.
In this exercise, you will further extend the user process so that it can handles its own page faults in an analogous way to which the kernel handles its page faults. User page fault handling requires a careful contract between the kernel and the user process which is detailed below.
The user process uses the system call
sys_set_pgfault_handler() to specify the address of its
page fault handler routine and the address of its exception stack.
void
sys_set_pgfault_handler (u_int func, u_int xstacktop)
{
curenv->env_pgfault_handler = func;
curenv->env_xstacktop = xstacktop;
}
Aside: you might notice the strong analogy to the hardware exception
mechanism. The kernel tells the hardware the handler routine by
setting up the IDT and sets up the TSS (ts.ts_esp0 =
KSTACKTOP) to point to the exception stack.
locore.S, which
calls trap() and eventually
page_fault_handler().
page_fault_handler() should test the
FEC_U bit of the error code.
page_fault_handler() should
propogate the fault to the user process, by pushing paramters on the
user's exception stack:
<---- curenv->env_xstacktop
spare 1
spare 2
spare 3
spare 4
spare 5
eip
eflags
esp
error code
va <----- user %esp
However, if the user's %esp (at the time of the page
fault) falls within the user's exception stack (with is 4096 bytes in
size), then the user's %esp is taken to be the user's
exception stack pointer. In this case, the exception paramters are
pushed on the exception stack at the point of the current value of
the user's %esp. This permits recursive user page
faults.
Recursive user fault.
(user %esp must be within the user's exception stack!)
<---- user %esp at time of page fault
spare 1
spare 2
spare 3
spare 4
spare 5
eip
eflags
esp
error code
va <----- user %esp
iret's to the user code location
curenv->env_pgfault_handler. (You must figure out how to
arrange for this to happen.) From there the user process can handle
the fault as it sees fit.
Some code exists to do what has been described above. You should
finish it. Among other things, this work will include implementing
the sys_mem_alloc system call. Then you should run the
test cases provided.
The test cases do not generate recursive user page faults (i.e., a page fault from within the user's page fault handler). You might wish to test this functionality yourself. It's worth spending extra effort now to get rid of any lurking bugs. Exercise 6 is difficult enough by itself.
Questions:fork()
In this exercise, you will implement fork(). This
function will build on top of the mechanisms you have implemented in
the previous exercises.
fork() clones the parent process at the time it is
called, creating a child process whose memory contents are identical
to the parent. After the fork, any subsequent memory writes by the
parent modify its memory only--not the child's memory. And similarly,
subsequent writes by the child are not visible to the parent.
These two properties are the essence of fork(), so let's
point them out again:
A simple implementation of fork() is to copy all of
parent's memory into the child. It's easy to see how it satisfies
both properties 1 and 2. However, this implementation is inefficient.
The text segment, for example, is identical for the parent and child,
so it seems wasteful to make a second copy of it for the child.
Your implementation of fork() addresses these objections by employing
virtual memory tricks. Instead, of copying all the parent's memory
into the child, the parent just maps it all into the child. This is
far more efficient than a copying implementation of fork(), because it takes
far fewer CPU cycles to map memory (you just need to update the page table
entries) than to copy it. It's trivial to see that property 1 is
satisfied. However property 2 is not. When the child writes to
memory, say to a global variable, the parent will see that write when
it reads that global variable.
Property 2 will be satisfied by using well-known technique called
copy-on-write. This copy-on-write technique is actually just a slight
tweak to the mapping implementation of fork outlined in the paragraph
above.
The parent will still map all of its memory into the child's address
space. However, any pages that were previously writeable
(PG_W) in the parent's address space, are marked
PG_COW (and not PG_W), for both the parent
and the child. (note: Property 1 is still satisfied at this point.)
Then both the parent and child can be executed.
If either process writes to memory, it will fault (since
bits are off). Then, the user level fault handler,
handles the fault, and checks if the fault was to a PG_WPG_COW page. If
so, the handler allocates a new page, copies the content of read-only page to
it, and maps the new page with PG_W permission into the faulting
memory location. The user fault handler then resumes execution at the
instruction which caused the fault. When, the faulting instruction is
re-executed it will not fault; it will write the memory successfully.
The key is that as on-demand writes to PG_COW pages happen, the user
process clones the page. Since, new memory is allocated for the
cloned pages, writes are only visible to the process doing the write.
Hence, property 2 is satisfied.
To summarize, the strategy for implementing fork:
Here's the control flow for fork():
sys_env_alloc() to allocate a child
environment. (The child's initial status is ENV_NOTRUNNABLE.)
PG_COW to the PTE's
and removes PG_W).
The exception stack is excepted from this. It should be left writeable in the parent's address space. The exception stack is what implements the copy-on-write handling, so it shouldn't be subject to it.
Note: the exception stack is a violation of property 1. The parent and the child have their own. This is very important. Serious problems result from the parent and child sharing the same physical memory for their exception stacks.
ENV_OK.
Here's the control flow for the user's exception handler:
asm_pgfault_handler
asm_pgfault_handler calls pgfault_handler -- the user's C routine
pgfault_handler checks and realizes the fault is a
write (check the FEC_WR bit of the error code) to a page
marked PG_COW, so it calls handle_cow_fault.
handle_cow_fault handles the copy-on-write by:
NBPG from va&~PGMASK to the new page
PG_W bit set at va&~PGMASK
pgfault_handler
asm_pgfault_handler
fork() needs:
sys_mem_remap, sys_mod_perms, and
sys_env_alloc. You'll need to add (some of) these system
calls to kern/inc/syscall.h. Also, you'll notice that
sys_mem_remap takes one more argument than is currently
supported by the syscall mechanism. Change the syscall mechanism to
pass this last argument in a register of your choosing.
fork(),
asm_pgfault_handler(), pgfault_handler(),
and handle_cow_fault() as described above (all are in
simple/libos.c). Refer to the function
dump_va_space() in that file. It demonstrates some
useful techniques.
handle_cow_fault. Consider the following scenario:
fork() is executed
PG_COW bit set.
pp_refcnt == 1). In this case it is
unnecessary to clone the page, the child should just use
sys_mod_perm to change the perms from PG_COW to PG_W.