6.097: OPERATING SYSTEM ENGINEERING

Fall 2002

Lab 3

Hand out date: Wednesday October 2nd

Due date: Thursday October 17th

0. Bugs

exercise 1 -- "env_setup_vm() pp_refcnt hint"
The hint for env_setup_vm() contained a typo: the work "not" before "maintained" is missing. The hint should read:
```
  //    - Note: pp_refcnt is not maintained for physical pages mapped above UTOP.
```
The hint means that you don't have increment or decrement any pp_refcnt fields for any struct Ppage's in env_setup_vm().
exercise 1 -- "load_aout() permissions"
The comments for this function do not specify the permissions you should use to map the binary. You should figure out which permissions you'll need. Remember that the binary image is an a.out image which contains both text and data.
exercise 1 -- "bogus CS:EIP in Bochs' debugger output "
The debugger output has been fixed in the body of exercise 1. See below. The changes are in green.
exercise 3 "break point test case"
The following code was accidentally omitted from lab3.S. You should add it to that file yourself. Any where in the file is fine.
```
.global _brkpt_start
_brkpt_start:
.space 0x20
int $3
.global _brkpt_end
_brkpt_end:
```
exercise 4 -- "Alice and Bob"
The bob and alice test cases for exercise 4 have some weaknesses. If you simply forget to save the registers when context switching, no error will be generated. You'll end up restarting each process with its initial register state each time it is run.

Perhaps, you should add the following printf() in env_run. Then you can ensure your processes are making forward progress.
```
  printf ("Running envid %d eip 0x%x\n", e->env_id, e->env_tf.tf_eip);
```

1. Introduction

In this lab, you will also write the code for environment creation and scheduling. In order to pre-empt running environments, you'll be required to support clock interrupts.

In this lab, the terms environment and process are interchangeable--they have roughly the same meaning. We introduce the term environment to stress the point that environments do not provide the same semantics as Unix processes.

2. Environment Overview

This section will give you an overview of the environment code. You should consider this a specification which you'll implement for exercise 1.

Env management

In lab 2, you allocated memory for the __envs[] array. This NENV sized array holds the state of all the possible environments. This means your OS cannot run more than NENV concurrent environments. If an attempt is made to create more than NENV environments, the system will return an E_NO_FREE_ENV error.

Typically, your OS will be running many fewer environments. The remaining environments should be inserted on to the env_free_list. This supports efficient allocation and deallocation of environments, as they just have to be added to/removed from the free list.

The kernel also maintains curenv, which points to the currently executing environment. During boot up, before the first environment is run, this pointer is set to NULL.

From env.c:

struct Env *__envs = NULL;		/* All environments */
static struct Env_list env_free_list;	/* Free list */
struct Env *curenv = NULL;	        /* the current env */

Env state

The figure below shows the state kept by the kernel for each environment.

env_tf -- contains the register values for the environment (see trap.h). It is important for the kernel to save these, so the environment can be resumed.
env_link -- pointer which allows the env to be placed on the env_free_list. (see queue.h for details)
env_id -- unique identifier
env_parent_id -- env_id of the environment which create this environment.
env_status -- one of ENV_FREE, ENV_OK, ENV_NOTRUNNABLE
env_pgdir -- virtual address of this env's page directory
env_cr3 -- physical address of this env's page directory
remaining fields used in future labs

From env.h:

struct Env {
  struct Trapframe env_tf;
  LIST_ENTRY(Env) env_link;
  u_int env_id;
  u_int env_parent_id;
  u_int env_status;
  Pde  *env_pgdir;
  u_int env_cr3;

  /* (below here: not used in lab 3) */
  u_int env_ipc_value;            /* IPC state */
  u_int env_ipc_from;
  u_int env_ipc_blocked;

  u_int env_pgfault_handler;      /* user page fault handler */
};

As can be seen, an environment (like a processs) couples the concept of a thread and an address space. A thread, because of each env is defined, in part, by its registers (the env_tf field). And an address space, because each env has its own VA to PA mapping, defined by env_pgdir/env_cr3.

To restart a non-executing env requires that the kernel restore both the register set of the env and its VA to PA mapping.

struct Env is analogous to struct user in v6 Unix. The key difference between the two is that is the former contains the environment's (ie. the process') register state.

3. Interrupt/Exception Background

The follow sections highlight a few key concepts and mechanisms. At first read, the connection between them might not be clear. However, the coding exercises should show you how all the pieces fit together.

Terminology

This lab adopts the terminology defined in IA-32 Intel Architecture Software Developer's Manual, Volume 3: System programming guide. However, be aware that terms such as exceptions, traps, interrupts, faults and aborts have no standardized meaning. When you see these terms outside of this lab, the meanings might be slightly different.

You should read chapter 5 of the above reference (sections 5.7, 5.8.2, 5.8.3, 5.9 and 5.12.2 are not particularly relevant).

Interrupt discipline

In your operating system, you will make a key simplification compared to v6 Unix. External (ie. device) interrupts are always disabled when in the kernel and always enabled when in user space. External interrupts are controlled by the FL_IF (see mmu.h) bit of the %eflags register. When this bit is set external interrupts are masked. While the bit can be modified in several ways, because of our simplification, we will handle it solely through the %eflags register.

You will have to ensure that the FL_IF flag is set in user processes when they run so that when an interrupt arrives, it gets passed through to the processor and handled by your interrupt code. Otherwise, it will be masked: this is the case when the processor is reset.

Interrupt Mapping

The processor defined exceptions generate IDT vectors 0-31. For example, a page fault is vector 14. When a page fault occurs, the processor transfers control to the CS:EIP defined in IDT[14]. These values define the address of the kernel's page fault handler routine.

External interrupts (i.e., device interrupts) are refered as IRQs--16 are possible IRQ 0-15. The mapping from IRQs to IDT vectors is not fixed--it must be programmed. The function pic_init () in picirq.c maps the device IRQs 0-15 to the IDT vectors IRQ_OFFSET ... IRQ_OFFSET+15.

In picirq.h IRQ_OFFSET is defined to be decimal 32. Thus, the IDT entries 32-44 correspond to the IRQs 0-15. For example, the clock interrupts at IRQ 0. Thus, IDT[32] contains the address of the clock's interrupt handler routine.

These IRQ_OFFSET is chosen so that the device interrupts don't overlap with the processor exceptions.

Stack switching -- the TSS

When kernel executes an instruction which generates an exception (e.g., the kernel divides by zero, deferences a NULL pointer, etc), the processor pushes exception parameters on the current stack. If there was ever not enough space on the stack, the CPU would reset itself.

When a user process is executing and an exception/interrupt occurs, the processor does not push the parameters on the current stack, but rather it switches to the stack defined by the SS0 and ESP0 fields of the TSS (see idt_init() in trap.c). The OS should always guarantee that these are valid addresses, so that the exception can be handled.

An Example

Let's put these pieces together and trace through an example. Let's say a user process is running in a loop, when a clock interrupt occurs.

The processor will switch to the stack defined by the TSS SS0 (GD_KD) and ESP0 (KSTACKTOP).

The processor will then push the exception parameters on the kernel stack, starting at addresss KSTACKTOP:

                     +--------------------+             
                     |                    | KSTACKTOP
                     | 0x00000   old SS   |     " - 4
                     |      old ESP       |     " - 8
                     |     old EFLAGS     |     " - 12
                     | 0x00000 | old CS   |     " - 16
                     |      old EIP       |     " - 20 <---- %esp 
                     +--------------------+

The processor will read entry number IRQ_OFFSET (because IRQ 0 -- is the clock interrupt) of the IDT. And will set the %cs:%eip to the handler function defined there.
The handler function takes control and can handle the exception/interrupt.

4. Hand-in Procedure

The hand-in procedure has changed.

Put your answers to the exercises in ossrc/lab3_answers.html (do not link to any other files)

When you are finished coding and answering questions, make a hand-in tarball.

athena% cd ~/6.097/ossrc
athena% gmake handin
.
.
.
athena% ls -l handin.tgz

Make this file web accessible then send email containing a URL in the Subject line. Your email will be processed automatically, so be certain to follow this format.
A fool-proof way to accomplish this is use the following command:
```
athena% mhmail 6.097-handin@pdos.lcs.mit.edu -subject http://web.mit.edu/PATH/TO/handin.tgz -body empty
```

5. Getting Started

You should now download the code for the lab. This tarball does not contain the file pmap.c. You must copy pmap.c from your lab 2 solutions into the kern directory.

Becareful: not to overwrite your lab 2 solutions.

athena% add gnu 6.097 sipb
athena% cd ~/6.097
athena% mv ossrc lab2-solutions
athena% wget http://pdos.lcs.mit.edu/6.097/labs/lab3.tar.gz
athena% gtar -zvxf lab3.tar.gz
drwxr-xr-x cates/wheel       0 Sep 15 19:34 2002 ossrc/
drwxr-xr-x cates/wheel       0 Sep 15 19:34 2002 ossrc/kern/
drwxr-xr-x cates/wheel       0 Sep 15 19:34 2002 ossrc/kern/inc/
-rw-r--r-- cates/wheel    2528 Sep 14 17:00 2002 ossrc/kern/inc/asm.h
.
.
.
athena% cp lab2-solutions/kern/pmap.c ossrc/kern/

6. Exercises

Exercise 1: Creating and Running Environments

In this exercise you will write the code necessary to run a user process (i.e., a struct Env). Because we do not yet have a filesystem, we have set up the kernel to load a static set of instructions as the first process much like v6 did with icode. However, there are some missing pieces that will need to be filled before this process can run.

In the file env.c, finish coding the following functions:

env_init()
load_aout()
env_setup_vm()
env_alloc() -- you just need to set the tf_eip field
env_create()
env_run()

Once you are done you should compile your kernel and run it under Bochs. The screen should display:

VGA BIOS - Version 2.40
Copyright (C) ...
.
.
.
Physical memory: ...
.
.
  Setup timer interrupts via 8259A
  unmasked timer interrupt

If you see this type control-c into the Bochs debugger, then single step a couple of time. What you should see is listed below. Note: the t=??????? values will be different for you.

Next at t=0
(0) f000:fff0: e968e0: jmp +#e068
 c
Next at t=9355127
(0) 001b:00800020 (unknown context): ebfe: jmp +#fe
 s
Next at t=9355128
(0) 001b:00800020 (unknown context): ebfe: jmp +#fe
 
Next at t=9355129
(0) 001b:00800020 (unknown context): ebfe: jmp +#fe
 
Next at t=9355130
(0) 001b:00800020 (unknown context): ebfe: jmp +#fe

If you don't see this output, then you've made a coding mistake. Fix it before continuing.

Answer the follow questions:

Below is a call graph of the code up to the point where the user code is invoked. For each step, write 1-3 sentences that capture the purpose or the key mechanism of that step.
- start (locore.S)
- i386_init
  - cninit
  - i386_detect_memory
  - i386_vm_init
  - ppage_init
  - idt_init
  - pic_init
  - clock_init
  - env_init
  - env_create
  - yield
    - env_run
      - env_pop_tf

Exercise 2: Clock Interrupts

In exercise 1, after the environment was started (with env_run()), it just spun in a loop. The kernel never got control back. We need to generate and handle clock interrupts. Interrupts will force control back to the kernel. In a later lab, we will also add system calls, with which a user program can ask the processor to transfer control to the kernel.

The calls to pic_init() and clock_init() (from i386_init() in init.c), sets up the clock and the interrupt controller to generate interrupts. However, the user env is not interrupted because the CPU is ignoring device interrupts. You need to correct this. Modify the code in env_alloc() so that user environments are run with interrupted enabled (ie. not ignored).

You might want to re-read sections 5.8 and 5.8.1 of IA-32 Intel Architecture Software Developer's Manual, Volume 3: System programming guide at this time.

Next, you need to setup the IDT and interrupt handler. The handler has been written for you (see clock_interrupt in locore.S. You should edit idt_init() in trap.c at this time. Use the setgate() function to set the IDT entry corresponding to the clock interrupt.

Once you are done coding, compiler and run your kernel. If you see asterisks print out one after another you can continue.

Answer the follow questions:

How many instruction of user code are executed between each interrupt?
How many instruction of kernel code are executed to handle the interrupt?
You'll need to use the Bochs debugger command vbreak to answer these. This command specifies a %cs value and an %eip value. Bochs will break when the processor is about to execute the instruction at the specified segment address: %cs:%eip. For example, vbreak 0x1b:0x800020 breaks at first instruction of the user process. When the kernel is running the %cs is set to 0x8 (well, actually this is true only after i386_vm_init() executes).

When using objdump to dump the kernel pass --adjust-vma=0xf00ff000. It's not clear why this is required, --adjust-vma=0xf0100020 would seem more logical, but nevertheless it is.

Exercise 3: Generalized interrupt/exception handling

In this exercise, you will generalize the IDT to handle all the interrupts and exceptions that we expect to see. Although 256 interrupts/exceptions are possible. Your code only needs to be setup up to handle vectors 0-31, the processor generated exception, and vectors 32-47 (device interrupts IRQs 0-15, note: IRQ_OFFSET=32). (We may add additional ones later.)

Note: Some of the vectors in the range 0-31 are defined by Intel to be reserved. Since they will never be generated, it doesn't really matter how you handle them. Do whatever you think is cleanest.

The overall flow of control that you should achieve is depicted below:

      IDT                   locore.S               trap.c
   
+----------------+                        
|   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
|                |             // do stuff      {
|                |             call _trap          // handle the exception/interrupt
|                |             // undo stuff    }
+----------------+
|   &handler2    |--------> handler2:
|                |            // do stuff
|                |            call _trap
|                |            // undo stuff
+----------------+
       .
       .
       .
+----------------+
|   &handlerX    |--------> handlerX:
|                |             // do stuff
|                |             call _trap
|                |             // undo stuff
+----------------+

Each exception or interrupt has its own handler in locore.S and the IDT is setup with the address of these handler. Each of the handlers should build a struct Trapframe (see trap.h) on the stack and call into trap() (in trap.c) with a pointer to the Trapframe.

Even the clock interrupt should fall under this scheme. This means you will no longer be using the clock_interrupt routine.

After control is passed to trap(), that function handle the exception/interrupt or dispatch the exception/interrupt to a specific handler function.

Now, open locore.S and trap.c and implement what has been described above. The macros IDTFNC and IDTFNC_NOEC in locore.S should help you, as well as the T_* defines in trap.h.

You can use the code from _clock_interrupt as a starting point. But remember: that the code must be modified to build a struct Trapframe. Hint: you code should perform the following steps:
1. push values on the stack in the order defined by struct Trapframe
2. load GD_KD into %ds and %es
3. pushl %esp -- pass pointer to Trapframe which is built on the stack
4. call _trap
5. pop the values pushed in steps 1-3
6. iret
Consider using the pushal and popal instructions; they fit nicely with the layout of the struct Trapframe.
In the trap() function, dispatch clock interrupts to the clock() function. Run your kernel. You should see the asterisks, as you did in the previous exercise. Do not continue until you see this.
The trap handling is going to let you clean up after errant user envs. lab3.S contains a number of test cases which simulate what an errant user env might execute.

To configure your code for a test case edit the following line from i386_init():
```
    env_create (&spin_start, &spin_end - &spin_start);
```
Replace spin_start and spin_end with the analogous names for the test case. You'll also need to declare the symbols for the test case, of course.

Then, recompile and boot your kernel. In trap(), you should call env_destroy() to cleanup after the errant environment. You should expect the following assert in yield to fire:
```
  assert (__envs[0].env_status == ENV_OK);
```
Make sure your kernel passes all of the test cases.

Questions:

How do you know when you pass a test case? What output did you see and why?
The break point test case will either generate a break point exception of a general protect fault depending on how you initialized the break point entry in the idt (i.e., your call to setgate() from idt_init()). Explain how to generate each of these two cases.
What do you think is the point of these mechanisms? (hint: consider what would happen if the user code executed "int $0x20")
What is the purpose of having an individual handler function for each exception/interrupt? (i.e., if all exceptions/interrupts were delivered to the same handler, what functionality, that exists in your current implementation, could not be provided?)

Exercise 4: Pre-emptive Mulitasking

In this exercise you'll create a second env and use the clock interrupt to switch between the two envs.

Edit i386_init(). Replace the line that says:
```
    env_create (&spin_start, &spin_end - &spin_start);
```
with
```
    env_create (&alice_start, &alice_end - &alice_start);
    env_create (&bob_start, &bob_end - &bob_start);
```
(Again, you will need to declare the appropriate symbols.) Now, __envs[0] will correspond to the "alice" code and __envs[1] to the "bob" code.
Edit trap() so that when a clock interrrupt occurs, yield() is called.
Open sched.c and rewrite the function yield(). You should rewrite it to be a round-robin scheduler, which loops over the envs in __envs[] calling env_run() on an env if its status is ENV_OK. Be sure that you don't create starvation problems.
The last step to making this work requires a small modification to env_run(). If you are switching from one env to another, you need to save the register state of the current environment before running the new env. Add the following to the begining of env_run().
```
if (curenv)
  /* FILL IN ONE LINE OF CODE HERE */
```
Hint: the one line of code uses curenv->env_tf and KSTACKTOP.
As an aid to you, "alice" and "bob" check that their registers are saved/restored correctly.

Ensure that your kernel continually switches between the two envs. In env_run(), you might want to print out the env's envid when it is run.

Questions:

Examine the code to "alice" and "bob" in lab3.S. What is the purpose of "alice" and "bob" code?
If two copies of "alice" were run (instead of running "alice" and "bob"), what mistake in the register saving/restoring code would go undetected?
In your implementation of env_run() you should have called lcr3(). But before and after the call to lcr3(), your code makes references (at least it should) to the variable e--the argument to env_run. How can this work?
Upon loading the %cr3 register, the addressing context used by the MMU is instantly changed. The problem is that a virtual address (namely e) has meaning relative to a given address context--the address context specifies the physical address to which the virtual address map. Why can the pointer e be referenced both before and after the addressing switch?

Do not be terse in your answer. Make it very clear you understand what is happening.

This completes the lab.

Version: $Revision: 1.11 $. Last modified: $Date: 2002/10/02 18:03:16 $