1. Introduction

In this lab, you will write the code to exec() an executable stored in the on-disk file system. In typical exokernel fashion, exec() will be implemented in the user space library operating system. The file system stored on-disk is trivial in the extreme -- notably, it's read-only.

2. Hand-in Procedure

Put your answers to the exercises in ossrc/lab5_answers.html (do not link to any external files).

When you are finished coding and answering questions, make a hand-in tarball.

athena% cd ~/6.097/ossrc
athena% gmake handin
.
.
.
athena% ls -l handin5.tgz

Make this file web-accessible, then send an email containing its URL in the Subject line. Your email will be processed automatically, so be certain to follow this format.
A foolproof way to accomplish this is to use the following command:
```
athena% mhmail 6.097-handin@pdos.lcs.mit.edu -subject http://web.mit.edu/PATH/TO/handin5.tgz -body empty
```

3. Getting Started

You should now download the code for the lab. Many files are absent from this tarball and must be copied from your lab 4 solutions.

Be careful not to overwrite your lab 4 solutions.

athena% add gnu 6.097 sipb
athena% cd ~/6.097
athena% mv ossrc lab4-solutions
athena% wget http://pdos.lcs.mit.edu/6.097/labs/lab5.tar.gz
athena% gtar -zvxf lab5.tar.gz
drwxr-xr-x cates/wheel       0 Sep 15 19:34 2002 ossrc/
drwxr-xr-x cates/wheel       0 Sep 15 19:34 2002 ossrc/kern/
drwxr-xr-x cates/wheel       0 Sep 15 19:34 2002 ossrc/kern/inc/
-rw-r--r-- cates/wheel    2528 Sep 14 17:00 2002 ossrc/kern/inc/asm.h
.
.
.
athena% cp lab4-solutions/kern/locore.S ossrc/kern
athena% cp lab4-solutions/kern/trap.c ossrc/kern
athena% cp lab4-solutions/kern/pmap.c ossrc/kern
athena% cp lab4-solutions/kern/env.c ossrc/kern
athena% cp lab4-solutions/kern/sched.c ossrc/kern
athena% cp lab4-solutions/kern/init.c ossrc/kern
athena% cp lab4-solutions/kern/syscall.c ossrc/kern
athena% cp lab4-solutions/kern/inc/syscall.h ossrc/kern/inc
athena% cp lab4-solutions/user/simple/libos.c ossrc/user/simple

4. Pre-exercise

Two changes have been made in the tarball which require your attention. First, in ossrc/GNUmakefile.global the optimization level has been reduced from -O6 to -O2 (-O6 was generating buggy code in some circumstances). This impacts you because if you have latent bugs in your code, they might be revealed by changing the optimization level.

Second, the __start code from user/simple/libos.c has been moved into an assembly file, entry.S. You should comment out this code as it appears in your libos.c. (For the sake of cleanliness, you might wish to move your asm_pgfault_handler into entry.S as well.)

You should run the ping pong and sieve test cases from lab 4. Presumably if your lab 4 code doesn't contain any bugs, the test cases will run fine. Don't proceed until they work.

5. File system preliminaries

The user/GNUmakefile has been modified so that it creates a file system disk image. The image consists of a table of contents followed by the a.out executable for each program in ossrc/user, one after another. The table of contents is one block in size (i.e., NBPG), and each executable is padded out to a whole number of blocks.

+--------------+
|    TOC       |  NBPG
+--------------+
| executable 1 | K1 * NBPG
+--------------+
| executable 2 | K2 * NBPG
+--------------+
      .
      .
+--------------+
| executable n | Kn * NBPG
+--------------+

The tools/mkimg tool builds the disk image according to this format. You should refer to it for the details (especially, for the format of the table of contents).
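
As a rough illustration of the layout (not a substitute for reading tools/mkimg), the starting block of each executable can be computed from the file sizes. The sketch below assumes NBPG is 4096; blocks_for() and start_block() are hypothetical helper names that do not appear in the lab code.

```c
#include <stddef.h>

#define NBPG 4096   /* ASSUMED block (page) size */

/* Number of NBPG-sized blocks needed to hold 'nbytes', rounding up. */
unsigned int
blocks_for (unsigned int nbytes)
{
  return (nbytes + NBPG - 1) / NBPG;
}

/* Block number at which executable 'index' starts.  The table of
   contents occupies block 0, so the first executable starts at block 1. */
unsigned int
start_block (const unsigned int *sizes, size_t index)
{
  unsigned int block = 1;
  size_t i;

  for (i = 0; i < index; i++)
    block += blocks_for (sizes[i]);
  return block;
}
```

For instance, with executables of 100, 4096, and 4097 bytes, the three programs would start at blocks 1, 2, and 3, and the image would end at block 5.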

Now, look at ossrc/.bochsrc and you will find the line:

diskd: file="./user/bochs.img", cyl=200, heads=16, spt=63

This instructs Bochs to treat this image as the second disk drive.

6. Exercises

Exercise 1: Disk access

In this exercise, you'll add a system call to read blocks from the disk. The disk interface is a lot simpler than v6 UNIX's interface, primarily because it only supports synchronous reads. It uses programmed I/O (i.e., inb/outb) to transfer the data from the disk.

Your task is to add a system call to your kernel which allows user processes to read blocks from the disk. While the disk supports reading sectors, the interface should be block based so it fits well with the virtual memory system. The exact interface of your system call is defined below. You should base its implementation around the read_block() routine shown below. This helper function should look familiar because it was used by the bootloader from lab 1.

// Allocates a page of memory and maps it at 'va' with read-only
// permissions.  Then reads block number 'blockno' from disk number 'diskno'
// into that page.
// RETURNS
// 0 -- on success
// <0 -- otherwise
int
sys_disk_read (u_int diskno, u_int blockno, u_int va)
{
   // your code goes here
}
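
Before touching the disk, a system call like this would normally validate its arguments. The following is only a sketch of such checks. The helper name disk_read_args_ok() and the user_limit parameter (standing in for whatever upper bound your kernel places on user addresses) are assumptions, as is the requirement that va be page-aligned.

```c
#define NBPG 4096   /* ASSUMED block (page) size */

/* Returns 1 if the arguments look safe to act on, 0 otherwise.
   HYPOTHETICAL helper: 'user_limit' is the first address that user
   environments may NOT map; substitute your kernel's own bound. */
int
disk_read_args_ok (unsigned int diskno, unsigned int va,
                   unsigned int user_limit)
{
  if (diskno > 1)
    return 0;               /* read_block() only knows disks 0 and 1 */
  if (va % NBPG != 0)
    return 0;               /* a block is mapped as one whole page */
  if (va >= user_limit)
    return 0;               /* never map into kernel territory */
  return 1;
}
```

Rejecting a bad diskno here matters because read_block() asserts on it, and a user process should not be able to trigger a kernel assertion.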




void
read_block (u_int diskno, u_int blockno, char *destination)
{
#define SECTOR_SIZE 512
  unsigned int sectors_per_block = NBPG/SECTOR_SIZE;
  unsigned int sectorno = sectors_per_block * blockno;
  unsigned char status;

  assert (diskno == 0 || diskno == 1);

  do {
    status = inb(0x1f7);
  } while (status & 0x80);

  outb (0x1f2, sectors_per_block);       // sector count 
  outb (0x1f3, (sectorno >> 0) & 0xff);
  outb (0x1f4, (sectorno >> 8) & 0xff);
  outb (0x1f5, (sectorno >> 16) & 0xff);
  outb (0x1f6, 0xe0 | (0x1 & diskno) << 4 | ((sectorno >> 24) & 0x0f));
  outb (0x1f7, 0x20);         // CMD 0x20 means read sector
  do {
    status = inb (0x1f7);
  } while (status & 0x80);

  insl (0x1f0, destination, NLPG);
}
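
To see what read_block() sends to the controller, the arithmetic can be pulled out into a pure function. This sketch assumes NBPG is 4096; struct lba_regs and lba_regs_for() are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define NBPG        4096   /* ASSUMED block (page) size */
#define SECTOR_SIZE 512

/* The five register values read_block() writes before the command. */
struct lba_regs {
  uint8_t count;        /* port 0x1f2: sectors to read */
  uint8_t lba_low;      /* port 0x1f3: sector bits 0-7 */
  uint8_t lba_mid;      /* port 0x1f4: sector bits 8-15 */
  uint8_t lba_high;     /* port 0x1f5: sector bits 16-23 */
  uint8_t drive_head;   /* port 0x1f6: 0xe0, drive bit, sector bits 24-27 */
};

struct lba_regs
lba_regs_for (unsigned int diskno, unsigned int blockno)
{
  unsigned int sectors_per_block = NBPG / SECTOR_SIZE;
  unsigned int sectorno = sectors_per_block * blockno;
  struct lba_regs r;

  r.count = (uint8_t) sectors_per_block;
  r.lba_low = (uint8_t) ((sectorno >> 0) & 0xff);
  r.lba_mid = (uint8_t) ((sectorno >> 8) & 0xff);
  r.lba_high = (uint8_t) ((sectorno >> 16) & 0xff);
  r.drive_head = (uint8_t) (0xe0 | ((0x1 & diskno) << 4)
                            | ((sectorno >> 24) & 0x0f));
  return r;
}
```

For example, block 1 on disk 1 starts at sector 8, so the function yields a sector count of 8, a low LBA byte of 8, and a drive/head byte of 0xf0.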

Exercise 2: File system access

You should now implement fs_lookup() in libos.c as described below:

// Looks up 'name' in the disk image created by mkimg.  
//
//RETURNS:
//   block offset of 'name' in the disk
//   <0 -- on error (i.e., 'name' does not exist) 
int
fs_lookup (char *name)
{
  // your code goes here
}

You might also wish to add these functions to libos.c.

unsigned int
ntohl (unsigned int x)
{
  unsigned char *s = (unsigned char *)&x;
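  /* widen each byte before shifting so s[0] << 24 cannot overflow a signed int */
  return ((unsigned int)s[0] << 24 | (unsigned int)s[1] << 16 | (unsigned int)s[2] << 8 | (unsigned int)s[3]);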
}

int
strcmp (const char *s1, const char *s2)
{
  /* this code is from FreeBSD's libkern */
  while (*s1 == *s2++)
    if (*s1++ == 0)
      return (0);
  return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}

size_t
strlen(const char *str)
{
  /* this code is from FreeBSD's libkern */
  const char *s;
  for (s = str; *s; ++s);
  return (s - str);
}

char *
strcpy(char *to, const char *from)
{
  /* this code is from FreeBSD's libkern */
  char *save = to;
  for (; (*to = *from) != 0; ++from, ++to);
  return(save);
}
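
With helpers like these, the search at the heart of fs_lookup() might be sketched as follows. The real table-of-contents format is defined by tools/mkimg and is NOT reproduced here: the layout below (32-byte entries, each a 28-byte NUL-terminated name followed by a 4-byte big-endian block number, with an empty name ending the table) is purely hypothetical, as is the name toc_lookup(). The sketch searches a table that is already in memory; in the lab you would first read the table of contents from the disk with your new system call.

```c
#include <stdint.h>
#include <string.h>

#define NBPG          4096  /* ASSUMED block (page) size */
#define TOC_NAME_LEN  28    /* HYPOTHETICAL: bytes reserved for each name */
#define TOC_ENTRY_LEN 32    /* HYPOTHETICAL: name + 4-byte block number */

/* Decode a 32-bit big-endian integer, as ntohl() would on x86. */
static uint32_t
be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Search the in-memory table 'toc' (NBPG bytes) for 'name'.
   Returns the starting block of the file, or -1 if it is not listed. */
int
toc_lookup (const unsigned char *toc, const char *name)
{
  size_t i;

  if (strlen (name) >= TOC_NAME_LEN)
    return -1;                /* too long to fit in any entry */

  for (i = 0; i < NBPG / TOC_ENTRY_LEN; i++) {
    const unsigned char *entry = toc + i * TOC_ENTRY_LEN;

    if (entry[0] == 0)
      break;                  /* empty name: end of table (assumed) */
    if (strncmp ((const char *) entry, name, TOC_NAME_LEN) == 0)
      return (int) be32 (entry + TOC_NAME_LEN);
  }
  return -1;
}
```

Keeping the search separate from the disk read makes it easy to test against a hand-built buffer before the system call works.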



Exercise 3: Basic Exec

In this exercise you will implement exec().

//Creates process 'name'
//   
//RETURNS
//  <0 -- on error
//  Nothing is returned on success, since
//  new process starts executing from the beginning,
//  and old process no longer exists.
//
int
exec (char *name)
{
  // your code goes here
}


exec() replaces the current environment with a new one.
It proceeds as follows:

 Create a new environment.
 Allocate a stack at USTACKTOP - NBPG.
 Load 'name' at UTEXT.
 Start it running from the beginning (i.e., UTEXT+0x20).
 The parent exits.



You might need to add another parameter to
sys_env_alloc().  The behavior of this system call should
vary if it is called by exec() or by fork().
For fork(), the new environment inherits a number of
values from its parent, such as a trap frame, the exception stack, and
the page fault handler.  exec(), on the other hand, does
not want to inherit these values from its parent.  For example, its
execution begins at UTEXT+0x20 (which presumably is its
__start label), not from where the parent called
sys_env_alloc().  Furthermore, the startup code will
allocate an exception stack and register it with the kernel.
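
The difference between the two cases can be sketched as a choice of initial state. Everything named here is hypothetical: struct start_state is a stand-in for whatever your kernel keeps per environment, child_start_state() is an illustrative helper, and the UTEXT and USTACKTOP values are passed in as parameters rather than assumed. Starting the stack pointer at USTACKTOP is also an assumption for the case with no arguments.

```c
/* HYPOTHETICAL summary of the state a new environment starts with. */
struct start_state {
  unsigned int eip;               /* where execution begins */
  unsigned int esp;               /* initial stack pointer */
  unsigned int pgfault_handler;   /* 0 means none registered yet */
  unsigned int xstacktop;         /* 0 means no exception stack yet */
};

/* Choose the child's starting state.  For fork() the child inherits
   the parent's values; for exec() it starts fresh at UTEXT+0x20 and
   leaves the handler and exception stack for its startup code. */
struct start_state
child_start_state (const struct start_state *parent, int for_exec,
                   unsigned int utext, unsigned int ustacktop)
{
  struct start_state child;

  if (!for_exec)
    return *parent;

  child.eip = utext + 0x20;
  child.esp = ustacktop;
  child.pgfault_handler = 0;
  child.xstacktop = 0;
  return child;
}
```

An extra flag argument to sys_env_alloc() would be one way to select between these two behaviors.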



You should test your code as you see fit.  One suggestion, however, is
to make a program that exec()'s itself.  This should
cause exec() to be run over and over indefinitely.  You'll
run out of memory eventually, unless you go back and implement
env_free().  Just to be clear: implementing
env_free() is just a suggestion -- it is not obligatory.


Exercise 4: Exec arguments

In this exercise, you'll extend exec() with the ability
to pass arguments to the new environment.

For example,
exec ("simple", "-f", "foo", "-c", "junk", NULL);  // NOTICE: the trailing NULL!


should invoke simple so that it can access its arguments as follows:
void
umain (int argc, char *argv[])
{
   int i;
   for (i = 0; i < argc; i++) {
     print ("  argv[", i, "] = ");
     sys_cputs (argv[i]);
     sys_cputs ("\n");
   }
}

Output:
  argv[0] = "simple"
  argv[1] = "-f"
  argv[2] = "foo"
  argv[3] = "-c"
  argv[4] = "junk"




There are two components of this work: what the parent does and what
the child does.


 On the parent side (the side which invokes exec()), add
an ellipsis to the signature of exec() so that it can
take a variable number of arguments.

int
exec (char *name,...)
{
  // your code goes here
}
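
One way to walk the variable arguments is with the standard <stdarg.h> macros. The sketch below gathers the program name and the following strings into an array and returns the count. collect_args() and its 'max' parameter are hypothetical, and your libos may need its own equivalent of <stdarg.h> if the standard header is not available there.

```c
#include <stdarg.h>
#include <stddef.h>

/* Stores 'name' and every following string, up to the terminating
   NULL, into out[0..max-1].  Returns the number of strings stored
   (the future argc), or -1 if they do not fit. */
int
collect_args (const char *out[], int max, const char *name, ...)
{
  va_list ap;
  int argc = 0;

  if (max < 1)
    return -1;
  out[argc++] = name;         /* argv[0] is the program name */

  va_start (ap, name);
  for (;;) {
    const char *arg = va_arg (ap, char *);

    if (arg == NULL)
      break;                  /* the trailing NULL ends the list */
    if (argc == max) {
      va_end (ap);
      return -1;              /* too many arguments */
    }
    out[argc++] = arg;
  }
  va_end (ap);
  return argc;
}
```

Calling collect_args (out, 8, "simple", "-f", "foo", (char *) NULL) fills out[0] through out[2] and returns 3. The cast on the trailing NULL keeps the call portable to platforms where a bare NULL is not pointer-sized.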



Then exec() must set up the stack of the new environment
so that the arguments appear.  The parent should format the memory
according to the following diagram.  

 
USTACKTOP: 
         +--------------+
         |   block of   | Block of strings.  In the example
         |    memory    | "simple", "-f", "foo", "-c", and
         | holding NULL | "junk" would be stored here. 
         |  terminated  |
         | argv strings |
         +--------------+
         |  &argv[n]    |  Next, comes the argv array--an array of 
         |     .        |  pointers to the strings. Each &argv[*] points 
         |     .        |  into the "block of strings" above.
         |     .        |
         |  &argv[1]    |
         |  &argv[0]    |<-.
         +--------------+   |
         |   argv ptr   |__/  In the body of umain(), access to argc 
%esp ->  |   argc       |     and argv reference these two values.
         +--------------+



If these values are on the stack when umain() is called,
then umain() will be able to access its arguments via the
int argc and  char *argv[] parameters.



As indicated in the diagram above, the parent code must also create
the new environment with its %esp pointing at the
argc value.  You'll probably need to modify
sys_env_alloc() to take the initial %esp as an
additional parameter.



Warning: the diagram shows the memory at USTACKTOP since
this is where it will be mapped in the child's address space.
However, be careful!  When the parent formats the arguments, it will
need to do so at some temporary address, since it can't (well,
shouldn't) map over its own stack.  Similarly, take care when setting the
pointers argv ptr, &argv[0] .. &argv[n].  These pointers need to
account for the fact that the data will be remapped into the child at
USTACKTOP.
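
The pointer adjustment described in this warning can be sketched as follows. The function fills a page-sized buffer at a temporary address, but every pointer it stores is computed relative to the page's future address in the child. build_arg_page() and put32() are hypothetical names, NBPG is assumed to be 4096, USTACKTOP is passed in as a parameter, and the NULL entry after the last argv pointer is a conventional addition that the diagram does not require.

```c
#include <stdint.h>
#include <string.h>

#define NBPG 4096   /* ASSUMED page size */

/* Store a 32-bit value at byte offset 'off' (native order, as on x86). */
static void
put32 (unsigned char *page, uint32_t off, uint32_t value)
{
  memcpy (page + off, &value, sizeof value);
}

/* Formats 'page' so that, once mapped at ustacktop - NBPG in the
   child, it matches the diagram.  Returns the child's initial %esp,
   or 0 if the arguments do not fit in one page. */
uint32_t
build_arg_page (unsigned char *page, uint32_t ustacktop,
                int argc, const char *const argv[])
{
  uint32_t page_va = ustacktop - NBPG;  /* child address of page[0] */
  uint32_t strbytes = 0, str_off, argv_off, esp_off, s;
  int i;

  if (argc < 0 || argc > NBPG / 4)
    return 0;
  for (i = 0; i < argc; i++) {
    strbytes += (uint32_t) strlen (argv[i]) + 1;
    if (strbytes > NBPG)
      return 0;
  }
  /* strings + alignment slack + argv array with NULL + argv ptr + argc */
  if (strbytes + 3 + 4 * ((uint32_t) argc + 1) + 8 > NBPG)
    return 0;

  str_off = NBPG - strbytes;                       /* strings end at the top */
  argv_off = (str_off & ~3u) - 4 * ((uint32_t) argc + 1);
  esp_off = argv_off - 8;

  s = str_off;
  for (i = 0; i < argc; i++) {
    uint32_t len = (uint32_t) strlen (argv[i]) + 1;

    memcpy (page + s, argv[i], len);
    put32 (page, argv_off + 4 * (uint32_t) i, page_va + s);  /* child address */
    s += len;
  }
  put32 (page, argv_off + 4 * (uint32_t) argc, 0);  /* argv[argc] = NULL */
  put32 (page, esp_off + 4, page_va + argv_off);    /* argv ptr */
  put32 (page, esp_off, (uint32_t) argc);           /* argc, at %esp */
  return page_va + esp_off;
}
```

The parent never dereferences the stored pointers itself; they only become valid once the page is mapped below USTACKTOP in the child.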




Now for the child side of exec(): examine the
entry path of the child process under the __start label.
You'll see that it is written such that __main() and
umain() can be defined taking int argc, char
*argv[].  You'll also notice that the entry path takes
care of the case when a new process is created by the kernel, in which
case no arguments are passed.  



The code on the child side has been done for you, except that you
should add the (int argc, char *argv[]) parameters to the
definitions of __main() and umain().




Your task is to implement the parameter passing strategy described
above.  You may assume that there are few enough (and short enough)
arguments so that only one page of stack is needed in the child
environment.




Technical Detail: Actually only the argc and the
argv ptr must be placed on the new env's stack.  The
argv ptr must point to the &argv[0]
.. &argv[n] array, each element of which points to a string.  As a
consequence, the &argv[0] .. &argv[n] array and the
"block of strings" can be located anywhere in the new env's address
space--not necessarily on the stack.  In practice, we find it
convenient to store all of these values on the stack as has been
presented in this exercise.




Questions:

How long approximately did it take you to do this lab?


This completes the lab.


Version: $Revision: 1.8 $. Last modified: $Date: 2002/11/08 06:40:19 $