Process abstraction and management

Required reading: Chapter 7 and the remainder of chapter 12

Overview

In the next set of lectures we will discuss the implementation of kernel services for process management, interprocess communication, I/O, and file systems. Today's topic is process management, and the case study will be UNIX's process management services; all operating systems support something similar to what UNIX supports.

A process is one address space combined with one thread. Process management in UNIX includes the functions fork, wait, exec, exit, kill, getpid, brk, nice, sleep, and trace. (All system calls in v6 are in the sysent table on sheet 29; pick out the ones related to process management.)

To recall how they are used, remember the structure of the shell:

        while (1) {
            printf ("$");
            readcommand (command, args);
            if ((pid = fork ()) == 0) {  // child?
                exec (command, args, 0);   // arg 0 is name of program
            } else if (pid > 0) {   // parent?
                wait (pid);
            } else {
                printf ("Unable to fork\n");
            }
        }

As a side note, how does the shell get started? (Init fork+execs login, one per terminal; login execs the shell listed in the passwd file.)

How can a shell implement background jobs (a process group)?

        $ compute &

The shell just doesn't call wait, but instead reads the next command. The shell periodically polls jobs to find out what their status is by calling wait (passing a flag not to block). (Jobs are not supported by v6.)
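
A minimal sketch of the non-blocking poll, assuming a modern POSIX system rather than v6 (which has no such flag). The names start_background and poll_job are made up for illustration; waitpid with WNOHANG is the "flag not to block".

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start argv[0] as a background job: fork, exec in the child, and return
 * to the caller at once without waiting. Returns the child's pid, or -1
 * if fork failed. (Hypothetical helper, not part of any shell.) */
pid_t start_background(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* reached only if exec failed */
    }
    return pid;
}

/* Non-blocking status check. Returns 1 and fills *status if the job has
 * terminated (it is reaped by this call), 0 if it is still running, and
 * -1 on error, for example when there is no such child. */
int poll_job(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == pid)
        return 1;
    return (r == 0) ? 0 : -1;
}
```

A shell built this way would call poll_job for each outstanding job before printing the next prompt, instead of blocking in wait after every fork.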

A process terminates fully when (1) the process has exited (perhaps because of kill); and (2) the parent has called wait. A process that has exited, but whose parent hasn't yet called wait, enters the zombie state. If a parent process terminates without waiting for all of its child processes to terminate, the remaining child processes are given process 1 (the init process) as their new parent; init waits for processes to terminate, and thus will clean them up.
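
The two conditions can be observed from user space. The sketch below assumes a POSIX system; reap_child is a hypothetical name. The child exits with a chosen code (condition 1), the parent collects it with waitpid (condition 2), and a second waitpid on the same pid fails with ECHILD, showing that the process is now fully gone.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with `code`, then reap it.
 * Returns the exit code collected by the parent, or -1 on failure.
 * *gone is set to 1 if a second wait on the same pid fails with ECHILD. */
int reap_child(int code, int *gone)
{
    int status;

    assert(gone != NULL);
    *gone = 0;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: condition (1), the process exits */

    /* If the child exits before the parent gets here, it sits in the
     * zombie state: it runs no code, but the kernel keeps its exit
     * status around for the parent to collect. */
    if (waitpid(pid, &status, 0) != pid)    /* condition (2) */
        return -1;

    /* The child has now terminated fully, so there is nothing left to wait for. */
    *gone = (waitpid(pid, NULL, 0) == -1 && errno == ECHILD);

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```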

Why make process 1 the parent? Mostly for convenience. An alternative is for the child to clean up after itself, but that is a bit tricky, since it must remove its own stack etc. It also results in a bit of code duplication, because it would be a different code path than when the parent cleans up. Another possibility is to make the grandparent the parent, but how does the child know who its grandparent is? And if the grandparent dies, then both the child and the parent need a new parent. It probably can all be made to work, but making pid 1 the parent is simpler, and it is in a loop calling wait anyway.

Why separate fork and exec? Also mostly for convenience. They already had exec and a process table. Adding just fork was the least amount of new code.

V6 code examples

Exec. Key challenge: set up the address space and set up the stack with the arguments to the program. See Lions' chapter 7 for the layout of a user address space.
- 3026: what is uchar? it is a function that copies a byte from user space, from the address in u.u_dirp (see nami.c).
- 3034: inode for arg[0].
- 3052: this code depends on how the arguments are laid out. see icode in main.c for an example; icode is explained at the end of Lions' chapter 6.
- 3058: copy a word from the previous mode; the previous address space must be user address space, since we came to exec through a trap instruction.
- 3085: read the first 020 bytes of the file named by argument 0 of exec into the area that starts at u_arg[0]; this code won't get you an A in 6.170! (argument 0 of exec has already been translated, and ip is the inode for argument 0. furthermore, all arguments to exec have been copied into an internal buffer.) the 020 bytes should be the a.out header.
- 3089: u.u_base is a kernel address.
- 3091: u.u_base is now interpreted as a user address.
- 3095: on PDP-11/40, the executable could be 407 (text, data, and stack back-to-back in main memory) or 410 (text is separated from data+stack in physical memory). u-area is right before text in 407 and right before data in 410.
- 3129: contract the current address space to just the u-area; this gets rid of the program that was loaded in this address space.
- 3130: xalloc reads in the text segment, if it is not already in memory.
- 3132: grow the address space so that it has enough room to contain the new program.
- 3140: skip 020 bytes, the a.out header.
- 3155: set the user stack pointer. when returning from the system call, the kernel will copy this value into the user stack pointer register.
- 3155: ap holds a negative value v (see 3154). is a negative value loaded into the user stack pointer register? (Answer: no. the register holds an unsigned 16-bit address, so the value is 2^16 - |v|, which points into the stack at the top of the address space.)
- 3161: where are we copying the contents of the buffer? (Answer: to the user stack.) at a negative address in the previous address space? (Answer: no, at exactly the right place in the stack.)
- 3188: set the saved pc, to which rtt will return, to 0. thus, when returning to user space, the processor will start executing at address 0 in the user address space.
- Why are there 3 calls to estabur? Answer: call 1 just checks whether there is space; call 2 ensures that u_base = 0 points to the beginning of the data segment, to make the readi call work correctly; call 3 sets up the address space.
- What is the content of the prototype segment address registers after the third call to estabur? Let's assume ts = 180 blocks (a block is 64 bytes), the data size is 370 blocks, ns = SSIZE = 20 blocks, and the a.out is a 410 executable.
```
  segment   ISA             ISD
  0         0               w=0,ed=0,len=127
  1         128             w=0,ed=0,len=51
  2         16              w=1,ed=0,len=127   // skip uarea
  3         144             w=1,ed=0,len=127
  4         272             w=1,ed=0,len=113
  5         0               w=0,ed=0,len=0
  6         0               w=0,ed=0,len=0
  7         278 (406-128)   w=1,ed=1,len=108 (128-SSIZE)
```
  sureg adds the offset for where the text and data segments are stored in physical memory. for text the offset is u.u_procp->p_textp->x_caddr. for data the offset is u.u_procp->p_addr (the address where the u-area is).
- why is sureg separate from estabur? when we swap in a program again, we have to call only sureg, because the program might be swapped into a different location in physical memory.
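
The prototype register values in the table can be reproduced with a short calculation. The code below is a simplified model written for these notes, not the v6 source: struct seg, layout, and the helper names are made up, and it handles only the case without separate I and D spaces. Each segment covers up to 128 blocks; text starts at offset 0, data starts after the u-area (USIZE = 16 blocks), and the stack is placed in the last segment and grows downward.

```c
#include <assert.h>
#include <string.h>

#define NSEG   8        /* segmentation register pairs per address space */
#define SEGBLK 128      /* 64-byte blocks in one full segment */
#define USIZE  16       /* size of the u-area in blocks */

struct seg {            /* hypothetical stand-in for one ISA/ISD pair */
    int isa;            /* block offset, before sureg adds the physical base */
    int w;              /* 1 = writable */
    int ed;             /* 1 = expands downward (stack) */
    int len;            /* blocks - 1, or 128 - blocks when ed is set */
};

static int nseg(int blocks) { return (blocks + SEGBLK - 1) / SEGBLK; }

/* Fill segments upward from index i for `blocks` blocks starting at *a. */
static int fill_up(struct seg s[], int i, int *a, int blocks, int w)
{
    while (blocks > 0) {
        int n = blocks < SEGBLK ? blocks : SEGBLK;
        s[i].isa = *a;
        s[i].w = w;
        s[i].ed = 0;
        s[i].len = n - 1;
        i++;
        *a += n;
        blocks -= n;
    }
    return i;
}

/* Compute prototype registers for nt text, nd data, and ns stack blocks.
 * Returns 0 on success, -1 if the program needs more than 8 segments. */
int layout(int nt, int nd, int ns, struct seg s[NSEG])
{
    int a, i;

    assert(nt >= 0 && nd >= 0 && ns >= 0);
    if (nseg(nt) + nseg(nd) + nseg(ns) > NSEG)
        return -1;
    memset(s, 0, NSEG * sizeof s[0]);

    a = 0;                              /* text: read-only, from offset 0 */
    i = fill_up(s, 0, &a, nt, 0);

    a = USIZE;                          /* data: writable, skip the u-area */
    fill_up(s, i, &a, nd, 1);

    a += ns;                            /* stack: filled from segment 7 down */
    i = NSEG;
    while (ns >= SEGBLK) {
        a -= SEGBLK;
        ns -= SEGBLK;
        i--;
        s[i].isa = a; s[i].w = 1; s[i].ed = 0; s[i].len = SEGBLK - 1;
    }
    if (ns > 0) {
        i--;
        s[i].isa = a - SEGBLK; s[i].w = 1; s[i].ed = 1; s[i].len = SEGBLK - ns;
    }
    return 0;
}
```

With nt = 180, nd = 370, ns = 20, the data ends at block 16 + 370 = 386 and the stack at 406, so segment 7 gets ISA 406 - 128 = 278 and length 128 - 20 = 108, as in the table.
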

Fork. Duplicate the address space, which is done by newproc().
- newproc() returns 0 in the parent and 1 in the child.
- fork returns the child's pid to the parent. fork returns the parent's pid to the child; the user-space library changes this to zero (to make fork conform to its specification).
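
Seen from user space, after the library has done its adjustment, the return values look as follows. This is a small sketch for a modern POSIX system, with check_fork a hypothetical name: the child sees 0 and can find its parent with getppid, while the parent sees the child's pid.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if fork behaved as described: the child saw a return value
 * of 0 and its getppid() matched the parent, and the parent saw the pid
 * of the child it later waited for. Returns 0 otherwise. */
int check_fork(void)
{
    pid_t parent = getpid();
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return 0;               /* fork failed */

    if (pid == 0) {
        /* child: fork returned 0; report through the exit status
         * whether our parent is the process that called fork */
        _exit(getppid() == parent ? 0 : 1);
    }

    /* parent: fork returned the child's pid, which is never our own */
    assert(pid != parent);
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```
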

Wait.
- 3280: look for a zombie child, and clean it up.
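
The search can be pictured as a scan over the process table. The code below is a much-reduced model for illustration only, not the v6 code: struct proc here has just four fields, and wait_scan and the state names are made up. Where this model returns 0, a real kernel would put the caller to sleep until a child exits.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 8

enum pstat { UNUSED = 0, RUNNING, ZOMBIE };

struct proc {               /* hypothetical, reduced process table entry */
    enum pstat stat;
    int pid;
    int ppid;
    int exit_status;
};

/* Look for a zombie child of `parent`. If one is found, copy out its exit
 * status, free its table slot, and return its pid. Return 0 if the caller
 * has children but none has exited yet, and -1 if it has no children. */
int wait_scan(struct proc tab[NPROC], int parent, int *status)
{
    int children = 0;

    assert(status != NULL);
    for (int i = 0; i < NPROC; i++) {
        struct proc *p = &tab[i];
        if (p->stat == UNUSED || p->ppid != parent)
            continue;
        children++;
        if (p->stat == ZOMBIE) {
            int pid = p->pid;
            *status = p->exit_status;
            p->stat = UNUSED;   /* the slot can be reused by a later fork */
            return pid;
        }
    }
    return children > 0 ? 0 : -1;
}
```
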

Sbreak (set break point). Grow the address space by n bytes by allocating a physical memory area of the old size + n, and copy the old memory, if any, into the new area.
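
The allocate-and-copy strategy can be sketched in user-space C. This is only an analogy under stated assumptions: struct region and grow are hypothetical names, calloc and free stand in for the kernel's physical memory allocator, and zero-filling the new bytes is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct region {             /* hypothetical stand-in for a process image */
    unsigned char *mem;
    size_t size;
};

/* Grow r by n bytes: allocate an area of old size + n, copy the old
 * contents into it, and release the old area. Returns 0 on success,
 * or -1 (leaving r unchanged) if no memory is available. */
int grow(struct region *r, size_t n)
{
    assert(r != NULL);

    size_t newsize = r->size + n;
    unsigned char *p = calloc(newsize ? newsize : 1, 1);
    if (p == NULL)
        return -1;
    if (r->size > 0)
        memcpy(p, r->mem, r->size);     /* copy old memory, if any */
    free(r->mem);
    r->mem = p;
    r->size = newsize;
    return 0;
}
```

The cost of this approach is that every growth step copies the whole image, which is acceptable when images are small and there is no paging hardware to map new pages in place.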