Lecture 6 homework: Unix v6 on the PDP-11 -- Part II

Handed out: Wednesday, September 15, 2004
Due: Monday, September 27, 2004
Read: chapter 6 and pages 7-1 through the first column of 7-4 of Lions' commentary, and the code corresponding to chapter 6.

Hand-In Procedure

You are to turn in this homework during lecture on Monday. Please write up your answers to the exercises below and hand them in to a 6.828 staff member by the end of the lecture.

Introduction

In this homework you will continue simulating the Unix v6 kernel on the PDP-11. You may want to skim over homework 5 to refresh yourself on the purpose of these two homeworks. The reference page has links to documents related to Unix v6 on the PDP-11, including the processor handbook covering the internals of the PDP-11 processor.

Rebuilding the UNIX kernel

Now we'll walk through rebuilding the UNIX kernel, so you can try out your own changes. It's easier if we build the kernel from files on your actual file system rather than in an opaque RK05 disk image. That way you can use your favorite editor to edit the sources. (You could edit the sources in the PDP-11 simulator using ed if you so desired, though doing that corrupted the one source file I tried to edit.) On the other hand, the early dialect of C used in 1976 won't work in any of today's C compilers, so we have to use the V6 C compiler.

To resolve this, we'll use a different PDP-11 simulator. This simulator is for running PDP-11 UNIX binaries like /bin/ls. It simulates the instructions in the binary, but responds to the system calls using whatever operating system is running on the Athena machine.

(Assuming you have already run the add and cp commands from the previous section...)

athena% setenv V6ROOT ~/6.828/v6
athena% cd ~/6.828/v6rk
athena% v6 ls -l
total 12075
drwxrwxr-x  1 root        0 Sep  2 02:11 CVS
-rw-rw-r--  1 root      315 Sep  2 02:08 README.828
-rw-rw-r--  1 root      512 Sep  2 01:50 boot.rk
-rw-rw-r--  1 root  2048000 Sep  2 02:05 v6doc
-rw-rw-r--  1 root        0 Sep  2 02:08 v6man
-rw-rw-r--  1 root  2060800 Sep  2 04:19 v6root
-rw-rw-r--  1 root  2048000 Sep  2 02:05 v6src
athena%

The v6 command runs V6 binaries (found in $V6ROOT/bin). It ran the V6 ls but on Athena. If this seems strange, that's okay. For now, think of v6 as a black box that runs old UNIX binaries on new systems. At the end of the semester you'll be able to look back at this and immediately figure out what's going on under the covers.

V6 UNIX didn't have make, but we've written a Makefile anyway, to make the kernel easier to build. To build the kernel, run gmake in ~/6.828/v6/usr/sys/conf.

Correction: you will need to run 'mkdir $V6ROOT/tmp' in your Athena account for 'v6 cc' (and, by extension, gmake) to work properly.

Now that you know the kernel builds, change the message in the printf call you found earlier. Build the kernel again.

After running gmake, you'll have a compiled UNIX kernel in a file named unix. You need to get this kernel onto the RK05 disk image so that the boot loader can find it. To do this, you'll use v6de, a V6 disk editor. The magic command is:

athena% v6de ~/6.828/v6rk/v6root 'cp :unix /unix.new'
athena%

The first argument to v6de is the name of a disk image. The rest of the arguments are commands to run. (If you don't specify any commands, v6de runs an interactive shell.) The command cp copies files between Athena and the disk image. Files in Athena are denoted by a leading :. The above command copies the file unix from the current Athena directory into unix.new in the root directory of the disk image. (The command 'cp /unix :unix.old' would copy the kernel we used before off the disk image and onto Athena.)

Now you can boot the simulator as you did before (remember to cd to ~/6.828/v6rk first), but type unix.new at the @ prompt.

Exercise 1. Boot your modified kernel. Did it work as expected? (If not, figure out why and repeat.)

Challenge! /usr/sys/run is a script of commands to run to build the kernel. Try running them under v6 sh. It doesn't quite work. Figure out why.
Try running them in the PDP-11 simulator, under real UNIX. It doesn't quite work, in a different way. Figure out why.

Warning: the course staff do not know what is causing these bugs. They might be original bugs or (perhaps more likely) bugs in the simulators. We don't know exactly how challenging this question is.

UNIX C calling conventions

Now we'll explore the correspondence between C code and the machine code it compiles into in V6. You will almost certainly find it useful to read chapter 2 of Lions to brush up on PDP-11 assembly.

Create a file x.c that contains:

f(a, b)
{
	int c;

	c = 0;
	c =+ a*010;
	c =+ b*040;
	return c;
}

main()
{
	int x;

	x = f(1,2);
	return x;
}

and then compile and run it:

athena% v6 cc x.c
athena% v6 ./a.out
athena% echo $?
XXX
athena%

Exercise 2. What is the exit status (the XXX in the transcript)?

Now we'll examine the code generated for these functions. To dump the symbol table and extract the addresses of the functions, use:

athena% v6 db ./a.out
_main=
104
_main,20?
jsr r5,csv
...
jmp cret
...
_f,20?
jsr r5,csv
...
jmp cret
...

The command _main= prints the address of the symbol _main (the C compiler automatically prefixes all C names with _ to avoid conflicts with names used in assembly files). The command _main,20? prints the 20 instructions starting at the address of the symbol _main. Note that the first instructions of a C function are jsr r5,csv and tst -(sp). C functions end with a br and then jmp cret, which eventually returns to the caller. These sequences serve as a good way to figure out where the relevant assembly dump stops.

The debugger tries to be helpful, translating constants into symbolic form in the disassembly. For example, if _main is 104, then 142 is printed as _main+36. Unfortunately, there are some symbols near 0 that the debugger misuses to print values that really should display as numeric constants.
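To make the symbolic translation concrete, here is a minimal Python sketch of the idea. It is not db's actual algorithm (that lives in the assembly sources), only an illustration under the assumption that the debugger picks the nearest symbol at or below a value. The function name symbolize and the symbol tiny at address 4 are hypothetical; only _main = 104 comes from the text above.

```python
def symbolize(value, symbols):
    """Render value as 'name+offset' (offset in octal), using the nearest
    symbol at or below it. Falls back to plain octal if none qualifies."""
    candidates = [(addr, name) for name, addr in symbols.items() if addr <= value]
    if not candidates:
        return format(value, "o")
    base, name = max(candidates)          # closest symbol not above value
    offset = value - base
    return name if offset == 0 else f"{name}+{offset:o}"

# _main = 104 is from the text; 'tiny' is a made-up symbol near 0.
symbols = {"_main": 0o104, "tiny": 0o4}

print(symbolize(0o142, symbols))  # _main+36
print(symbolize(0o10, symbols))   # tiny+4, though 10 was probably meant as a constant
```

The second call shows the nuisance described above: a plain constant such as 10 is displayed relative to whatever small symbol happens to sit just below it.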

Challenge! The source for the debugger is in /usr/source/s1/db*.s in V6. Fix it.

Print the addresses of the "small" symbols a, b, start, and x. (x prints as a large positive 16-bit number. What is it as a small negative 16-bit number?) Disassemble main and f. Replace symbolic constants involving the small symbols with actual constants. Save the edited disassemblies for later.
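If the conversion between a large positive and a small negative 16-bit number is unfamiliar, the following Python sketch shows the two's-complement arithmetic. The helper name to_signed16 is ours, and the value 177770 is an arbitrary example, not the address of x.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement signed integer."""
    word &= 0xFFFF                      # keep only the low 16 bits
    return word - 0x10000 if word & 0x8000 else word

example = 0o177770                      # arbitrary example value, in octal
signed = to_signed16(example)
print(format(example, "o"), "is", format(signed, "o"), "octal, or", signed, "decimal")
# 177770 is -10 octal, or -8 decimal
```

Values with the top bit clear come back unchanged, so the same helper can be applied to every constant in a disassembly.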

Now we're going to run the code in the PDP-11 machine simulator to discover exactly what the stack layout looks like.

Make a backup copy of ken/main.c and then replace its main with our version, including the f function. Recompile your kernel and install it on the RK05 disk image. Use db to print the addresses of the functions f and main in the new kernel. Use pdp11 to set a breakpoint at the address of f and then boot the new kernel until it reaches the breakpoint.

Exercise 3. What are the register values at the breakpoint? What does the stack look like for 10 (decimal 8!) words on either side of the stack pointer at the breakpoint?

The 'e' command in the PDP-11 simulator assumes by default that you are giving it physical addresses rather than virtual addresses. In order to examine the kernel stack, use 'e -v' to specify that you are giving it a virtual address, or convert virtual addresses (shown when you examine the contents of KSP and other registers) to physical addresses for use with 'e'.

To look at the stack, note the kernel stack pointer KSP and then run e -v xxx-yyy, where xxx is KSP minus 20 and yyy is KSP plus 20. Words are two bytes, so 10 words is 20 bytes. (Remember that all of these numbers are in octal.)
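Octal arithmetic is easy to get wrong by hand, so here is a small Python sketch that computes the range for you. The helper name stack_window is ours, and KSP = 142000 is just a sample value; substitute whatever the simulator reports.

```python
def stack_window(ksp, words=0o10):
    """Return (low, high) byte addresses spanning `words` words on either
    side of ksp. Each PDP-11 word is two bytes."""
    span = 2 * words                    # octal 10 words -> octal 20 bytes
    return ksp - span, ksp + span

ksp = 0o142000                          # sample value only
low, high = stack_window(ksp)
print(f"e -v {low:o}-{high:o}")         # e -v 141760-142020
```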

Note the value pointed at by the stack pointer. This is f's return address: the address of the instruction after the jsr in main that got us here. Set a breakpoint at that address and then execute t 1000 to trace until you hit the breakpoint (which will happen in far fewer than 1000 instructions).

Exercise 4. Deduce the values held by each stack location near the stack pointer. Where is the return PC? Where are the function arguments? What is stored in the addresses below the stack pointer when jsr r5,csv executes? You should turn in a chart like the one on Lions page 10-3, but explain what every stack word is used for!

Exercise 5. Armed with your stack diagram, annotate your disassemblies of main and f, explaining the purpose of each line.

(If you're interested in how virtual addresses map to physical addresses, read chapter 6 in the PDP-11 processor handbook. In short, to determine the physical address for a given virtual address, strip off the top 3 bits of the virtual address and use them to select a segment address register; then add the remaining 13 bits to the value of that register, left shifted by 6 bits. Use 'e kipar0-kipdr7' to examine the contents of the relevant kernel segment registers. For example, when KSP=142000 and KIPAR6=001175, the next stack operation (e.g. JSR) stores a value at the physical memory address 121476 = ((1175 << 6) + 2000) - 2. The choice of KIPAR6 comes from the top 3 bits of the virtual address in KSP, which equal 6.)
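The translation can be sketched in a few lines of Python. This is a simplified model, not a description of the hardware: it assumes memory management is enabled and ignores the descriptor registers (length and access checks) entirely. The function name virt_to_phys is ours; the register values are the ones from the example above.

```python
def virt_to_phys(vaddr, par):
    """Translate a 16-bit virtual address using a list of 8 segment
    address register values. Returns (segment number, physical address)."""
    segment = (vaddr >> 13) & 0o7       # top 3 bits select the register
    offset = vaddr & 0o17777            # remaining 13 bits
    return segment, (par[segment] << 6) + offset

kipar = [0] * 8                         # only KIPAR6 matters for this example
kipar[6] = 0o1175

ksp = 0o142000
# A push (e.g. by JSR) decrements the stack pointer by 2 before storing.
segment, phys = virt_to_phys(ksp - 2, kipar)
print(f"segment {segment}, physical address {phys:o}")
# segment 6, physical address 121476
```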

This completes the homework.