Lecture 5 homework: Unix v6 on the PDP-11

Handed out: Wednesday, September 21, 2005
Due: Monday, September 26, 2005
Read: chapter 6 and pages 7-1 through the first column of 7-4 of Lions' commentary, along with the code corresponding to those chapters. You may have to refer to earlier chapters.

Hand-In Procedure

You are to turn in this homework during lecture on the specified due date. Please write up your answers to the exercises below and hand them in to a 6.828 staff member by the end of the lecture.

Unix v6 on the PDP-11 -- Part I

Introduction

In this homework you will learn your way around some of the software you'll be using for the course. As you know, the course is split between studying the source code for the UNIX V6 kernel (which runs on PDP-11s) and applying the knowledge you gain to writing a small operating system for the Intel x86.

While the labs focus on working with the x86 simulator bochs, this homework introduces you to a PDP-11 simulator, a tool that will aid you in understanding Thompson and Ritchie's original Unix V6 code. When you are studying the kernel source throughout the course, knowing your way around the simulators and the kernel code will let you answer questions by inspecting the running system, or even modifying it and recompiling it. This should yield greater understanding of, and perhaps even a greater appreciation for, the code.

This homework steps through booting the Unix v6 kernel in the PDP-11 simulator; the next homework focuses on rebuilding the kernel and understanding it further. The reference page has links to documents related to Unix v6 on the PDP-11, including the processor handbook covering the internals of the PDP-11 processor.

Acquiring the Software

We encourage you to do the homeworks using Athena, where we have built all the tools for you already.

Note: our software is only configured on the Linux Athena machines (i.e., not on the Suns). Use the uname command to make sure you are running on Linux.

To access the tools, run

athena% uname
Linux
athena% add 6.828 gnu
athena% 

UNIX V6 on the PDP-11

We will use a PDP-11 hardware simulator to provide the hardware on which UNIX expects to run. The full manual for the simulator is online. You may find it helpful for future reference.

First, you'll need your own copy of the UNIX disk images and a copy of the UNIX sources too, for later.

athena% cd ~
athena% mkdir 6.828
athena% cd 6.828
athena% add 6.828
athena% cp -R /mit/6.828/sw/v6rk .
athena% cp -R /mit/6.828/sw/v6 .
athena% cd v6rk
athena% ls -l v6*
total 6043
-rw-rw-r--  1 rsc  rsc  2048000 Sep  1 21:57 v6doc
-rw-rw-r--  1 rsc  rsc  2060800 Sep  1 22:19 v6root
-rw-rw-r--  1 rsc  rsc  2048000 Sep  1 21:57 v6src
athena% 

There are three disks, named v6doc, v6root, and v6src.

To boot UNIX, start the simulator and type the following at the prompts.

athena% pdp11

PDP-11 simulator V2.3d
sim> attach rk0 v6root
sim> attach rk1 v6src
sim> attach rk2 v6doc
sim> boot rk0

The lines typed at the sim> prompts configure the machine being simulated. In particular, they attach three disks to the machine as devices rk0, rk1, and rk2 and then boot from disk rk0. You can abbreviate attach down to a and boot down to b.

Now the boot loader begins. You give it a kernel name and log in:

 @unix

login: root
# stty erase backspace
# stty kill control-U
# ls -l
total 70
drwxrwxr-x  2 bin      1104 May 14 00:47 bin
drwxrwxr-x  2 bin      1824 Aug 14 22:17 dev
drwxrwxr-x  2 bin       496 Aug 14 22:22 etc
drwxrwxr-x  2 bin       464 May 13 23:35 lib
drwxrwxr-x  2 bin        32 May 13 20:01 mnt
drwxrwxrwx  2 bin       272 Jul 18 09:19 tmp
-rw-rw-rw-  1 root    28854 Jul 18 09:18 unix
drwxrwxr-x 15 bin       240 Aug 14 22:19 usr
# 

The @ prompt is printed by the boot loader, asking for the name of the kernel file in the root directory. If you were trying a new kernel, you might install it as /unix.new rather than overwrite /unix, so that you could still boot the old kernel if it turned out your changes weren't for the best.

The stty commands set up your terminal. (Type real backspace and control-U characters.) You must type the first few lines perfectly! Until you run the first stty command, backspace will appear to work but not actually work -- although your xterm (or whatever) window is processing backspace properly, the UNIX V6 tty driver treats it as just another character until stty has done its magic. (The default settings are # for the erase character and @ for the line kill character.)

You may find it interesting to poke around. It's impressive how many tools are packed into that 2MB disk image. The shell command to change directories is chdir, not cd.

To get back out to the simulator prompt, type control-E. When you do that (actually, every time the simulation stops) the simulator prints information about the machine register state. The register values are printed in octal, and all the interaction with the simulator is in octal as well. (The computer world of 2003 has now settled on hexadecimal, but octal was the way of life in 1977. When we're talking about UNIX we'll assume octal, but when we talk about the x86 in the labs we'll assume hexadecimal.)
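If you find yourself translating between bases often, a small helper can do the arithmetic. The following Python sketch is not part of the course tools; the function names from_octal and to_octal are made up for illustration.

```python
# Convert between the octal values the simulator prints and other bases.
# These helpers are illustrative only; they are not part of the 6.828 tools.

def from_octal(text):
    """Parse a string of octal digits, as printed by the simulator."""
    return int(text, 8)

def to_octal(value):
    """Format a non-negative integer as bare octal digits (no 0o prefix)."""
    return format(value, "o")

print(from_octal("20"))             # octal 20 in decimal: 16
print(hex(from_octal("177777")))    # largest 16-bit value in hex: 0xffff
print(to_octal(16))                 # decimal 16 in octal: 20
```

Python also accepts octal literals directly, written with a 0o prefix (for example 0o177777).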

To reset the machine and boot again, run the commands

sim> reset
sim> boot rk0

Exercise 1. The PDP-11 simulator provides a register SR to allow you to pretend to set the switches that would be on the front of a real PDP-11. Run the command 'e sr' (e stands for examine) to look at the current value of SR. What is it? Run 'd sr 1' (d stands for deposit) to change the SR value. Boot again. What's different? Poke around in ~/6.828/v6/usr/sys to find where that came from. Why didn't you see it the first time you booted? (Hint: you ran the same kernel both times, so the answer is not that the switches caused a different kernel to be loaded.)

The debugger command 'e state' prints the entire state of the CPU, including the registers you've seen at the breakpoints and also various segmentation and device registers.

Exercise 2. Save a transcript of the following session. The UNIX kernel boot sequence starts at address 40 (octal). Run the command 'd break 40' to set a breakpoint (there's only one!) at address 40. Boot again. When the kernel stops at address 40, what is the next instruction it will execute? Run the command 's' (for step) to execute that one instruction. Now what is the PC set to?

Examine the next 20 (octal) bytes of instructions by executing 'e xxx-yyy', where xxx is the current value of the PC and yyy is xxx plus 20. By default, e prints the words it examines in octal. The -m flag displays them as instructions instead: try e -m xxx-yyy. Does each instruction occupy the same number of bytes in memory?

Run 't 5' to trace through those instructions. (You could also run s five times, or run s 5, which steps through the 5 instructions but does not print the register state after each one.) Clear the breakpoint by setting it to an innocuous value like the top of memory: d break 177777. Then let the simulation continue: c.

Hand in the transcript of your session, along with the answers to the three questions above.

Challenge! After letting the machine run for a second, type Control-E and look around. What's going on in the machine? How did it get there?

(Note: challenge questions are 100% optional. They are exercises we think would be fun (though often a fair amount of work) for students who are super excited about the material. You can answer the challenge questions, and we will correct your answers, but they have no effect whatsoever on your course grade.)

To close the simulator, type Control-E to get back to the prompt and then type q.

Unix v6 on the PDP-11 -- Part II

Rebuilding the UNIX kernel

Now we'll walk through rebuilding the UNIX kernel, so you can try out your own changes. It's easier if we build the kernel from files on your actual file system rather than in an opaque RK05 disk image. Then you can use your favorite editor to edit the sources. (You could edit the sources in the PDP-11 simulator using ed if you so desired, though doing that corrupted the one source file I tried to edit.) On the other hand, the early dialect of C used in 1976 won't work in any of today's C compilers, so we have to use the V6 C compiler.

To resolve this, we'll use a different PDP-11 simulator. This simulator is for running PDP-11 UNIX binaries like /bin/ls. It simulates the instructions in the binary, but responds to the system calls using whatever operating system is running on the Athena machine.

(Assuming you have already run the add and cp commands from the previous section...)

athena% setenv V6ROOT ~/6.828/v6
athena% cd ~/6.828/v6rk
athena% v6 ls -l
total 12075
drwxrwxr-x  1 root        0 Sep  2 02:11 CVS
-rw-rw-r--  1 root      315 Sep  2 02:08 README.828
-rw-rw-r--  1 root      512 Sep  2 01:50 boot.rk
-rw-rw-r--  1 root  2048000 Sep  2 02:05 v6doc
-rw-rw-r--  1 root        0 Sep  2 02:08 v6man
-rw-rw-r--  1 root  2060800 Sep  2 04:19 v6root
-rw-rw-r--  1 root  2048000 Sep  2 02:05 v6src
athena% 

The v6 command runs V6 binaries (found in $V6ROOT/bin). It ran the V6 ls but on Athena. If this seems strange, that's okay. For now, think of v6 as a black box that runs old UNIX binaries on new systems. At the end of the semester you'll be able to look back at this and immediately figure out what's going on under the covers.

V6 UNIX didn't have make, but we've written a Makefile anyway, to make the kernel easier to build. To build the kernel, run gmake in ~/6.828/v6/usr/sys/conf.

Correction: you will need to run 'mkdir $V6ROOT/tmp' in your athena account for 'v6 cc' -- and by extension gmake -- to work properly.

Now that you know the kernel builds, change the message in the printf call you found earlier. Build the kernel again.

After running gmake, you'll have a compiled UNIX kernel in a file named unix. You need to get this kernel onto the RK05 disk image so that the boot loader can find it. To do this, you'll use v6de, a V6 disk editor. The magic command is:

athena% v6de ~/6.828/v6rk/v6root 'cp :unix /unix.new'
athena% 

The first argument to v6de is the name of a disk image. The rest of the arguments are commands to run. (If you don't specify any commands, v6de runs an interactive shell.) The command cp copies files between Athena and the disk image. Files in Athena are denoted by a leading :. The above command copies the file unix from the current Athena directory into unix.new in the root directory of the disk image. (The command 'cp /unix :unix.old' would copy the kernel we used before off the disk image and onto Athena.)

Now you can boot the simulator as you did before (remember to cd to ~/6.828/v6rk first), but type unix.new at the @ prompt.

Exercise 1. Boot your modified kernel. Did it work as expected? (If not, figure out why and repeat.)

UNIX C calling conventions

Now we'll explore the correspondence between C code and the machine code it compiles into in V6. You will almost certainly find it useful to read chapter 2 of Lions to brush up on PDP-11 assembly.

Create a file x.c that contains:

f(a, b)
{
	int c;

	c = 0;
	c =+ a*010;
	c =+ b*040;
	return c;
}

main()
{
	int x;

	x = f(1,2);
	return x;
}

and then compile and run it:

athena% v6 cc x.c
athena% v6 ./a.out
athena% echo $?
XXX
athena% 

Exercise 2. What is the exit status (the XXX in the transcript)?

Now we'll examine the code generated for these functions. To dump the symbol table and extract the addresses of the functions, use:

athena% v6 db ./a.out
_main=
104
_main,20?
jsr r5,csv
...
jmp cret
...
_f,20?
jsr r5,csv
...
jmp cret
...

The command _main= prints the address of the symbol _main (the C compiler automatically prefixes all C names with _ to avoid conflicts with names used in assembly files). The command _main,20? prints the 20 instructions starting at the address of the symbol _main. Note that the first instructions of a C function are jsr r5,csv and tst -(sp). C functions end with a br and then jmp cret, which eventually returns to the caller. These sequences serve as a good way to figure out where the relevant assembly dump stops.

The debugger tries to be helpful, translating constants into symbolic form in the disassembly. For example, if _main is 104, then 142 is printed as _main+36. Unfortunately, there are some symbols near 0 that the debugger misuses to print values that really should display as numeric constants.
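To make the symbolic printing concrete, here is a Python sketch of one plausible rule: pick the nearest symbol at or below the value and print the offset in octal. This is an assumption about how db behaves, not a transcription of its source. The symbol tiny and its address are hypothetical; only _main=104 comes from the example above.

```python
# Sketch of symbol+offset printing, assuming a "nearest symbol at or below
# the value" rule. The table is hypothetical except for _main = 104 (octal).
SYMBOLS = {
    "_main": 0o104,
    "tiny": 0o4,    # made-up stand-in for a symbol near 0
}

def symbolize(value, symbols=SYMBOLS):
    """Render value as symbol+offset in octal, or as plain octal digits."""
    candidates = [(addr, name) for name, addr in symbols.items() if addr <= value]
    if not candidates:
        return format(value, "o")
    addr, name = max(candidates)          # closest symbol at or below value
    offset = value - addr
    return name if offset == 0 else f"{name}+{offset:o}"

print(symbolize(0o142))   # _main+36
print(symbolize(0o10))    # tiny+4, though 10 was probably meant as a constant
```

The second call shows the nuisance described above: under this rule, any small constant that happens to lie just past a symbol near 0 is rendered symbolically.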

Challenge! The source for the debugger is in /usr/source/s1/db*.s in V6. Fix it.

Print the addresses of the "small" symbols a, b, start, and x. (x prints as a large positive 16-bit number. What is it as a small negative 16-bit number?) Disassemble main and f. Replace symbolic constants involving the small symbols with actual constants. Save the edited disassemblies for later.
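Reinterpreting a 16-bit word as a signed number is ordinary two's-complement arithmetic. The Python sketch below shows the conversion on a made-up value (177770 octal); it is not the value of x, and the helper names are hypothetical.

```python
# Reinterpret a 16-bit word as a two's-complement signed integer and print
# it in octal. The sample word is invented for illustration.

def as_signed16(word):
    """Return the signed value of the low 16 bits of word."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def signed_octal(value):
    """Format an integer as octal digits with a leading '-' if negative."""
    sign = "-" if value < 0 else ""
    return f"{sign}{abs(value):o}"

print(signed_octal(as_signed16(0o177770)))   # -10 (octal), i.e. decimal -8
```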

Now we're going to run the code in the PDP-11 machine simulator to discover exactly what the stack layout looks like.

Make a backup copy of ken/main.c and then replace its main with our version, including the f function. Recompile your kernel and install it in the RK05 disk. Use db to print the addresses of the functions f and main in the new kernel. Use pdp11 to set a breakpoint at the address of f and then boot the new kernel until it reaches the breakpoint.

Exercise 3. What are the register values at the breakpoint? What does the stack look like for 10 (decimal 8!) words on either side of the stack pointer at the breakpoint?

The 'e' command in the PDP-11 simulator assumes by default that you are giving it physical addresses rather than virtual addresses. To examine the kernel stack, either use 'e -v' to indicate that the address is virtual, or convert the virtual addresses (shown when you examine the contents of KSP and other registers) to physical addresses for use with 'e'.

To look at the stack, note the kernel stack pointer KSP and then run e -v xxx-yyy, where xxx is KSP minus 20 and yyy is KSP plus 20. Words are two bytes, so 10 words is 20 bytes. (Remember, all of this is in octal.)
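If the octal subtraction is error-prone by hand, it can be scripted. The Python sketch below builds the examine command from a KSP value; the function name stack_window and the sample KSP of 142000 are illustrative, so substitute the value your own simulator reports.

```python
# Build the 'e -v' command for a window of 20 (octal) bytes on either side
# of the kernel stack pointer. All inputs and outputs are octal digit strings.

def stack_window(ksp_octal, span_octal="20"):
    """Return the simulator command that examines KSP-span through KSP+span."""
    ksp = int(ksp_octal, 8)
    span = int(span_octal, 8)
    return f"e -v {ksp - span:o}-{ksp + span:o}"

print(stack_window("142000"))   # e -v 141760-142020
```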

Note the value pointed at by the stack pointer. This is the return address into f's caller. In particular, it's the address of the instruction after the jsr in main that got us here. Set the breakpoint to that address and then execute t 1000 to trace until you hit the breakpoint (which will happen in far less than 1000 instructions).

Exercise 4. Deduce the values held by each stack location near the stack pointer. Where is the return PC? Where are the function arguments? What is stored in the addresses below the stack pointer when jsr r5,csv executes? You should turn in a chart like the one on Lions page 10-3, but explain what every stack word is used for!

Exercise 5. Armed with your stack diagram, annotate your disassemblies of main and f, explaining the purpose of each line.

(If you're interested in how virtual addresses map to physical addresses, read chapter 6 in the PDP-11 processor handbook. In short, to determine the physical address for a given virtual address, take the top 3 bits of the virtual address to select a segment, then add the remaining 13 bits to the value of that segment's address register, left shifted by 6 bits. Use 'e kipar0-kipdr7' to examine the contents of the relevant kernel segment registers. For example, when KSP=142000 and KIPAR6=001175, the next stack operation (e.g. JSR) stores a value at physical memory address 121476 = ((1175 << 6) + 2000) - 2. The choice of KIPAR6 comes from the top 3 bits of the virtual address in KSP, which equal 6.)
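The translation can also be written out as a few lines of Python. This is a minimal sketch of the arithmetic only: it ignores the segment descriptor registers (length and access checks), and the function name and the par list are made up for illustration. The register values are the ones from the example in the note.

```python
# Translate a 16-bit kernel virtual address to a physical address using the
# segment address registers. Sketch only: no length or permission checks.

def kernel_virt_to_phys(vaddr, par):
    """par is a list of the 8 segment address register values (KIPAR0-7)."""
    segment = (vaddr >> 13) & 0o7     # top 3 bits select the segment
    offset = vaddr & 0o17777          # remaining 13 bits are the offset
    return (par[segment] << 6) + offset

par = [0] * 8
par[6] = 0o1175                       # KIPAR6 from the example
ksp = 0o142000                        # KSP from the example

# A push decrements the stack pointer by 2 before storing.
print(format(kernel_virt_to_phys(ksp - 2, par), "o"))   # 121476
```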

This completes the homework.