6.828 Fall 2003 Lab 0

Simulators, Debuggers, and Disassemblers

Handed out Wednesday, September 3, 2003
Due Thursday, September 11, 2003

Hand-In Procedure

You are to turn in this lab in the form of a web page. When you are done, email the URL to 6.828-handin@pdos.lcs.mit.edu. Your web page should contain your answers to the questions.

We don't award web page style points. Just make a functional web page and devote your time to the lab exercises. Please put your name on the web page.

Introduction

Note: this lab is intended to be a fun tour around the various tools, with us holding your hand. If you're not having fun (i.e., you get completely stuck), mail 6.828-staff@pdos.lcs.mit.edu for help. Of course, future labs won't be fed to you quite as much.

In this lab you will learn your way around the software you'll be using for the course. As you know, the course is split between studying the source code for the UNIX V6 kernel (which runs on PDP-11s) and applying the knowledge you gain in the writing of a small operating system for the Intel x86. This lab is similarly split.

The bulk of this lab introduces two PDP-11 simulators that can be used to run the UNIX V6 source code. When you are studying the kernel source throughout the course, knowing your way around the simulators and the kernel code will let you answer questions by inspecting the running system, or even modifying it and recompiling it. This should yield greater understanding of, and perhaps even a greater appreciation for, the code.

The second part of this lab introduces an x86 PC simulator that you will use to develop your own operating system, starting with the next lab. Developing and debugging an operating system in a simulator is often much simpler and quicker than using real hardware. At the same time, the PC simulator is faithful enough that your operating system would run on a real PC, if you were so inclined.

Acquiring the Software

We encourage you to do the labs using Athena, where we have built all the tools for you already.

Note: our software is only configured on the Linux Athena machines (i.e., not on the Suns). Use the uname command to make sure you are running on Linux.

To access the tools, run

athena% uname
Linux
athena% add 6.828 gnu
athena%

UNIX V6 on the PDP-11

We will use a PDP-11 hardware simulator to provide the hardware on which UNIX expects to run. The full manual for the simulator is at http://pdos.lcs.mit.edu/6.828/labs/pdp11.txt. You may find it helpful for future reference.

First, you'll need your own copy of the UNIX disk images and a copy of the UNIX sources too, for later.

athena% cd ~
athena% mkdir 6.828
athena% cd 6.828
athena% cp -R /mit/6.828/sw/v6rk .
athena% cp -R /mit/6.828/sw/v6 .
athena% cd v6rk
athena% ls -l v6*
total 6043
-rw-rw-r--  1 rsc  rsc  2048000 Sep  1 21:57 v6doc
-rw-rw-r--  1 rsc  rsc  2060800 Sep  1 22:19 v6root
-rw-rw-r--  1 rsc  rsc  2048000 Sep  1 21:57 v6src
athena%

There are three disks, named v6doc, v6root, and v6src.

To boot UNIX, start the simulator and type the following at the prompts.

athena% pdp11

PDP-11 simulator V2.3d
sim> attach rk0 v6root
sim> attach rk1 v6src
sim> attach rk2 v6doc
sim> boot rk0

The lines typed at the sim> prompts configure the machine being simulated. In particular, they attach three disks to the machine as devices rk0, rk1, and rk2 and then boot from disk rk0. You can abbreviate attach down to a and boot down to b.

Now the boot loader begins. You give it a kernel name and log in:

@unix

login: root
# stty erase backspace
# stty kill control-U
# ls -l
total 70
drwxrwxr-x  2 bin      1104 May 14 00:47 bin
drwxrwxr-x  2 bin      1824 Aug 14 22:17 dev
drwxrwxr-x  2 bin       496 Aug 14 22:22 etc
drwxrwxr-x  2 bin       464 May 13 23:35 lib
drwxrwxr-x  2 bin        32 May 13 20:01 mnt
drwxrwxrwx  2 bin       272 Jul 18 09:19 tmp
-rw-rw-rw-  1 root    28854 Jul 18 09:18 unix
drwxrwxr-x 15 bin       240 Aug 14 22:19 usr
#

The @ prompt is printed by the boot loader, asking for the name of the kernel file in the root directory. If you were trying a new kernel, you might install it as /unix.new rather than overwrite /unix, so that you could still boot the old kernel if it turned out your changes weren't for the best.

The stty commands set up your terminal. (Type real backspace and control-U characters.) You must type the first few lines perfectly! Until you run the first stty command, backspace will appear to work but not actually work -- although your xterm (or whatever) window is processing backspace properly, the UNIX V6 tty driver treats it as just another character until stty has done its magic. (The default settings are # for the erase character and @ for the line kill character.)

You may find it interesting to poke around. It's impressive how many tools are packed into that 2MB disk image. The shell command to change directories is chdir, not cd.

To get back out to the simulator prompt, type control-E. When you do that (actually, every time the simulation stops) the simulator prints information about the machine register state. The register values are printed in octal, and all the interaction with the simulator is in octal as well. (The computer world of 2003 has now settled on hexadecimal, but octal was the way of life in 1977. When we're talking about UNIX we'll assume octal, but when we talk about the x86 we'll assume hexadecimal.)
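
Since every number you type at the sim> prompt is octal, it can help to convert values before entering them. The following is a minimal Python sketch for doing so. It is not part of the course tools, and the function names are made up for illustration.

```python
# Convert between the octal used by the PDP-11 simulator and the
# hexadecimal and decimal forms you may be more used to.

def parse_octal(text):
    """Interpret a string such as '177777' as an octal number."""
    return int(text, 8)

def show(value):
    """Render a value in octal, hexadecimal, and decimal."""
    return "%o octal = %x hex = %d decimal" % (value, value, value)

print(show(parse_octal("40")))      # 40 octal = 20 hex = 32 decimal
print(show(parse_octal("177777")))  # 177777 octal = ffff hex = 65535 decimal
```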

To reset the machine and boot again, run the commands

sim> reset
sim> boot rk0

Exercise 1. The PDP-11 simulator provides a register SR to allow you to pretend to set the switches that would be on the front of a real PDP-11. Run the command e sr (e stands for examine) to look at the current value of SR. What is it? Run d sr 1 (d stands for deposit) to change the SR value. Boot again. What's different? Poke around in ~/6.828/v6/usr/sys to find where that came from. Why didn't you see it the first time you booted? (Hint: you ran the same kernel both times, so the answer is not that the switches caused a different kernel to be loaded.)

The debugger command e state prints the entire state of the CPU, including the registers you've seen at the breakpoints and also various segmentation and device registers.

Exercise 2. Save a transcript of the following session. The UNIX kernel boot sequence starts at address 40 (octal). Run the command d break 40 to set a breakpoint (there's only one!) at address 40. Boot again. When the kernel stops at address 40, what is the next instruction it will execute? Run the command s (for step) to execute that one instruction. Now what is the PC set to? Examine the next 20 (octal) bytes of instructions to run by executing e xxx-yyy where xxx is the current value of the PC, and yyy is xxx plus 20. By default e prints the octal words it is examining. You can examine instructions with the -m flag. Try e -m xxx-yyy. Does each instruction occupy the same number of bytes in memory? Run t 5 to trace through those instructions. (You could also run s five times. Running s 5 would step through the five instructions but not print the register states after each one.) Clear the breakpoint by setting it to an innocuous value like the top of memory: d break 177777. Then let the simulation continue: c. Hand in the transcript of your session, along with the answers to the three questions above.

Challenge! After letting the machine run for a second, type Control-E and look around. What's going on in the machine? How did it get there?

(Note: challenge questions are 100% optional. They are exercises we think would be fun (though often a fair amount of work) for students who are super excited about the material. You can answer the challenge questions, and we will correct your answers, but they have no effect whatsoever on your course grade.)

To close the simulator, type Control-E to get back to the prompt and then type q.

Rebuilding the UNIX kernel

Now we'll walk through rebuilding the UNIX kernel, so you can try out your own changes. It's easier if we build the kernel from files on your actual file system rather than in an opaque RK05 disk image. Then you can use your favorite editor to edit the sources. (You could edit the sources in the PDP-11 simulator using ed if you so desired, though doing that corrupted the one source file I tried to edit.) On the other hand, the early dialect of C used in 1976 won't work in any of today's C compilers, so we have to use the V6 C compilers.

To resolve this, we'll use a different PDP-11 simulator. This simulator is for running PDP-11 UNIX binaries like /bin/ls. It simulates the instructions in the binary, but responds to the system calls using whatever operating system is running on the Athena machine.

(Assuming you have already run the add and cp commands from the previous section...)

athena% setenv V6ROOT ~/6.828/v6
athena% cd ~/6.828/v6rk
athena% v6 ls -l
total 12075
drwxrwxr-x  1 root        0 Sep  2 02:11 CVS
-rw-rw-r--  1 root      315 Sep  2 02:08 README.828
-rw-rw-r--  1 root      512 Sep  2 01:50 boot.rk
-rw-rw-r--  1 root  2048000 Sep  2 02:05 v6doc
-rw-rw-r--  1 root        0 Sep  2 02:08 v6man
-rw-rw-r--  1 root  2060800 Sep  2 04:19 v6root
-rw-rw-r--  1 root  2048000 Sep  2 02:05 v6src
athena%

The v6 command runs V6 binaries (found in $V6ROOT/bin). It ran the V6 ls but on Athena. If this seems strange, that's okay. For now, think of v6 as a black box that runs old UNIX binaries on new systems. At the end of the semester you'll be able to look back at this and immediately figure out what's going on under the covers.

V6 UNIX didn't have make, but we've written a Makefile anyway, to make the kernel easier to build. To build the kernel, run gmake in ~/6.828/v6/usr/sys/conf.

Now that you know the kernel builds, change the message in the printf call you found earlier. Build the kernel again.

After running gmake, you'll have a compiled UNIX kernel in a file named unix. You need to get this kernel onto the RK05 disk image so that the boot loader can find it. To do this, you'll use v6de, a V6 disk editor. The magic command is:

athena% v6de ~/6.828/v6rk/v6root 'cp :unix /unix.new'
athena%

The first argument to v6de is the name of a disk image. The rest of the arguments are commands to run. (If you don't specify any commands, v6de runs an interactive shell.) The command cp copies files between Athena and the disk image. Files in Athena are denoted by a leading :. The above command copies the file unix from the current Athena directory into unix.new in the root directory of the disk image. (The command 'cp /unix :unix.old' would copy the kernel we used before off the disk image and onto Athena.)

Now you can boot the simulator as you did before (remember to cd to ~/6.828/v6rk first), but type unix.new at the @ prompt.

Exercise 3. Boot your modified kernel. Did it work as expected? (If not, figure out why and repeat.)

Challenge! /usr/sys/run is a script of commands to run to build the kernel. Try running them under v6 sh. It doesn't quite work. Figure out why.
Try running them in the PDP-11 simulator, under real UNIX. It doesn't quite work, in a different way. Figure out why.

Warning: the course staff do not know what is causing these bugs. They might be original bugs or (perhaps more likely) bugs in the simulators. We don't know exactly how challenging this question is.

UNIX C calling conventions

Now we'll explore the correspondence between C code and the machine code it compiles into in V6. You will almost certainly find it useful to read chapter 2 of Lions to brush up on PDP-11 assembly.

Create a file x.c that contains:

f(a, b)
{
	int c;

	c = 0;
	c =+ a*010;
	c =+ b*040;
	return c;
}

main()
{
	int x;

	x = f(1,2);
	return x;
}

and then compile and run it:

athena% v6 cc x.c
athena% v6 ./a.out
athena% echo $?
XXX
athena%

Exercise 4. What is the exit status (the XXX in the transcript)?

Now we'll examine the code generated for these functions. To dump the symbol table and extract the addresses of the functions, use:

athena% v6 db ./a.out
_main=
104
_main,20?
jsr r5,csv
...
jmp cret
...
_f,20?
jsr r5,csv
...
jmp cret
...

The command _main= prints the address of the symbol _main (the C compiler automatically prefixes all C names with _ to avoid conflicts with names used in assembly files). The command _main,20? prints the 20 instructions starting at the address of the symbol _main. Note that the first instructions of a C function are jsr r5,csv and tst -(sp). C functions end with a br and then jmp cret, which eventually returns to the caller. These sequences serve as a good way to figure out where the relevant assembly dump stops.

The debugger tries to be helpful, translating constants into symbolic form in the disassembly. For example, if _main is 104, then 142 is printed as _main+36. Unfortunately, there are some symbols near 0 that the debugger misuses to print values that really should display as numeric constants.
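
To see why small constants get mangled, here is a rough Python sketch of nearest-symbol lookup. It illustrates the general technique and is not a transcription of db's actual code. Only _main = 104 comes from the example above. The other symbol values and the function name are made up.

```python
# Sketch of how a debugger might turn a number into "symbol+offset":
# pick the closest symbol at or below the value and print the difference.

def symbolize(addr, symbols):
    """Render addr (an int) relative to the nearest symbol at or below it."""
    candidates = [(value, name) for name, value in symbols.items() if value <= addr]
    if not candidates:
        return "%o" % addr            # nothing below it: print a plain octal number
    value, name = max(candidates)     # closest symbol at or below addr
    offset = addr - value
    return name if offset == 0 else "%s+%o" % (name, offset)

# _main is from the text; _f and a are hypothetical values for illustration.
symbols = {"_main": 0o104, "_f": 0o40, "a": 0o6}

print(symbolize(0o142, symbols))  # _main+36
print(symbolize(0o10, symbols))   # a+2, although the constant 10 was meant
```

The second call shows the problem: any constant slightly larger than a small symbol is rendered relative to that symbol, even when it was never an address.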

Challenge! The source for the debugger is in /usr/source/s1/db*.s in V6. Fix it.

Print the addresses of the ``small'' symbols a, b, start, and x. (x prints as a large positive 16-bit number. What is it as a small negative 16-bit number?) Disassemble main and f. Replace symbolic constants involving the small symbols with actual constants. Save the edited disassemblies for later.
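
The conversion from a large positive 16-bit number to a small negative one is ordinary two's-complement reinterpretation. A minimal Python sketch follows, with made-up function names and a sample value that is deliberately not the value of x:

```python
# Reinterpret an unsigned 16-bit quantity as a signed one and print it in octal.

def to_signed16(value):
    """Return value (0..0xFFFF) as a two's-complement 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def signed_octal(value):
    """Format the signed interpretation of value in octal."""
    signed = to_signed16(value)
    return "-%o" % -signed if signed < 0 else "%o" % signed

print(signed_octal(0o177770))  # -10, since 177770 + 10 wraps to 0 in 16 bits
```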

Now we're going to run the code in the PDP-11 machine simulator to discover exactly what the stack layout looks like.

Make a backup copy of ken/main.c and then replace its main with our version, including the f function. Recompile your kernel and install it in the RK05 disk. Use db to print the addresses of the functions f and main in the new kernel. Use pdp11 to set a breakpoint at the address of f and then boot the new kernel until it reaches the breakpoint.

Exercise 5. What are the register values at the breakpoint? What does the stack look like for 10 (decimal 8!) words on either side of the stack pointer at the breakpoint?

(To look at the stack, note the kernel stack pointer KSP and then run e -v xxx-yyy where xxx is KSP minus 20 and yyy is KSP plus 20. Words are two bytes, so 10 words is 20 bytes. The -v flag tells the debugger to interpret the argument as a virtual rather than physical address. We'll learn more about what that means in the next lab.)
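
The octal arithmetic for that range is easy to get wrong by hand. The Python sketch below builds the command string from a stack pointer value. The function name and the sample KSP are hypothetical, so substitute the KSP your own session prints.

```python
# Build the "e -v xxx-yyy" command for a window around the stack pointer.
# All values are octal, matching what the simulator prints and expects.

def examine_window(sp, span=0o20):
    """Return the examine command covering span bytes on either side of sp."""
    low = (sp - span) & 0xFFFF    # keep addresses within 16 bits
    high = (sp + span) & 0xFFFF
    return "e -v %o-%o" % (low, high)

print(examine_window(0o141740))   # e -v 141720-141760 (141740 is a made-up KSP)
```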

Note the value pointed at by the stack pointer. This is the return address for f's caller. In particular, it's the address of the instruction after the jsr in main that got us here. Set the breakpoint to that address and then execute t 1000 to trace until you hit the breakpoint (which will happen in far less than 1000 instructions).

Exercise 6. Deduce the values held by each stack location near the stack pointer. Where is the return PC? Where are the function arguments? What is stored in the addresses below the stack pointer when jsr r5,csv executes? You should turn in a chart like the one on Lions page 10-3, but explain what every stack word is used for!

Exercise 7. Armed with your stack diagram, annotate your disassemblies of main and f, explaining the purpose of each line.

Simulating the x86

Now let's switch gears and learn our way around the x86 tools. Luckily, they behave like the PDP-11 tools more or less, so this shouldn't take as long. You may wish to review your x86 assembly by skimming the PC Assembly Book.

The PC simulator is named Bochs. Copy our Bochs configuration and disk image and then start bochs:

athena% cd ~/6.828
athena% cp -R /mit/6.828/sw/x86boot .
athena% cd x86boot
athena% bochs
========================================================================
                       Bochs x86 Emulator 1.4.1
                             June 23, 2002
========================================================================
00000000000i[     ] reading configuration from .bochsrc
00000000000i[     ] .bochsrc: vga_update_interval seems awfully small!
00000000000i[     ] Warning: no rc file specified.
00000000000i[     ] using log file bochs.log
Next at t=0
(0) f000:fff0: e968e0: jmp +#e068
<bochs:1>

Bochs has read the file .bochsrc, started the simulated machine (an X window will pop up showing the screen), and is ready to execute the first instruction. You can set a breakpoint with the b command. You have to give the base explicitly, so say b 0x7C00 (we've switched to hexadecimal). A full command overview is at http://bochs.sourceforge.net/doc/docbook/user/x2095.html. The commands c and s behave the same as in the PDP-11 simulator. There is no t command. Instead, use trace-on and trace-off to set tracing before using the other commands.
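
The f000:fff0 in the Bochs output above is a real-mode segment:offset address. As a rough Python sketch, assuming the usual real-mode rule that the physical address is the segment times 16 plus the offset (the function name is made up for illustration):

```python
# Convert a real-mode segment:offset pair to a physical address.

def real_mode_physical(segment, offset):
    """Physical address of segment:offset under the real-mode rule."""
    return (segment << 4) + offset

print("%x" % real_mode_physical(0xF000, 0xFFF0))  # ffff0, the first instruction shown
print("%x" % real_mode_physical(0x0000, 0x7C00))  # 7c00, the address used with b 0x7C00
```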

Exercise 8. Set a breakpoint at address 7C00, which is where the disk boot block will be loaded. Continue execution until that breakpoint. Trace through the next five instructions. The next interesting step is the transfer of execution to the kernel at address 00100020. Set a breakpoint there and then trace the next five instructions after that. Hand in a transcript of your tracing.

To examine memory in the simulator, you use the x command, which has the same syntax as gdb's. The command overview (linked above) has full details. For now, it is enough to know that the recipe x/nx addr prints n words of memory at addr. (Note that both xs in the command are lowercase.)

Exercise 9. Reset the machine (exit bochs and start it again). Examine the 8 words of memory at 00100000 at the two breakpoints from the last exercise. Hand in the memory listings. Why are they different? (You do not need to use Bochs to answer the last question. Just think.)

When the kernel runs, it colors the top line of the screen blue. Suppose you noticed this and wanted to find the code that was actually doing this. One way would be to set a ``data watch,'' a breakpoint that fires when a particular memory location is read or written. The color attribute of the top left cell on the screen is stored in physical memory location B8001. To set a data watch:

<bochs:1> watch write 0xB8001
<bochs:2> watch stop
<bochs:3> c

The first line sets the watch. The second line instructs bochs to stop the simulation whenever a watch fires.
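
The address B8001 follows from the layout of the PC color text buffer, which starts at B8000 and stores each screen cell as a character byte followed by an attribute byte. Here is a small Python sketch of that arithmetic, assuming the standard 80-column text mode. The constant and function names are illustrative and not taken from the kernel source.

```python
# Locate the character and attribute bytes of a text-mode screen cell.

TEXT_BASE = 0xB8000   # start of the color text buffer
COLUMNS = 80          # assumes the standard 80-column text mode

def char_address(row, col):
    """Address of the character byte for the cell at (row, col)."""
    return TEXT_BASE + 2 * (row * COLUMNS + col)

def attribute_address(row, col):
    """Address of the color attribute byte for the cell at (row, col)."""
    return char_address(row, col) + 1

print("%X" % attribute_address(0, 0))    # B8001, the top left cell
print("%X" % attribute_address(0, 79))   # B809F, the last cell of the top line
```

Watching a different cell's attribute byte is then just a matter of passing a different address to watch write.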

Exercise 10. What is the instruction pointer that first writes to location B8001 in the kernel? ("Instruction pointer" is the x86 term for the program counter.)

Rebuilding the x86 kernel

It's a little generous to call the code we're running a ``kernel,'' but so be it.

To rebuild the x86 kernel and disk image,

athena% cd ~/6.828/x86boot
athena% gmake kernel
athena% gmake disk

Exercise 11. Change the message that the kernel prints. Rebuild and reboot in Bochs. Did it work? If not, make it work.

GCC x86 Calling Conventions

We will repeat the exercise we did above for the UNIX calling conventions, this time for GCC on the x86. Create a file x.c containing (this is different from the V6 one!):

f(a, b)
{
	int c;

	c = 0;
	c += a*0x10;
	c += b*0x40;
	return c;
}

cmain()
{
	int x;

	x = f(1,2);
	return x;
}

Compile it:

athena% i386-osclass-aout-gcc -c x.c

Use i386-osclass-aout-objdump to disassemble it.

athena% i386-osclass-aout-objdump -S x.o

Now replace cmain in ~/6.828/x86boot/kernel.c with these two functions. Rebuild your kernel. Note the addresses of cmain and f in the rebuilt kernel:

athena% cd ~/6.828/x86boot
athena% gmake kernel
...
athena% i386-osclass-aout-nm kernel | egrep 'T _(cmain|f)$'

Now run the new kernel in Bochs (remember to gmake disk!), breaking at f.

Exercise 12. What are the register values at the breakpoint? (Use the info regs command.) What does the stack look like for 20 (decimal 32) bytes on either side of the stack pointer at the breakpoint?

Set a breakpoint at the return PC for f (just as on the PDP-11, it should be the value pointed at by the stack pointer). Turn tracing on and run until that breakpoint. Note that unlike the PDP-11 simulator, Bochs does not print register information at each trace point, which makes your life a little harder.

Exercise 13. Deduce the values held by each stack location near the stack pointer. Where is the return PC? Where are the function arguments? You should turn in a chart like before.

Exercise 14. Armed with your stack diagram, annotate your disassemblies of cmain and f, explaining the purpose of each line.

Challenge! Look at the rules in the Makefile in ~/6.828/x86boot that build boot, the boot loader. The -e start option instructs ld to use start as the entry point for the program. The symbol start is defined in l.s as a one-instruction function that does jmp _cmain. Why not just use -e cmain instead and not bother linking with l.s at all? Try this. Figure out why it fails. Can you make it work without any assembly (i.e., using only C)?