6.828 Fall 2003 Lab 0

Simulators, Debuggers, and Disassemblers

Exercise 1. The PDP-11 simulator provides a register SR to allow you to pretend to set the switches that would be on the front of a real PDP-11. Run the command e sr (e stands for examine) to look at the current value of SR. What is it? Run d sr 1 (d stands for deposit) to change the SR value. Boot again. What's different? Poke around in ~/6.828/v6/usr/sys to find where that came from. Why didn't you see it the first time you booted? (Hint: you ran the same kernel both times, so the answer is not that the switches caused a different kernel to be loaded.)

SR starts at 0. When we boot with a non-zero SR, the message "6.828 kernel appears. It is printed using printf from main, which is in /usr/sys/ken/main.c. We didn't see it the first time because printf calls putchar, which consults the switches at SW->integ and does not print if they are zero.

Exercise 2. Save a transcript of the following session. The UNIX kernel boot sequence starts at address 40 (octal). Run the command d break 40 to set a breakpoint (there's only one!) at address 40. Boot again. When the kernel stops at address 40, what is the next instruction it will execute? Run the command s (for step) to execute that one instruction. Now what is the PC set to? Examine the next 20 bytes of instructions to run by executing e xxx-yyy where xxx is the current value of the PC, and yyy is xxx plus 20. By default e prints the octal words it is examining. You can examine instructions with the -m flag. Try e -m xxx-yyy. Does each instruction occupy the same number of bytes in memory? Run t 5 to trace through those instructions. (You could also run s 5 times. s 5 would step through the 5 instructions but not print the register states after each one.) Clear the breakpoint by setting it to an innocuous value like the top of memory: d break 177777. Then let the simulation continue: c. Hand in the transcript of your session, along with the answers to the three questions above.

The simulator transcript looks like:

sim> d break 40
sim> b rk0
@unix

Breakpoint, PC=000040  (JMP 3316)
R0=137000 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tnZvc)

We can see that the next instruction is "JMP 3316".

sim> s

Step expired, PC: PC=003316  (BIT #1,177572)
R0=137000 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tnZvc)

The PC is now 3316.


sim> e 3316-3336
3316:	032767
3320:	000001
3322:	174246
3324:	001374
3326:	000005
3330:	012700
3332:	172340
3334:	012701
3336:	172300
sim> e -m 3316-3336
3316:	BIT #1,177572
3324:	BNE 3316
3326:	RESET
3330:	MOV #172340,R0
3334:	MOV #172300,R1

Since there are 16 bytes of memory but 5 instructions, each instruction cannot be the same size.

sim> t 5
PC=003316  (BIT #1,177572)
R0=137000 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tnZvc)

PC=003324  (BNE 3316)
R0=137000 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tnZvc)

PC=003326  (RESET)
R0=137000 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tnZvc)

PC=003330  (MOV #172340,R0)
R0=137000 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tnZvc)

PC=003334  (MOV #172300,R1)
R0=172340 R1=177404 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000344 (CM=0,PM=0,IPL=7,tNzvc)


Step expired, PC: PC=003340  (MOV #200,R4)
R0=172340 R1=172300 R2=071000 R3=136064 R4=135030 R5=135770
KSP=136776 USP=177756 PSW=000350 (CM=0,PM=0,IPL=7,tNzvc)

sim> d break 177777
sim> c

login:

Rebuilding the UNIX kernel

Exercise 3. Boot your modified kernel. Did it work as expected? (If not, figure out why and repeat.)

Yes. Yes it did.

UNIX C calling conventions

Exercise 4. What is the exit status (the XXX in the transcript)?

The exit status is 72 decimal.

Exercise 5. What are the register values at the breakpoint? What does the stack look like for 10 (decimal 8!) words on either side of the stack pointer at the breakpoint?

Breakpoint, PC=003604  (JSR R5,3552)
R0=003662 R1=172320 R2=001177 R3=000000 R4=000200 R5=141774
KSP=141756 USP=000000 PSW=030000 (CM=0,PM=3,IPL=0,tnzvc)

sim> e -v 141736-141776
141736:	000000
141740:	000000
141742:	000000
141744:	000000
141746:	000000
141750:	000000
141752:	000000
141754:	000000
141756:	003700
141760:	000001
141762:	000002
141764:	003566
141766:	001177
141770:	000000
141772:	000200
141774:	135770
141776:	003470
sim>

Exercise 6. Deduce the values held by each stack location near the stack pointer. Where is the return PC? Where are the function arguments? What is stored in the addresses below the stack pointer when jsr r5,csv executes? You should turn in a chart like the one on Lions page 10-3, but explain what every stack word is used for!

sim> e -v 141736-141776
141736:	000000
141740:	000000
141742:	000000
141744:	000000
141746:	000000
141750:	000000
141752:	000000
141754:	000000
141756:	003700 -- return PC in main's call to f.
141760:	000001 -- 1st function arg
141762:	000002 -- 2nd function arg
141764:	003566 -- address of cret
141766:	001177 -- saved r4
141770:	000000 -- saved r3
141772:	000200 -- saved r2
141774:	135770 -- saved r5 (left over value from boot loader)
141776:	003470 -- return PC in assembly's call to main.
sim>

Exercise 7. Armed with your stack diagram, annotate your disassemblies of main and f, explaining the purpose of each line.

athena$ v6 db a.out
_main=
104
_main,20?
  jsr r5,csv        save r5, r2, r3, r4 values
  tst -(sp)         standard prologue: alloc a temporary
  mov $2,(sp)       push 2 into that temporary
  mov $1,-(sp)      push 1
  jsr pc,*$_f       call f(1,2)
  tst (sp)+         pop 1 (2 needn't pop because it's in the temp)
  mov r0,-8(r5)     save return value in temp (uses cret addr!)
  mov -8(r5),r0     use temp as our return value
  br _main+36       no-op function epilogue -- goes to next line
  jmp cret          return
...
_f,20?
  jsr r5,csv        same
  tst -(sp)
  clr -8(r5)        c = 0
  mov 4(r5),r0      r0 = a (first arg)
  ash $3,r0         r0 <<= 3 (r0 *= 010)
  add r0,-8(r5)     c += r0
  mov 6(r5),r0      r0 = b (second arg)
  ash $5,r0         r0 <<= 5 (r0 *= 040)
  add r0,-8(r5)     c += r0
  mov -8(r5),r0     use c as our return value
  br _f+50          same
  jmp cret
...

It's worth noting that the code here is different from the code compiled into the kernel, because the kernel is compiled with optimizations.

Digression about V6 calling conventions

You might wonder why the address of cret ends up on the stack. The csv function puts it there when returning after saving the registers. The instruction is jsr pc,(r0).

(In what follows, you can find the files mentioned in /mit/6.828/sw on athena.)

The csv in the C library (v6/usr/source/s4/csv.s) does this explicitly:

/ C register save and restore -- version 12/74

.globl	csv
.globl	cret

csv:
	mov	r5,r0
	mov	sp,r5
	mov	r4,-(sp)
	mov	r3,-(sp)
	mov	r2,-(sp)
	tst	-(sp)
	jmp	(r0)

cret:
	mov	r5,r1
	mov	-(r1),r4
	mov	-(r1),r3
	mov	-(r1),r2
	mov	r5,sp
	mov	(sp)+,r5
	rts	pc

but the one we have in the kernel (in v6/usr/sys/conf/m40.s) has the shorter jsr pc, (r0). This is just a way to squeeze an instruction out of csv. The actual value pushed is irrelevant; the code just wants to make some space. Remember that this function is executed as part of every C function call, so saving one instruction might well be a significant speedup!

The V7 C library's v7/src/libc/crt/csv.s adopted the kernel approach, along with an explanatory comment:

/ C register save and restore -- version 7/75

.globl	csv
.globl	cret

csv:
	mov	r5,r0
	mov	sp,r5
	mov	r4,-(sp)
	mov	r3,-(sp)
	mov	r2,-(sp)
	jsr	pc,(r0)		/ jsr part is sub $2,sp

cret:
	mov	r5,r2
	mov	-(r2),r4
	mov	-(r2),r3
	mov	-(r2),r2
	mov	r5,sp
	mov	(sp)+,r5
	rts	pc

Another interesting question is why the tst -(sp) after jsr r5, csv in the function prologue. This was just a convenient way to subtract two from the stack pointer. In fact, it's shorter, since a sub $2, sp would use an extra word of instruction for the immediate $2.

The V6 C compiler special cased this (v6/usr/source/c/c11.c):

	case SAVE:
		printf("jsr	r5,csv\n");
		t = getw(ascbuf)-6;
		if (t==2)
			printf("tst	-(sp)\n");
		else if (t > 2)
			printf("sub	$%o,sp\n", t);
		break;

Of course, this doesn't answer the question of why our f function allocates space that it never uses, nor does it answer the question of what -6 is in the compiler fragment above.

To answer that, we need to dig deeper into how the compiler works. The argument to the pseudo-op SAVE is the value autolen computed in blkhed in v6/usr/source/c/c02.c. That's the size of the stack frame, effectively. Blkhed initializes autolen to 6 and then processes the code inside the block, which increases autolen as necessary to allocate automatic (stack) storage for local variables. Why does autolen start at 6? Because -autolen is used as the offset from r5 used to allocate a variable. The code to allocate a new local variable does:

	if (dsym->hclass==AUTO) {
		autolen =+ rlength(dsym);
		dsym->hoffset = -autolen;
	}

So the first variable will be stored at -8(r5) as we saw above, with f's c variable. What are the 4 values before that? Consulting our stack diagram we see that they are the saved r5, r2, r3, and r4.

But wait! What about the extra stack word being allocated in csv? That should mean we'd only need to allocate autolen-6-2 words after csv runs. The answer is made clear by the disassembly of main above:

  mov $2,(sp)       push 2 into that temporary
  mov $1,-(sp)      push 1
  jsr pc,*$_f       call f(1,2)
  tst (sp)+         pop 1 (2 needn't pop because it's in the temp)

Notice that the first push didn't have to change the stack pointer! This is because the word was already allocated. More significantly, tst (sp)+ only had to pop one value off the stack. In the common case where there is only one function argument, we can get rid of the pop instruction entirely! Of course, this is only a theory, but v6/usr/source/c/c10.c supports our theory:

	/*
	 * Handle a subroutine call. It has to be done
	 * here because if cexpr got called twice, the
	 * arguments might be compiled twice.
	 * There is also some fiddling so the
	 * first argument, in favorable circumstances,
	 * goes to (sp) instead of -(sp), reducing
	 * the amount of stack-popping.
	 */
	case CALL:

It's also interesting to note that popstk, which generates the code to pop the stack, special-cased 2 words as well as 1, to save space in the instruction encoding (v6/usr/source/c/c11.c):

popstk(a)
{
	switch(a) {

	case 0:
		return;

	case 2:
		printf("tst	(sp)+\n");
		return;

	case 4:
		printf("cmp	(sp)+,(sp)+\n");
		return;
	}
	printf("add	$%o,sp\n", a);
}

Functions with one, two, and three arguments were all presumably common enough to warrant this treatment. In fact, we can check the kernel sources to find out. Here's the breakdown of statements following a call instruction (jsr pc,...) in the kernel code:

 245 tst	(sp)+          pop two args
  84 jmp	cret
  47 cmp	(sp)+,(sp)+    pop three args
  44 add	$6,sp
  38 mov	r0,r4
  35 tst	r0
  13 jsr	pc,_spl0
  12 mov	r0,r3
 ...

We could bill all the cases that aren't labeled as "pop one arg", since in that case there's no instruction at all. It turns out there are 873 function calls and 581 of them had no stack pop code because they had zero or one arguments.

Note that case SAVE above didn't do the same special-casing to allocate a stack frame of two arguments. We might expect that one-word stack frames are quite common (one temporary used to compute a return value) while if you've got more than one word you're likely to have a few, as variables. But then, many variables were kept in registers only (remember that all registers were callee-save), so maybe not. Again, we can check.

If we look at stack frame sizes by considering instructions after jsr r5,csv we find that out of 239 functions, 206 need no prologue whatsoever (they have empty stack frames), 16 use tst -(sp) (they have one-word frames), 10 use sub $4, sp (they have two-word frames), 3 have three-word frames, 3 have five-word frames, and 1 has an eight-word frame. Now you can see why leaving about 400 words for the kernel stack was plenty. So in this case, maybe it would have been reasonable to add the extra case. (It also seems it would have been reasonable to drop the tst -(sp) special case.)

As a final interesting footnote, here's the equivalent v5 stack generation code, first in the compiler (v5/usr/c/c02.c):

	case LBRACE:
		if (d) {
			o2 = blkhed() - 4;
			if (proflg)
				o = "jsr\tr5,mrsave;0f;%o\n.bss\n0:.=.+2\n.text\n";
			else
				o = "jsr	r5,rsave; %o\n";
			printf(o, o2);
		}

and then the register saving routine (v5/usr/source/s4/rsave.s):

/ C register save and restore

.globl	rsave
.globl	mrsave
.globl	rretrn

mrsave:
	tst	(r5)+

rsave:
	mov	r5,r0
	mov	sp,r5
	mov	r4,-(sp)
	mov	r3,-(sp)
	mov	r2,-(sp)
	sub	(r0)+,sp
	jmp	(r0)

rretrn:
	sub	$6,r5
	mov	r5,sp
	mov	(sp)+,r2
	mov	(sp)+,r3
	mov	(sp)+,r4
	mov	(sp)+,r5
	rts	pc

Can you figure out how it works?

Simulating the x86

Exercise 8. Set a breakpoint at address 7C00, which is where the disk boot block will be loaded. Continue execution until that break point. Trace through the next five instructions. The next interesting step is transfer of execution to the kernel at address 00100020. Breakpoint there and then trace the next five instructions after that. Hand in a transcript of your tracing.

athena$ bochs-nogui
========================================================================
                       Bochs x86 Emulator 1.4.1
                             June 23, 2002
========================================================================
00000000000i[     ] reading configuration from .bochsrc
00000000000i[     ] .bochsrc: vga_update_interval seems awfully small!
00000000000i[     ] Warning: no rc file specified.
00000000000i[     ] using log file bochs.log
Next at t=0
(0) f000:fff0: e968e0: jmp +#e068
<bochs:1> b 0x7c00
<bochs:2> c
(0) Breakpoint 1, 0x7c00 in ?? ()
Next at t=205592
(0) 0000:7c00: fa: cli
<bochs:3> s
Next at t=205593
(0) 0000:7c01: fc: cld
<bochs:4> s
Next at t=205594
(0) 0000:7c02: 31c0: xor AX, AX
<bochs:5> s
Next at t=205595
(0) 0000:7c04: 8ec0: mov ES, AX
<bochs:6> s
Next at t=205596
(0) 0000:7c06: 8ed8: mov DS, AX
<bochs:7> s
Next at t=205597
(0) 0000:7c08: 8ed0: mov SS, AX
<bochs:8> b 0x100020
<bochs:9> c
(0) Breakpoint 2, 0x100020 in ?? ()
Next at t=208685
(0) 0008:00100020 (unknown context): e923000000: jmp +#00000023
<bochs:10> s
Next at t=208686
(0) 0008:00100048 (unknown context): 55: push EBP
<bochs:11> s
Next at t=208687
(0) 0008:00100049 (unknown context): 89e5: mov |MOD3|REG4|RM5| EBP, ESP
<bochs:12> s
Next at t=208688
(0) 0008:0010004b (unknown context): 83ec08: sub |MOD3|REG5|RM4| ESP, #08
<bochs:13> s
Next at t=208689
(0) 0008:0010004e (unknown context): e8f1000000: call 000000f1
<bochs:14> s
Next at t=208690
(0) 0008:00100144 (unknown context): 55: push EBP
<bochs:15>

Exercise 9. Reset the machine (exit bochs and start it again). Examine the 8 words of memory at 00100000 at the two breakpoints from the last exercise. Hand in the memory listings. Why are they different? (You do not need to use Bochs to answer the last question. Just think.)

<bochs:1> b 0x7c00
<bochs:2> c
(0) Breakpoint 1, 0x7c00 in ?? ()
Next at t=205592
(0) 0000:7c00: fa: cli
<bochs:3> x/8x 0x00100000
[bochs]:
0x100000 <bogus+0>:	0x00000000	0x00000000	0x00000000	0x00000000
0x100010 <bogus+16>:	0x00000000	0x00000000	0x00000000	0x00000000
<bochs:4> b 0x100020
<bochs:5> c
(0) Breakpoint 2, 0x100020 in ?? ()
Next at t=208685
(0) 0008:00100020 (unknown context): e923000000: jmp +#00000023
<bochs:6> x/8x 0x00100000
[bochs]:
0x100000 <bogus+0>:	0x0064010b	0x00000fe0	0x00001000	0x00000000
0x100010 <bogus+16>:	0x00000a80	0x00100020	0x00000000	0x00000000
<bochs:7>

They are different because the boot loader has run, loading the kernel into that memory.

Exercise 10. What is the instruction pointer that first writes to location B8001 in the kernel? ("Instruction pointer" is the x86 term for the program counter.)

0x001002fd

Rebuilding the x86 kernel

Exercise 11. Change the message that the kernel prints. Rebuild and reboot in Bochs. Did it work? If not, make it work.

Yes. Yes it did.

Exercise 12. What are the register values at the breakpoint? (Use the info regs command.) What does the stack look like for 20 (decimal 32) bytes on either side of the stack pointer at the breakpoint?

<bochs:4> info registers
...
esp            0x79b0    	0x79b0    
...
<bochs:5> x/16w 0x7990
[bochs]:
0x7990 <bogus+0>:	0x00000000	0x00000000	0x00000000	0x00000000
0x79a0 <bogus+16>:	0x00000000	0x00000000	0x00000000	0x00000000
0x79b0 <bogus+32>:	0x0010004b	0x00000001	0x00000002	0x00007ce3
0x79c0 <bogus+48>:	0x00000010	0x00101e00	0x00000000	0x00007bf8
<bochs:6>

Exercise 13. Deduce the values held by each stack location near the stack pointer. Where is the return PC? Where are the function arguments? You should turn in a chart like before.

0x7990 0x00000000
       0x00000000
       0x00000000
       0x00000000
0x79a0 0x00000000
       0x00000000
       0x00000000
       0x00000000
0x79b0 0x0010004b -- return addr back to cmain
       0x00000001 -- function arg 1
       0x00000002 -- function arg 2
       0x00007ce3 -- scratch space allocated but not used yet in cmain
0x79c0 0x00000010    ...
       0x00101e00    ...
       0x00000000    ...
       0x00007bf8 -- saved ebp from cmain

Exercise 14. Armed with your stack diagram, annotate your disassemblies of cmain and f, explaining the purpose of each line.

00000000 <_f>:
  push   %ebp               save old ebp
  mov    %esp,%ebp          save old esp
  sub    $0x4,%esp          alloc some stack space
  movl   $0x0,-4(%ebp)      c = 0
  mov    0x8(%ebp),%eax     eax = a
  mov    %eax,%edx          edx = a
  shl    $0x4,%edx          edx <<= 4
  lea    -4(%ebp),%eax      eax = &c
  add    %edx,(%eax)        *eax += edx (eax += a<<4)
  mov    0xc(%ebp),%eax     eax = b
  mov    %eax,%edx          edx = b
  shl    $0x6,%edx          edx <<= 6
  lea    -4(%ebp),%eax      eax = &c
  add    %edx,(%eax)        *eax += edx (eax += b<<6)
  mov    -4(%ebp),%eax      eax = c (eax is return register)
  leave                     restore old ebp, esp
  ret    

0000002c <_cmain>:
  push   %ebp               save old ebp
  mov    %esp,%ebp          save old esp
  sub    $0x8,%esp          alloc some stack space, not much used
  sub    $0x8,%esp
  push   $0x2               push f's arg 2
  push   $0x1               push f's arg 1
  call   0 <_f>             call f
  add    $0x10,%esp         pop arguments
  mov    %eax,-4(%ebp)      save f's return value into temporary
  mov    -4(%ebp),%eax      use temporary as our return value
  leave                     restore old ebp, esp
  ret