Digression on v6 calling conventions

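You might wonder why the address of cret ends up on the stack. The csv function puts it there: after saving the registers, it returns to its caller with jsr pc,(r0), which pushes the address of the next instruction before jumping. Since cret immediately follows csv, that address is cret's.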

(In what follows, you can find the files mentioned in /mit/6.828/sw on athena.)

The csv in the C library (v6/usr/source/s4/csv.s) reserves that stack word explicitly, with tst -(sp) followed by jmp (r0):

/ C register save and restore -- version 12/74

.globl	csv
.globl	cret

csv:
	mov	r5,r0
	mov	sp,r5
	mov	r4,-(sp)
	mov	r3,-(sp)
	mov	r2,-(sp)
	tst	-(sp)
	jmp	(r0)

cret:
	mov	r5,r1
	mov	-(r1),r4
	mov	-(r1),r3
	mov	-(r1),r2
	mov	r5,sp
	mov	(sp)+,r5
	rts	pc

but the one we have in the kernel (in v6/usr/sys/conf/m40.s) has the shorter jsr pc, (r0). This is just a way to squeeze an instruction out of csv. The actual value pushed is irrelevant; the code just wants to make some space. Remember that this function is executed as part of every C function call, so saving one instruction might well be a significant speedup!
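
To see that the two endings leave the stack in the same shape, here is a minimal sketch in modern C of a toy machine with just the state csv touches. Everything in it (struct machine, CRET_ADDR, csv_tst_jmp, csv_jsr) is a hypothetical name invented for illustration, and the address given to cret is made up; it is a model of the idea, not of the real hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the PDP-11 state that csv touches. All names here
 * are hypothetical; CRET_ADDR is a made-up address for cret.
 */
enum { MEM_WORDS = 64, CRET_ADDR = 01234 };

struct machine {
    uint16_t mem[MEM_WORDS];  /* memory, indexed by byte address / 2 */
    uint16_t sp;              /* byte address of the top of stack */
    uint16_t r2, r3, r4, r5;
};

/* mov v,-(sp) */
static void push(struct machine *m, uint16_t v)
{
    assert(m->sp >= 2 && m->sp % 2 == 0 && m->sp <= 2 * MEM_WORDS);
    m->sp -= 2;
    m->mem[m->sp / 2] = v;
}

/*
 * The part both versions share. On entry r5 holds the return address
 * into the C function, because the function called csv with jsr r5,csv.
 */
static uint16_t csv_common(struct machine *m)
{
    uint16_t r0 = m->r5;   /* mov r5,r0 */
    m->r5 = m->sp;         /* mov sp,r5 */
    push(m, m->r4);        /* mov r4,-(sp) */
    push(m, m->r3);        /* mov r3,-(sp) */
    push(m, m->r2);        /* mov r2,-(sp) */
    return r0;
}

/* C library ending: tst -(sp); jmp (r0). Returns the jump target. */
uint16_t csv_tst_jmp(struct machine *m)
{
    uint16_t r0 = csv_common(m);
    assert(m->sp >= 2);
    m->sp -= 2;            /* tst -(sp): reserve a word, contents untouched */
    return r0;             /* jmp (r0) */
}

/* Kernel ending: jsr pc,(r0). Returns the jump target. */
uint16_t csv_jsr(struct machine *m)
{
    uint16_t r0 = csv_common(m);
    push(m, CRET_ADDR);    /* jsr pushes the address of the next instruction: cret */
    return r0;             /* ...and jumps to (r0) */
}
```

Run on two identical machines, both functions return the same jump target and leave sp and r5 identical; the only difference is the content of the word at (sp), which nothing reads before it is overwritten.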

The V7 C library's v7/src/libc/crt/csv.s adopted the kernel approach, along with an explanatory comment:

/ C register save and restore -- version 7/75

.globl	csv
.globl	cret

csv:
	mov	r5,r0
	mov	sp,r5
	mov	r4,-(sp)
	mov	r3,-(sp)
	mov	r2,-(sp)
	jsr	pc,(r0)		/ jsr part is sub $2,sp

cret:
	mov	r5,r2
	mov	-(r2),r4
	mov	-(r2),r3
	mov	-(r2),r2
	mov	r5,sp
	mov	(sp)+,r5
	rts	pc

Another interesting question is why the function prologue has a tst -(sp) after jsr r5,csv. This was just a convenient way to subtract two from the stack pointer. In fact, it's shorter than the obvious alternative, since sub $2,sp would need an extra instruction word for the immediate $2.

The V6 C compiler special cased this (v6/usr/source/c/c11.c):

	case SAVE:
		printf("jsr	r5,csv\n");
		t = getw(ascbuf)-6;
		if (t==2)
			printf("tst	-(sp)\n");
		else if (t > 2)
			printf("sub	$%o,sp\n", t);
		break;
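
For readers more comfortable with modern C, the same decision can be sketched as a standalone function. The name emit_prologue and the buffer-based interface are assumptions made for illustration; the real compiler reads the value with getw and prints directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Modern-C rendering of the "case SAVE" logic. emit_prologue is a
 * hypothetical name; autolen is the frame size that blkhed reports.
 * Writes the prologue text into buf and returns the instruction count.
 */
int emit_prologue(char *buf, size_t size, int autolen)
{
    assert(size > 0);
    assert(autolen >= 6);                  /* autolen starts at 6 and only grows */
    unsigned t = (unsigned)(autolen - 6);  /* bytes of automatic storage */
    int count = 1;

    snprintf(buf, size, "jsr\tr5,csv\n");
    size_t used = strlen(buf);

    if (t == 2) {                          /* one word: tst fits in one instruction word */
        snprintf(buf + used, size - used, "tst\t-(sp)\n");
        count++;
    } else if (t > 2) {                    /* larger: sub with an octal immediate */
        snprintf(buf + used, size - used, "sub\t$%o,sp\n", t);
        count++;
    }
    return count;
}
```

Note that the immediate is printed in octal, so a frame with 8 bytes of locals produces sub $10,sp.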

Of course, this explains neither why our f function allocates space that it never uses nor what the -6 is doing in the compiler fragment above.

To answer that, we need to dig deeper into how the compiler works. The argument to the pseudo-op SAVE is the value autolen computed in blkhed in v6/usr/source/c/c02.c. That's effectively the size of the stack frame. Blkhed initializes autolen to 6 and then processes the code inside the block, which increases autolen as necessary to allocate automatic (stack) storage for local variables. Why does autolen start at 6? Because -autolen is the offset from r5 at which each new variable is placed. The code to allocate a new local variable does:

	if (dsym->hclass==AUTO) {
		autolen =+ rlength(dsym);
		dsym->hoffset = -autolen;
	}

So the first variable will be stored at -8(r5) as we saw above, with f's c variable. What are the 4 values before that, at 0(r5) through -6(r5)? Consulting our stack diagram we see that they are the saved r5, r4, r3, and r2.
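
The bookkeeping is easy to replay by hand. Below is a minimal sketch in modern C (the compiler's =+ is the old spelling of +=). The names struct frame, frame_init, frame_alloc, and frame_local_bytes are hypothetical, and the model assumes sizes are already whole numbers of words, which is a simplification of whatever rlength actually returns.

```c
#include <assert.h>

/* Hypothetical model of the autolen bookkeeping described above. */
struct frame {
    int autolen;  /* bytes below r5 claimed so far */
};

/* autolen starts at 6: the saved r4, r3, r2 occupy -2(r5) through -6(r5). */
void frame_init(struct frame *f)
{
    f->autolen = 6;
}

/* Allocate a local of `size` bytes and return its offset from r5. */
int frame_alloc(struct frame *f, int size)
{
    assert(size > 0 && size % 2 == 0);  /* assumption: word-multiple sizes */
    f->autolen += size;
    return -f->autolen;
}

/* The value that case SAVE computes as t: bytes of automatic storage. */
int frame_local_bytes(const struct frame *f)
{
    return f->autolen - 6;
}
```

With this model the first two-byte local lands at offset -8 and the second at -10, matching the -8(r5) seen for f's c variable.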

But wait! What about the extra stack word being allocated in csv? That should mean we'd only need to allocate autolen-6-2 bytes after csv runs. The answer is made clear by the disassembly of main above:

  mov $2,(sp)       push 2 into that temporary
  mov $1,-(sp)      push 1
  jsr pc,*$_f       call f(1,2)
  tst (sp)+         pop 1 (2 needn't pop because it's in the temp)

Notice that the first push didn't have to change the stack pointer! This is because the word was already allocated. More significantly, tst (sp)+ only had to pop one value off the stack. In the common case where there is only one function argument, we can get rid of the pop instruction entirely! Of course, this is only a theory, but v6/usr/source/c/c10.c supports our theory:

	/*
	 * Handle a subroutine call. It has to be done
	 * here because if cexpr got called twice, the
	 * arguments might be compiled twice.
	 * There is also some fiddling so the
	 * first argument, in favorable circumstances,
	 * goes to (sp) instead of -(sp), reducing
	 * the amount of stack-popping.
	 */
	case CALL:

It's also interesting to note that popstk, which generates the code to pop the stack, special-cased 2 words as well as 1, to save space in the instruction encoding (v6/usr/source/c/c11.c):

popstk(a)
{
	switch(a) {

	case 0:
		return;

	case 2:
		printf("tst	(sp)+\n");
		return;

	case 4:
		printf("cmp	(sp)+,(sp)+\n");
		return;
	}
	printf("add	$%o,sp\n", a);
}

Functions with one, two, and three arguments were all presumably common enough to warrant this treatment. In fact, we can check the kernel sources to find out. Here's the breakdown of the instructions that immediately follow a call instruction (jsr pc,...) in the kernel code:

 245 tst	(sp)+          pop two args
  84 jmp	cret
  47 cmp	(sp)+,(sp)+    pop three args
  44 add	$6,sp
  38 mov	r0,r4
  35 tst	r0
  13 jsr	pc,_spl0
  12 mov	r0,r3
 ...

Most of the unlabeled rows (jmp cret, mov r0,r4, tst r0, and so on) could be billed as "pop one arg", since in that case there is no pop instruction at all and the call is followed directly by whatever comes next. It turns out there are 873 function calls and 581 of them had no stack pop code because they had zero or one arguments.
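
The relationship between argument count and pop code discussed above can be sketched in modern C. This assumes every argument occupies one word and that the "favorable circumstances" from the c10.c comment hold, so the first argument lands in the pre-allocated temporary. The names pop_bytes and pop_instruction are hypothetical; the second simply mirrors popstk but writes into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Bytes to pop after a call with nargs one-word arguments, assuming
 * the first argument was stored at (sp) rather than pushed.
 */
int pop_bytes(int nargs)
{
    assert(nargs >= 0);
    return nargs > 1 ? 2 * (nargs - 1) : 0;
}

/* Mirror of popstk: write the pop instruction for `bytes` into buf. */
void pop_instruction(char *buf, size_t size, int bytes)
{
    assert(size > 0 && bytes >= 0);
    switch (bytes) {
    case 0:                      /* zero or one argument: nothing to pop */
        buf[0] = '\0';
        return;
    case 2:                      /* one word: a single-word instruction */
        snprintf(buf, size, "tst\t(sp)+\n");
        return;
    case 4:                      /* two words: still a single-word instruction */
        snprintf(buf, size, "cmp\t(sp)+,(sp)+\n");
        return;
    }
    /* general case: add with an octal immediate, two instruction words */
    snprintf(buf, size, "add\t$%o,sp\n", (unsigned)bytes);
}
```

Under these assumptions a two-argument call is followed by tst (sp)+, a three-argument call by cmp (sp)+,(sp)+, and a four-argument call by add $6,sp, which lines up with the labeled rows in the breakdown.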

Note that case SAVE above didn't do the same special-casing to allocate a stack frame of two words. We might expect that one-word stack frames are quite common (one temporary used to compute a return value) while if you've got more than one word you're likely to have a few, as variables. But then, many variables were kept in registers only (remember that the register variables r2, r3, and r4 were callee-save), so maybe not. Again, we can check.

If we look at stack frame sizes by considering the instruction after jsr r5,csv, we find that out of 239 functions, 206 need no further prologue instruction (they have empty stack frames), 16 use tst -(sp) (they have one-word frames), 10 use sub $4,sp (they have two-word frames), 3 have three-word frames, 3 have five-word frames, and 1 has an eight-word frame. Now you can see why leaving about 400 words for the kernel stack was plenty. So in this case, maybe it would have been reasonable to add the extra two-word case. (It also seems it would have been reasonable to drop the tst -(sp) special case.)
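
As a quick check on that tally, the sketch below adds up the buckets quoted above. The table simply restates the numbers from the text; the names frame_tally, total_functions, total_frame_words, and mean_frame_words are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One row per frame size quoted in the text: words of locals, function count. */
struct bucket {
    int words;
    int functions;
};

static const struct bucket frame_tally[] = {
    {0, 206}, {1, 16}, {2, 10}, {3, 3}, {5, 3}, {8, 1},
};

enum { NBUCKETS = sizeof frame_tally / sizeof frame_tally[0] };

/* Number of functions across all buckets. */
int total_functions(void)
{
    int n = 0;
    for (size_t i = 0; i < NBUCKETS; i++)
        n += frame_tally[i].functions;
    return n;
}

/* Words of local storage summed over every function. */
int total_frame_words(void)
{
    int w = 0;
    for (size_t i = 0; i < NBUCKETS; i++)
        w += frame_tally[i].words * frame_tally[i].functions;
    return w;
}

/* Average words of locals per function. */
double mean_frame_words(void)
{
    int n = total_functions();
    assert(n > 0);
    return (double)total_frame_words() / n;
}
```

The buckets do sum to 239 functions, and together they hold only 68 words of locals, well under a third of a word per function on average.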

As a final interesting footnote, here's the equivalent v5 stack generation code, first in the compiler (v5/usr/c/c02.c):

	case LBRACE:
		if (d) {
			o2 = blkhed() - 4;
			if (proflg)
				o = "jsr\tr5,mrsave;0f;%o\n.bss\n0:.=.+2\n.text\n";
			else
				o = "jsr	r5,rsave; %o\n";
			printf(o, o2);
		}

and then the register saving routine (v5/usr/source/s4/rsave.s):

/ C register save and restore

.globl	rsave
.globl	mrsave
.globl	rretrn

mrsave:
	tst	(r5)+

rsave:
	mov	r5,r0
	mov	sp,r5
	mov	r4,-(sp)
	mov	r3,-(sp)
	mov	r2,-(sp)
	sub	(r0)+,sp
	jmp	(r0)

rretrn:
	sub	$6,r5
	mov	r5,sp
	mov	(sp)+,r2
	mov	(sp)+,r3
	mov	(sp)+,r4
	mov	(sp)+,r5
	rts	pc

Can you figure out how it works?