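You might wonder why the address of cret ends up on the stack. The csv function puts it there when it returns to its caller after saving the registers. The return instruction is jsr pc,(r0), which pushes the address of the instruction that follows it, and that instruction is the first one in cret.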
(In what follows, you can find the files mentioned in /mit/6.828/sw on athena.)
The csv in the C library (v6/usr/source/s4/csv.s) does this explicitly:
/ C register save and restore -- version 12/74
.globl csv
.globl cret
csv:
mov r5,r0
mov sp,r5
mov r4,-(sp)
mov r3,-(sp)
mov r2,-(sp)
tst -(sp)
jmp (r0)
cret:
mov r5,r1
mov -(r1),r4
mov -(r1),r3
mov -(r1),r2
mov r5,sp
mov (sp)+,r5
rts pc
but the one we have in the kernel
(in v6/usr/sys/conf/m40.s)
has the shorter jsr pc, (r0).
This is just a way to squeeze an instruction
out of csv. The actual value pushed
is irrelevant; the code just wants to make some space.
Remember that this function is executed as part
of every C function call, so saving one instruction
might well be a significant speedup!
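To see that the two endings of csv leave the machine in the same state apart from the contents of the new stack slot, here is a minimal sketch in modern C. The struct cpu type, the toy word-indexed memory, and both function names are made up for illustration; only the instruction semantics in the comments come from the listings.

```c
#include <stdint.h>

/* Toy model of the PDP-11 state that the end of csv touches.
 * All names here are hypothetical, chosen for this sketch. */
struct cpu {
    uint16_t r0, pc, sp;
    uint16_t mem[32];    /* toy stack: byte address / 2 indexes a word */
};

/* C library ending: tst -(sp); jmp (r0) */
void csv_tail_tst(struct cpu *c)
{
    c->sp -= 2;          /* tst -(sp): predecrement sp, slot left as-is */
    c->pc = c->r0;       /* jmp (r0): continue in the caller */
}

/* Kernel ending: jsr pc,(r0).
 * next_insn is the address of the instruction after the jsr,
 * which in the kernel listing is the first instruction of cret. */
void csv_tail_jsr(struct cpu *c, uint16_t next_insn)
{
    c->sp -= 2;                      /* jsr pushes the old pc ...        */
    c->mem[c->sp / 2] = next_insn;   /* ... so the slot holds cret's address */
    c->pc = c->r0;                   /* and control continues at (r0)    */
}
```

Starting both variants from the same sp and r0 gives the same sp and pc afterwards; only the word at the new top of stack differs, which is why a stack dump shows the address of cret there.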
The V7 C library's v7/src/libc/crt/csv.s
adopted the kernel approach, along with an explanatory comment:
/ C register save and restore -- version 7/75
.globl csv
.globl cret
csv:
mov r5,r0
mov sp,r5
mov r4,-(sp)
mov r3,-(sp)
mov r2,-(sp)
jsr pc,(r0) / jsr part is sub $2,sp
cret:
mov r5,r2
mov -(r2),r4
mov -(r2),r3
mov -(r2),r2
mov r5,sp
mov (sp)+,r5
rts pc
Another interesting question is why a tst -(sp) follows jsr r5, csv in the function prologue. This was just a convenient way to subtract two from the stack pointer. In fact, it's shorter, since a sub $2, sp would use an extra word of instruction for the immediate $2.
The V6 C compiler special cased this
(v6/usr/source/c/c11.c):
case SAVE:
printf("jsr r5,csv\n");
t = getw(ascbuf)-6;
if (t==2)
printf("tst -(sp)\n");
else if (t > 2)
printf("sub $%o,sp\n", t);
break;
Of course, this doesn't answer the question of why our f function allocates space that it never uses, nor does it answer the question of what -6 is in the compiler fragment above.
To answer that, we need to dig deeper into
how the compiler works.
The argument to the pseudo-op SAVE is the value
autolen computed in blkhed
in v6/usr/source/c/c02.c.
That's the size of the stack frame, effectively.
Blkhed initializes autolen to 6
and then processes the code inside the block,
which increases autolen as necessary to allocate
automatic (stack) storage for local variables.
Why does autolen start at 6?
Because -autolen is used as the offset
from r5 used to allocate a variable.
The code to allocate a new local variable does:
if (dsym->hclass==AUTO) {
autolen =+ rlength(dsym);
dsym->hoffset = -autolen;
}
So the first variable will be stored at -8(r5)
as we saw above, with f's c variable.
What are the 4 values before that? Consulting our stack
diagram we see that they are the saved
r5, r4, r3, and r2.
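The bookkeeping described here can be sketched in modern C. This is an illustration under stated assumptions, not the compiler's code: struct frame, frame_init, frame_alloc, and save_adjust are hypothetical names, sizes are in bytes, and the output goes into a buffer instead of being printed.

```c
#include <stdio.h>
#include <stddef.h>

#define SAVED_BYTES 6   /* initial autolen, as set by blkhed */

/* Hypothetical stand-in for the compiler's autolen variable. */
struct frame { int autolen; };

void frame_init(struct frame *f) { f->autolen = SAVED_BYTES; }

/* Mirrors: autolen =+ rlength(dsym); dsym->hoffset = -autolen;
 * Returns the variable's offset from r5. */
int frame_alloc(struct frame *f, int size_bytes)
{
    f->autolen += size_bytes;    /* += is the modern spelling of V6's =+ */
    return -f->autolen;
}

/* Mirrors case SAVE: writes the stack adjustment that follows
 * jsr r5,csv into buf (empty string if none) and returns t. */
int save_adjust(const struct frame *f, char *buf, size_t n)
{
    int t = f->autolen - SAVED_BYTES;
    buf[0] = '\0';
    if (t == 2)
        snprintf(buf, n, "tst -(sp)");                 /* one word: short form */
    else if (t > 2)
        snprintf(buf, n, "sub $%o,sp", (unsigned)t);   /* octal immediate */
    return t;
}
```

With one two-byte variable, frame_alloc returns -8 and save_adjust produces tst -(sp); a second two-byte variable lands at -10 and the adjustment becomes sub $4,sp.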
But wait! What about the extra stack word being allocated
in csv?
That should mean we'd only need to allocate
autolen-6-2 bytes after csv runs.
The answer is made clear by the disassembly of main above:
mov $2,(sp)      / push 2 into that temporary
mov $1,-(sp)     / push 1
jsr pc,*$_f      / call f(1,2)
tst (sp)+        / pop 1 (2 needn't pop because it's in the temp)
Notice that the first push didn't have to change the stack
pointer! This is because the word was already allocated.
More significantly, tst (sp)+ only had to
pop one value off the stack.
In the common case where there is only one function argument,
we can get rid of the pop instruction entirely!
Of course, this is only a theory, but
v6/usr/source/c/c10.c supports our theory:
/*
* Handle a subroutine call. It has to be done
* here because if cexpr got called twice, the
* arguments might be compiled twice.
* There is also some fiddling so the
* first argument, in favorable circumstances,
* goes to (sp) instead of -(sp), reducing
* the amount of stack-popping.
*/
case CALL:
It's also interesting to note that popstk, which
generates the code to pop the stack, special-cased 2 words
as well as 1, to save space in the
instruction encoding
(v6/usr/source/c/c11.c):
popstk(a)
{
switch(a) {
case 0:
return;
case 2:
printf("tst (sp)+\n");
return;
case 4:
printf("cmp (sp)+,(sp)+\n");
return;
}
printf("add $%o,sp\n", a);
}
Functions with one, two, and three arguments
were all presumably common enough to warrant this treatment.
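The link between argument count and cleanup instruction can be made concrete with a small sketch in modern C. It assumes every argument is one word and that the "favorable circumstances" from the c10.c comment always hold, so the first argument evaluated goes into the word already reserved at (sp). The names pop_bytes and popstk_str are invented for this example; popstk_str follows the popstk listing but writes into a buffer.

```c
#include <stdio.h>
#include <stddef.h>

/* Bytes the caller must pop after a call with nargs one-word
 * arguments, assuming one argument used the preallocated slot. */
int pop_bytes(int nargs)
{
    return nargs > 1 ? 2 * (nargs - 1) : 0;
}

/* Restatement of popstk: a is the number of bytes to pop. */
void popstk_str(int a, char *buf, size_t n)
{
    switch (a) {
    case 0:
        buf[0] = '\0';                        /* nothing to pop */
        return;
    case 2:
        snprintf(buf, n, "tst (sp)+");        /* one word, one-word instruction */
        return;
    case 4:
        snprintf(buf, n, "cmp (sp)+,(sp)+");  /* two words, still one word long */
        return;
    }
    snprintf(buf, n, "add $%o,sp", (unsigned)a);  /* general case, octal */
}
```

Under these assumptions, zero or one argument yields no instruction, two arguments yield tst (sp)+, three yield cmp (sp)+,(sp)+, and four yield add $6,sp.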
In fact, we can check the kernel sources to find out.
Here's the breakdown of statements following
a call instruction (jsr pc,...) in the kernel code:
245 tst (sp)+ pop two args
84 jmp cret
47 cmp (sp)+,(sp)+ pop three args
44 add $6,sp
38 mov r0,r4
35 tst r0
13 jsr pc,_spl0
12 mov r0,r3
...
The entries that carry no label and do not adjust sp
(jmp cret, mov r0,r4, tst r0, and so on) could all be
billed as "pop one arg" (or none), since in that case there's
no pop instruction at all.
It turns out there are 873 function calls
and 581 of them had no stack pop code
because they had zero or one arguments.
Note that case SAVE above didn't do the same special-casing to allocate a stack frame of two words. We might expect that one-word stack frames are quite common (one temporary used to compute a return value) while if you've got more than one word you're likely to have a few, as variables. But then, many variables were kept in registers only (remember that the register variables r2, r3, and r4 were callee-save), so maybe not. Again, we can check.
If we look at stack frame sizes by considering the instruction after jsr r5,csv, we find that out of 239 functions, 206 need no stack adjustment whatsoever (they have empty stack frames), 16 use tst -(sp) (they have one-word frames), 10 use sub $4, sp (they have two-word frames), 3 have three-word frames, 3 have five-word frames, and 1 has an eight-word frame. Now you can see why leaving about 400 words for the kernel stack was plenty. So in this case, maybe it would have been reasonable to add the extra case. (It also seems it would have been reasonable to drop the tst -(sp) special case.)
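As a quick check on the frame-size breakdown, the counts can be tallied in a few lines of C. The table holds only the numbers quoted above; struct bucket and the function names are hypothetical.

```c
#include <stddef.h>

/* Frame-size histogram from the text: frame size in words, function count. */
struct bucket { int words; int count; };

const struct bucket frames[] = {
    {0, 206}, {1, 16}, {2, 10}, {3, 3}, {5, 3}, {8, 1},
};
#define NBUCKETS (sizeof frames / sizeof frames[0])

/* Number of functions covered by the histogram. */
int total_functions(void)
{
    int n = 0;
    for (size_t i = 0; i < NBUCKETS; i++)
        n += frames[i].count;
    return n;
}

/* Words of local storage summed over every function. */
int total_frame_words(void)
{
    int w = 0;
    for (size_t i = 0; i < NBUCKETS; i++)
        w += frames[i].words * frames[i].count;
    return w;
}
```

The counts add up to the 239 functions mentioned, and the frames of all functions together hold 68 words of locals.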
As a final interesting footnote, here's the equivalent v5 stack generation code, first in the compiler (v5/usr/c/c02.c):
case LBRACE:
if (d) {
o2 = blkhed() - 4;
if (proflg)
o = "jsr\tr5,mrsave;0f;%o\n.bss\n0:.=.+2\n.text\n";
else
o = "jsr r5,rsave; %o\n";
printf(o, o2);
}
and then the register saving routine (v5/usr/source/s4/rsave.s):
/ C register save and restore
.globl rsave
.globl mrsave
.globl rretrn
mrsave:
tst (r5)+
rsave:
mov r5,r0
mov sp,r5
mov r4,-(sp)
mov r3,-(sp)
mov r2,-(sp)
sub (r0)+,sp
jmp (r0)
rretrn:
sub $6,r5
mov r5,sp
mov (sp)+,r2
mov (sp)+,r3
mov (sp)+,r4
mov (sp)+,r5
rts pc
Can you figure out how it works?