Required reading: xv6 trapasm.S, trap.c, syscall.c, initcode.S, usys.S. Skim vectors.S, lapic.c, ioapic.c, picirq.c.
You will need to consult IA32 System Programming Guide chapter 5 (skip 5.7.1, 5.8.2, 5.12.2).
last week we transferred from kernel to user
today: how to get from user to kernel

three reasons for transitions:
  system calls
  program faults (div by zero, page fault)
  external device interrupts

why do we need to take special care for user -> kernel?
  security/isolation
  only kernel can touch devices, MMU, FS, other process' state, &c
  think of user program as a potential malicious adversary

remember how x86 privilege levels work
  CPL in low 2 bits of CS
  CPL=0 -> can modify cr*, devices, can use any PTE
  CPL=3 -> can't modify cr*, or use devs, and PTE_U enforced

what has to happen?
  save user state for future transparent resume
  set up for execution in kernel (stack, segments)
  choose a place to execute in kernel
  get at system call arguments
  do it all securely

it's neat that interrupts, faults, system call use same mechanism!
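The privilege-level rule in the outline (CPL lives in the low 2 bits of CS) can be checked with a couple of lines of C. This is only a sketch: selector_pl() and from_user() are hypothetical helper names, not xv6 functions, and 0x1b and 0x08 are used as example user and kernel code selectors.

```c
#include <stdint.h>

/* The low two bits of a segment selector hold its privilege level. */
static inline unsigned selector_pl(uint16_t sel)
{
    return sel & 3u;
}

/* Hypothetical helper: does a saved CS value mean the trap came from
 * user mode (privilege level 3)? */
static inline int from_user(uint16_t saved_cs)
{
    return selector_pl(saved_cs) == 3u;
}
```

For example, selector_pl(0x1b) is 3 (user) and selector_pl(0x08) is 0 (kernel).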
The INT instruction takes the following steps (these will be similar to all interrupts and faults, though there are slight differences):
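As a rough model, the sequence can be written as C operating on a toy CPU. This is a sketch of the behavior described in chapter 5 of the manual, not xv6 code: the struct names, the mem[] array standing in for memory, and do_int() itself are all assumptions made for illustration. It models only the IF flag (real hardware also clears a few other EFLAGS bits) and assumes eip already holds the return address.

```c
#include <assert.h>
#include <stdint.h>

enum { GATE_INTR = 0xE, GATE_TRAP = 0xF };    /* 32-bit gate types */
#define FL_IF 0x200u                           /* EFLAGS interrupt-enable bit */

/* Simplified, hypothetical views of an IDT entry, the TSS, and CPU state. */
struct gate { uint32_t off; uint16_t sel; uint8_t type, dpl; };
struct tss  { uint32_t esp0; uint16_t ss0; };
struct cpu  { uint32_t eip, esp, eflags; uint16_t cs, ss; };

static uint32_t mem[1024];                     /* toy memory, byte-addressed via esp */

static void push(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4 && c->esp <= sizeof(mem));
    c->esp -= 4;
    mem[c->esp / 4] = v;
}

/* Returns 0 on success, -1 where real hardware would raise a protection fault. */
int do_int(struct cpu *c, const struct gate *idt, const struct tss *t,
           unsigned vec, int software)
{
    const struct gate *g = &idt[vec];          /* pick the vector, fetch its descriptor */
    unsigned cpl = c->cs & 3u;

    if (software && cpl > g->dpl)              /* CPL <= DPL check, INT instruction only */
        return -1;

    uint32_t old_esp = c->esp, old_eflags = c->eflags;
    uint16_t old_ss = c->ss, old_cs = c->cs;

    if ((g->sel & 3u) < cpl) {                 /* privilege change: switch stacks */
        c->ss = t->ss0;                        /* new SS:ESP come from the TSS */
        c->esp = t->esp0;
        push(c, old_ss);                       /* save the user stack on the kernel stack */
        push(c, old_esp);
    }
    push(c, old_eflags);
    push(c, old_cs);
    push(c, c->eip);

    if (g->type == GATE_INTR)                  /* interrupt gates clear IF, trap gates do not */
        c->eflags &= ~FL_IF;
    c->cs = g->sel;                            /* jump to the handler named by the gate */
    c->eip = g->off;
    return 0;
}
```

Running do_int() with a user-mode CS and a DPL=3 trap gate leaves five words on the new stack (EIP, CS, EFLAGS, ESP, SS, from low to high addresses); with a DPL=0 gate and software set, it fails the check and returns -1.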
INT is a complex instruction. Does it really need to take all those steps? Why not let the kernel interrupt handler do some of them? For example, why does INT need to save the SS and ESP?
xv6 sets up the IDT in tvinit() (sheet 29) and sets the IDTR in idtinit(); SETGATE is on sheet 09. switchuvm() (sheet 17) specifies the SS and ESP in the TSS. Print idt[0x40] to see how the IDT is set up to handle vector 0x40.
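To help read the output of printing an IDT entry, here is a sketch of how a 32-bit gate descriptor packs its fields into two words. It follows the IA-32 gate layout (offset split across both words, selector in the low word, present bit, DPL, and type in the high word). make_gate() and the accessor functions are hypothetical names for illustration; xv6 itself uses the SETGATE macro with bitfields.

```c
#include <stdint.h>

struct gate_words { uint32_t lo, hi; };

/* Pack a 32-bit gate. istrap: 1 = trap gate (type 0xF), 0 = interrupt gate (0xE). */
struct gate_words make_gate(int istrap, uint16_t sel, uint32_t off, unsigned dpl)
{
    struct gate_words g;
    uint32_t type = istrap ? 0xFu : 0xEu;

    g.lo = ((uint32_t)sel << 16) | (off & 0xFFFFu);   /* selector | offset[15:0] */
    g.hi = (off & 0xFFFF0000u)                        /* offset[31:16] */
         | (1u << 15)                                 /* present */
         | ((dpl & 3u) << 13)                         /* descriptor privilege level */
         | (type << 8);                               /* gate type */
    return g;
}

unsigned gate_dpl(struct gate_words g)     { return (g.hi >> 13) & 3u; }
int      gate_is_trap(struct gate_words g) { return ((g.hi >> 8) & 0xFu) == 0xFu; }
uint32_t gate_offset(struct gate_words g)  { return (g.hi & 0xFFFF0000u) | (g.lo & 0xFFFFu); }
```

A trap gate with DPL=3 carries 0xEF in bits 15..8 of the high word, while an interrupt gate with DPL=0 carries 0x8E there.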
What is the current CPL? How was it set? Could the user abuse the INT instruction to exercise privilege or break the kernel?
x/6x $esp in order to see what int put on the stack. Compare to Figure 5-4. What stack is being used?
x/3i vector64. vector64 pushes a few items on the stack and then jumps to alltraps. Why not have vector 64 in the IDT point directly to alltraps?
Single-step alltraps (sheet 29) until pushl %esp, then x/19x $esp. Compare with struct trapframe (sheet 06).
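For the comparison, a layout along the following lines accounts for the 19 words. This is reconstructed from xv6's struct trapframe as best recalled, so treat the field order as an assumption and check it against sheet 06.

```c
#include <stddef.h>
#include <stdint.h>

struct trapframe {
    /* general registers, pushed by pushal in alltraps */
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;

    /* segment registers pushed by alltraps, each padded to 32 bits */
    uint16_t gs, padding1;
    uint16_t fs, padding2;
    uint16_t es, padding3;
    uint16_t ds, padding4;

    uint32_t trapno;          /* pushed by the vector stub */

    /* pushed by the hardware (err by the stub when the hardware does not) */
    uint32_t err;
    uint32_t eip;
    uint16_t cs, padding5;
    uint32_t eflags;

    /* present only when the trap crossed from user to kernel */
    uint32_t esp;
    uint16_t ss, padding6;
};
```

The lowest address holds edi (the last thing pushed), so the first word shown by x/19x $esp should correspond to edi and the last to ss.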
At the start of trap(), what is tf->trapno? How was it set? (sheet 30)
syscall() (sheet 32) dispatches to a function it finds by indexing into the syscalls array. It uses the eax from the trap frame as the index. What is in that eax? Where was it set?
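The dispatch pattern can be sketched as follows. The system call numbers, the stub handlers, and the name dispatch_syscall() are hypothetical and chosen for illustration; only the shape (index a table with the saved eax, store the result back into the saved eax) is taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

struct trapframe { uint32_t eax; };           /* only the field this sketch needs */

enum { SYS_fork = 1, SYS_exec = 7 };          /* hypothetical numbers */

static int sys_fork_stub(void) { return 7; }  /* stand-ins for real handlers */
static int sys_exec_stub(void) { return 0; }

static int (*syscalls[])(void) = {
    [SYS_fork] = sys_fork_stub,
    [SYS_exec] = sys_exec_stub,
};

#define NELEM(x) (sizeof(x) / sizeof((x)[0]))

void dispatch_syscall(struct trapframe *tf)
{
    uint32_t num = tf->eax;                   /* number the user code loaded into eax */

    if (num > 0 && num < NELEM(syscalls) && syscalls[num] != NULL)
        tf->eax = (uint32_t)syscalls[num](); /* result goes back in the saved eax */
    else
        tf->eax = (uint32_t)-1;               /* unknown call: report failure */
}
```

Because the number comes from user space, the bounds check and the NULL check are what keep a bad eax from turning into a jump through an arbitrary pointer.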
Now we're in sys_exec() (sheet 56). Where are the arguments the user program originally passed to exec()? How can the kernel get at them?
sys_exec() calls argint() to get its 2nd argument (sheet 31). argint() and fetchint() conditionally dereference a user pointer. Why conditionally?
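The idea behind the conditional dereference can be sketched like this. The struct proc here is a hypothetical stand-in (a byte buffer plus a size) rather than xv6's real process structure, and the functions take the process and the user esp as explicit parameters to keep the example self-contained.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical process: user memory is p->mem[0 .. p->sz). */
struct proc { uint8_t *mem; uint32_t sz; };

/* Copy a 32-bit value from user address addr, but only if all four bytes
 * lie inside the process's memory. Returns 0 on success, -1 otherwise. */
int fetchint(const struct proc *p, uint32_t addr, int32_t *ip)
{
    if (p->sz < 4 || addr > p->sz - 4)        /* written to avoid wraparound in addr+4 */
        return -1;
    memcpy(ip, p->mem + addr, sizeof(*ip));
    return 0;
}

/* Fetch the nth 32-bit system call argument from the user stack.
 * The word at user_esp is the return address, so arguments start at +4. */
int argint(const struct proc *p, uint32_t user_esp, int n, int32_t *ip)
{
    return fetchint(p, user_esp + 4 + 4 * (uint32_t)n, ip);
}
```

The user controls both esp and the pointer values on its stack, so every address has to be checked against the process's bounds before the kernel touches it.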
Back to sys_exec() (sheet 56). It reads the array of arguments, then calls exec() (sheet 57). exec() creates a new page table and copies the file /init into that address space's memory. It allocates and maps a stack at the top of the user part of the address space, then initializes that stack according to C conventions: it puts the arguments on the stack, then argv and argc. It sets the eip in the trapframe to elf.entry. Finally it switches to the new page table and frees the old one.
How is the return value from sys_exec() returned to user space? (syscall() sheet 32)
Single-step until iret, x/5x $esp, then single-step into user space. Print the registers and stack. What will the return value to the original call to exec() be?
What would happen if a user program divided by zero? What if kernel code divided by zero?
In Unix, traps often get translated into signals to the process. Some traps, though, are (partially) handled internally by the kernel -- which ones?
Some traps push an extra error code onto the stack (typically containing the selector of the segment that caused the fault). But this error code isn't pushed by the INT instruction. Can the user confuse the kernel by invoking INT 0xc (or any other vector that usually pushes an error code)? Why not?
Device interrupts are like system calls, except: devices generate them at any time, there are no arguments in CPU registers, there is nothing to return to, and they usually can't be ignored. There is hardware on the motherboard to signal the CPU when a device needs attention (e.g. the user has typed a character on the keyboard). There's usually a separate vector for each device. Let's look at the timer interrupt; the timer hardware generates an interrupt 100 times per second so that the kernel can track the passage of time and time-slice among multiple running processes. The timer interrupts through vector 32.
p idt[32], then set a breakpoint at vector32
x/20x $esp. What was the CPU doing at the time of the interrupt? What stack is being used?
The interrupt will have pushed different numbers of words on the stack depending on whether the CPU was in user-space or the kernel; how does iret know how many words to pop?
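One way to think about the question is that iret compares the privilege level in the saved CS with the current CPL. The sketch below models that on a toy CPU. do_iret() and struct cpu are hypothetical names; frame points at the saved EIP, and real hardware performs additional checks that are omitted here.

```c
#include <stdint.h>

struct cpu { uint32_t eip, esp, eflags; uint16_t cs, ss; };

/* Model of iret. frame[0..2] hold EIP, CS, EFLAGS; frame[3..4] hold ESP, SS
 * only if the interrupt crossed privilege levels. Returns the number of
 * 32-bit words consumed from the stack. */
int do_iret(struct cpu *c, const uint32_t *frame)
{
    unsigned cpl = c->cs & 3u;
    uint16_t new_cs = (uint16_t)frame[1];
    int words = 3;

    c->eip = frame[0];
    c->eflags = frame[2];

    if ((new_cs & 3u) > cpl) {
        /* Returning to a less privileged level: the old stack was saved too. */
        c->esp = frame[3];
        c->ss = (uint16_t)frame[4];
        words = 5;
    } else {
        /* Same privilege level: stay on this stack, just drop the three words. */
        c->esp += 3 * 4;
    }
    c->cs = new_cs;
    return words;
}
```

With a saved CS of 0x1b and a current CS of 0x08 the model pops five words; with a saved CS of 0x08 it pops three and leaves SS alone.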
What prevents lots of interrupts from coming in all at once and overflowing the kernel stack? Print the registers; IF=0x200. p idt[32], p idt[64].
trap(), when it's called for a timer interrupt, does just two things: increment the ticks variable, and call wakeup(). At the end of trap(), xv6 calls yield(), which, as we will see, may cause the interrupt to return in a different process!
XXX Turns out our kernel had a subtle security bug in the way it handled traps... vb 0x1b:0x11, run movdsgs, step over breakpoints that aren't mov ax, ds, dump_cpu and single-step. dump_cpu after mov gs, then vb 0x1b:0x21 to break after sbrk returns, dump_cpu again.
Since JOS does not use segmentation, where do traps vector in JOS?
JOS also has a very different kernel architecture: only one kernel stack, as opposed to one per process in xv6. The kernel is not re-entrant (cannot be interrupted), so all IDT entries are interrupt gates in JOS.