| 6.097: OPERATING SYSTEM ENGINEERING |
| Fall 2002 |
| Lab 1 |
| Hand out date: Monday September 9th |
| Due date: Thursday September 19th |
Before we get to the bootloader, we need to be familiar with the development environment and the tools we'll be using.
This class assumes you will be working on Athena, where we have built the various necessary tools. In order to access these tools, you should run:
athena% add gnu 6.097 sipb
You can work on Suns or Linux PCs, but not on SGI/Irix machines. It is possible to build these tools for other systems; if you absolutely can not work on Athena, contact one of the TAs.
You should also download the code for the lab and untar it into your working directory. For example,
athena% mkdir ~/6.097 athena% cd ~/6.097 athena% wget http://pdos.lcs.mit.edu/6.097/labs/bootloader/bootloader.tar.gz athena% gtar -zxf bootloader.tar.gz
Developing an operating system on real hardware can be difficult and time-consuming. Instead we'll be using a freely available PC emulator called Bochs. The operating system you'll develop is in some sense just a toy operating system, but it will be "real" enough to run on actual PC hardware. We're just using Bochs to make development easy.
Bochs emulates the x86 instruction set and the devices which are common to an IBM PC, such as the keyboard, IDE hard drive, VGA monitor, interrupt controller, and a timer chip, among other devices. In functionality, Bochs is similar to VMWare, though it operates in a very different manner.
We will begin by showing you how to boot Bochs with a Linux image we
have provided. In your bootloader/linux directory, there
is a Bochs configuration file called .bochsrc and a disk
image file called linux.img. Bochs reads the
.bochsrc to configure itself; one of the configuration
instructions is to treat the file linux.img as the
primary drive of the virtual computer.
To run Bochs, simply enter change into the provided
linux directory and run:
athena% cd ~/6.097/bootloader/linux athena% bochs
When Bochs first starts, hit enter 3 times so that Bochs loads its
configuration from our provided .bochsrc file and starts
running. Bochs immediately dumps you to the debugger. Type 'c' and
enter, to continue running Bochs. You should see Linux boot.
If you are curious, more information is available on the Bochs home page. There's a good User Guide, though it contains a lot more information than you really need to know.
You do not need to turn anything in for this part of the lab.
If you are not already familiar with x86 assembly the PC Assembly Language Book is an excellent place to start. Hopefully, the book contains mixture of new and old material for you. We recommend reading the entire book, except you should skip all sections after 1.3.5 in chapter 1, and you can also skip chapters 5 and 6, and all sections under 7.2.
Warning: Unfortunately the examples in the book are written for the NASM assembler, whereas we will be using the GNU assembler. NASM uses the so-called Intel syntax while GNU uses the AT&T syntax. While semantically equivalent, an assembly file will differ quite a lot, at least superficially, depending on which syntax is used. Luckily the conversion between the two is pretty simple, and is covered in this reference. Only read the section "The Syntax".
Surely the definitive reference is Intel's instruction set architecture reference.
You should read the recommended chapters of the PC assembly book, and "The Syntax" section of the second reference now. Save the Intel document for later.
You do not need to turn anything in for this part of the lab.
bootloader/src require links:
locore.S,
and
GNUmakefile.
We don't award web page style points. Just make a functional web page, and devote your time to the lab exercises.
For this exercise you're going to learn a bit about Boch's debugger and the PC bootup process. Run the following commands:
athena% cd ~/6.097/bootloader/src/ athena% gmake ex1
Later, we'll examine the output of gmake in detail. But for now, it suffices to know that this compiles ex1.S and produces a Bochs disk image file named disk.img.
In this same directory there is a .bochsrc. If you examine it you'll see that it contains the line "diskc: file=./disk.img ...". Therefore, Bochs is all setup to run with this disk image.
When you start Bochs, you will be dumped into the debugger. The debugger's output format is:
Next at t=[instruction #] (0) CS:EIP: machine code: disassembled instructionStart Bochs, but this time don't type 'c', just examine the output. You should see:
Next at t=0 (0) f000:fff0: e968e0: jmp +#e068 <bochs:1>Sidenote: (most of) these values are in hex. Throughout this class, we'll follow the convention that hex numbers are written starting with "0x". (As you can see here, Bochs doesn't follow this convention). If you are unfamiliar with hex notation it's covered in Chapter 1 of the PC assembly book. From this you can conclude a few things:
jmp will be backwards--to a
lower EIP.Why does the Bochs start like this? This is how IBM decided PCs would work. The PC hardware loads BIOS into memory in the address range 0xf0000 - 0xfffff. It sets CS to 0xf000, the EIP to 0xfff0, and consequently, execution begins at that (CS:EIP) segment address. But what memory location does this CS:EIP correspond to?
To answer that we need to know a bit out real mode addressing. In real mode (the mode that PC starts off in) address translation works according to the formula: ADDRESS = 16 * SELECTOR + OFFSET. So, when the IBM PC sets CS to 0xf000 and EIP to 0xfff0, the memory address referenced is:
16 * 0xf000 + 0xfff0 # in hex multiplication by 16 is = 0xf0000 + 0xfff0 # easy--just append a 0. = 0xffff0
Now we can see that the PC starts executing 16 bytes from the end of the BIOS code. Therefore we shouldn't be surprised that the first thing that the BIOS does is jmp backwards; after all how much could it accomplish in just 16 bytes? When the BIOS runs, it sets up an interrupt descriptor table, initializes devices and searches for a bootable device (e.g., a floppy, hard drive, CDROM). When it finds a bootable devices, the BIOS loads the first sector (512 bytes) into memory at address 0x7c00 - 0x7d00 and sets CS:EIP to 0000:7c00. Like the BIOS load address, these addresses are fairly arbitrary--but they are fixed.
Sidenote: disks are divided up in to 512 byte regions called sectors. This is the disk's minimum transfer granularity. Each read/write operation must be one or more sectors in size. If the disk is bootable, the first sector is also called the boot sector, since this where the boot strapping code resides.
Type b 0x7c00 into Bochs. This instructs Bochs to break at
address 0x7c00. The hard drive image has been created so that our
assembly program ex1.S is located in this first sector. After
setting the break point, type 'c', and Bochs will continue through
all the BIOS code and stop at your break point--ie. the first
instruction in ex1.S.
When the debugger stops, type 's'. This causes the debugger to
single-step one instruction at a time. Do this a few times. Make
sure you understand what is going on and how it corresponds to the
source code in ex1.S.
Answer the follow questions:
jmp
start above? Write out the bytes in hex. Refer to the x86 instruction set manual. Explain
each of the bytes.jmp instruction
relative to the instruction counter at the beginning or the end of the
jmp? How do you know this?Similarly, the BIOS loads the boot sector at address 0x7c00. So this is the boot sector's load address.
The link address is the address for which a binary is linked. Essentially, linking a binary for a given link address, prepares it to be loaded at that address. This link address becomes subtly encoded in the binary in a multitude of ways, with the result that if a binary is not loaded at the address that it is linked for, things will likely not work.
In one sentence: the link address is the location where a binary assumes it is going to be loaded.
When our GNUmakefile builds a disk image, such as
disk.img it compiles the code ex2.S and
places it in the boot sector (the first 512 bytes of the disk
image). As part of this compilation it also links the code with link
address 0x7c00--matching the address where the BIOS will load it at
run-time.
The file ex2.S illustrates the vague "link address becomes
subtly encoded in the binary" claim above. The load versus link
issue should be clear after working through the concrete example in
ex2.S. Type gmake ex2 and restart Bochs.
Answer these questions:
ex1.S would still run correctly even if
the BIOS did BOOTLOADER_LINK_ADDRESS parameter in the
GNUmakefile. Explain how this value affects the value of EAX loaded
by "movl $here,%eax".BOOTLOADER_LINK_ADDRESS to
0x7c00 and rerun gmake ex2.
Now it's time to examine the steps to build a bochs disk image
closely. It is important to understand what is going on and why each
step is important. For this exercise we'll be using the code
ex2.S again (ie. you don't need to run gmake).
First, a handy tool for you to use is xxd. It dumps a
binary file as hex:
athena% xxd -a disk.img 0000000: 9090 9090 66b8 007c 0000 66ff e08d 7400 ....f..|..f...t. 0000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................ * 0176ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
The '*' means that every thing in 0000010-0176ff0 is
identical.
Combine this with the source file ex2.S and perhaps single stepping
through Bochs to answer the following questions:
ex2.S corresponds to the bytes in the
disk.img file. The first to lines have been filled in for
you.
instruction machine code byte start length nop 90 0 1 nop 90 1 1 nop ? ? ? nop ? ? ? movl $here,%eax ? ? ? jmp *%eax ? ? ?
BOOTLOADER_LINK_ADDRESS in the GNUmakefile and
describe how the disk image changes. Could the load address of the
boot sector affect contents of the disk image? Why or why not? Now
do you see why the link address should match the load address?gmake ex2. Paste the output in
your answer and write a short commentary next to each command. Why is
the command necessary? What is it doing? What is the input file? What
is the output file? For certain steps of the build process you'll just
have to guess.BOOTLOADER_LINK_ADDRESS to its original
value.
ex4.S--it gives an example of how
to setup the stack. Type gmake ex4.
To answer this question, you might like to use the 'x' command in the Bochs debugger. In the example below, we print out 10 words starting at address 0x7c00. And then we print out 16 bytes starting at 0x7c00.
Warning: The size of a word is not a universal standard. To Bochs, a word is four bytes. In GNU assembly, a word is two bytes (xorw -- 'w'ord means 2 bytes).
<bochs:19> x /10w 0x7c00 [bochs]: 0x7c00 <bogus+0>: 0x8e66c031 0x00bc66d0 0x6600007c 0x000003e8 0x7c10 <bogus+16>: 0xfdeb6600 0x9090c366 0x00000000 0x00000000 0x7c20 <bogus+32>: 0x00000000 0x00000000 <bochs:20> x /16b 0x7c00 [bochs]: 0x7c00 <bogus+0>: 0x31 0xc0 0x66 0x8e 0xd0 0x66 0xbc 0x00 0x7c08 <bogus+8>: 0x7c 0x00 0x00 0x66 0xe8 0x03 0x00 0x00 <bochs:21>
Answer these questions:
ex4.S would not run if the code was
loaded at a different address. Why is this? (hint: the
call and ret instructions aren't the
problem)
The files bootloader_c.c and
bootloader_asm.S contain a working boot loader. Your job
is to first read and understand the code.
To make sense out of it you'll need to know what an a.out binary is. a.out is the format of a compiled executable program which is ready to run. An a.out binary starts with a header, followed by 2 sections: text and data. The text holds the program's machine code. The data section holds the programs (initialized) data.
file offset:
0: +-----------+
| header | (length 32 bytes)
32: +-----------+
| |
| text | (length hdr.a_text)
| |
32 + hdr.a_txt: +-----------+
| |
| data | (length hdr.a_data)
| |
+-----------+
a.out binary format
struct a_out_header {
unsigned long a_midmag; /* flags<<26 | mid<<16 | magic */
unsigned long a_text; /* text segment size */
unsigned long a_data; /* initialized data size */
unsigned long a_bss; /* uninitialized data size */
unsigned long a_syms; /* symbol table size */
unsigned long a_entry; /* entry point */
unsigned long a_trsize; /* text relocation size */
unsigned long a_drsize; /* data relocation size */
}
Of these fields, the bootloader only uses a_text,
a_data, and a_entry. The meaning of the
first two should be clear from the diagrams above.
a_entry is the link address of the binary.
The file kernel.S is used to simulate a kernel. The
bootloader is going to load this into memory, pretending that it is
the kernel. Then it is going to jump to it. kernel.S is
written so that it tests if it was loaded correctly or not; it prints
"Ok" or "Bug" in the top left of the screen. You must look carefully
because the text over writes the existing text on the creen. It will
be in the very top, left.
You should now edit the GNUmakefile. The label
%.img: is used to build a Bochs disk image which contains
the boot loader in the first sector, and the kernel a.out file from
the second sector onward. The last dd command under
%.img: is incomplete. Replace the questions marks with
the proper values, so that the kernel image get created as just
described.
It's up to you, but you probably want to set the block size field
(i.e., bs=) as large as possible. Otherwise, it takes a
long time to build the disk images.
Build the image with gmake ex5 and then run
bochs with this image and make sure you see "Ok" in the
top left of the screen. Do not continue until you see this. The "Ok"
over writes the existing text, so look carefully.
Below is a chart of the hard disk image. Showing the sector number, contents of the sector, NREADS, and READER. Fill in NREADS with the number of times the particular sector is read. Fill in READER with the agent who reads it, either PC hardware, BIOS, bootloader or other.
SECTOR # CONTENTS NREADS READER 1 boot loader ? ? 2 bytes 0 - 511 of kernel.aout ? ? 3 bytes 512 - 1023 of kernel.aout ? ? . ? ? . ? ? N the last 512 byte of kernel.aout ? ? N+1 zeros ? ? N+2 zeros ? ? . ? ? . ? ? M zeros ? ? [this chart is to be turned-in.]Answer these questions:
xxd -a kernel.aout. In your answer show the
relevant portion of xxd's output and indicate where the follow
fields are located: a_text, a_data and
a_entry. Give the values of these fields in decimal,
except for a_entry which should be given in hex.
read_sector loaded? Set a breakpoint here and run it in bochs.
Hint use objdump:
athena$ i386-osclass-aout-objdump --adjust-vma=0x7c00 -S bootloader.aout.dbg(For your own purposes you might want to play around
objdump, try
out it's various flags, run it on bootloader.aout, etc. )
rep prefix.
Exercise 5 established that the bootloader does not load the kernel at
its link address. In that exercise we've also seen a kernel
(kernel.S) that can still run, despite this. This code
is called position independent (PIC), meaning that it can be loaded at
any location in memory and executed there.
By writing the assembly ourselves, we can ensure position independence
by avoiding absolute address references. ex2.S shows a
counter example: the absolute address of the here label
is moved into a register. Therefore, this code is not position
independent.
When using C, we as programmers can't control which instruction the C compiler will generate. And, in fact, a C compiler typically will produce position dependent code. Therefore, since we intend to write the bulk of our kernel in C, we need to find a work around.
Before we find the work around, insert the following line at the top locore.S:
#define ANSWER 0xdeadbeaf
Then type
athena% gmake ex6
athena% i386-osclass-aout-objdump --adjust-vma=0xf0100020 -S kernel.aout.dbg
Now answer question 1. It's listed at the end of this exercise. You
might wish to pass the C compiler the "-save-temps" flag when
compiling the file kernel2.c. This will produce the file
kernel2.s which shows the assembly instructions that the
compiler generated from its input C file.
Now, back to our "work around". For our work around, we are going to use the segmentation hardware to run position dependent C code, despite differing link/loaded addresses. You task is to configure the segmentation hardware to provide the illusion that the kernel was loaded at its link address. The task has been almost done for you, there is just one thing left.
You must edit locore.S and change the string "ANSWER" to
the correct value. Type gmake ex6, and run Bochs. If you see
"Hello World!" in the upper left corner of the screen, you know you've
got the correct value. Again, you must look carefully
because the text over writes the existing text on the scrren.
This is tricky and requires a bit of thinking. The hint is that ANSWER will involve one or more of the following terms/symbols: +,-,*,/,&,|,0xF0100020,0x7c00,KERNBASE.
You might want to consult section 3.4 of IA-32 Intel Architecture Software Developer's Manual, Volume 3: System programming guide. The solution does not require using any esoteric features, just basic segmentation, though in a non-obvious way.
Answer the follow questions:
Identify the two places in the assembly dump where the C compiler
generated position dependent code when it compiled
kernel2.c (ignore instructions corresponding to
locore.S). Look for places in the machine code where the
link address has been compiled into the binary. Hint: one of the
places is in write_char() and the other is in
i386_init(). In your answer, paste the two occurrences
of position dependent assembly from the dump, explain how they are
position dependent and to what C code they correspond.
video in kernel2.c is set
to the value it is set to. You might want to compare the code in
kernel.S.
This completes the lab.