6.1810 2022 Lecture 2: programming xv6 in C why C? good for low-level programming easy mapping between C and RISC-V instructions easy mapping between C types and hardware structures e.g.., set bit flags in hardware registers of a device minimal runtime easy to port to another hardware platform direct access to hardware explicit memory management no garbage collector kernel is in complete control of memory management efficient: compiled (no interpreter) compiler compiles C to assembly popular for building kernels, system software, etc. good support for C on almost any platform why not? easy to write incorrect code easy to write code that has security vulnerabilities today's lecture: use of C in xv6 memory layout pointers arrays strings lists bitwise operators [not a general intro to C] memory layout of a C program in xv6 [draw figure, see fig 3.4 of text] text: code, read-only data data: global C variables stack: function's local variables heap: dynamic memory allocation using sbrk, malloc/free example: compile cat.c Makefile defines how gcc compiles to .o ld links .o files into an executable ulibc.o is xv6 minimal C library executable has a.out format with sections for: text (code), initialized data, symbol table, debug info, and more explore a.out of _cat riscv64-linux-gnu-objdump -S user/_cat same as user/cat.asm 0x0: cat what if we run two cat programs at the same time? see pgtbl lecture 0x8e: _main user.ld: entry: _main what is _main? defined in ulib.c, which calls main() and exit(0) where is data memory? (e.g., buf) in data/bss segment must be setup by kernel but we know address where buf should be riscv64-linux-gnu-nm -n user/_cat C pointers a pointer is a memory address every variable has a memory address (i.e., p = &i) so each variable can be accessed through its pointer (i.e., *i) a pointer can be variable (e.g., int *p) and thus has a memory address, etc. pointer arithmetic char *c; int *i; what is the value of c+1 and i+1? referencing elements of a struct struct { int a; int b; } *p; p->a = 10 [demo ptr.c] C arrays contiguous memory holding same data type (char, int, etc.) no bound checking, no growing two ways to access arrays: through index: buf[0], buf[1] through pointer: *buf, *(buf+1) [demo array.c] C strings arrays of characters, ending in 0 [demo str.c] ulib.c has several functions for strings strlen() --- use array access strcmp() --- use pointer access ls.c argv: array of strings each entry has the address of a string xv6's exec puts them on the stack as arguments to main print out argv [draw diagram; see fig 3.4 in book] T_DIR code fragment mkdir d echo hi > d/f ls d C lists (more pointers) single-linked list kernel/kalloc.c implements a memory allocator keeps a list of free "pages" of memory a page is 4096 bytes free prepends kalloc grabs from front of list double-linked list kernel/bio.c implements an LRU buffer cache brelse() needs to move a buf to the front of the list see buf.h two pointers: prev and next bitwise operators char/int/longs/pointers have bits (8, 32, 64 respectively, on RISC-V). you can manipulate them with |, &, ~, ^ 10001 & 10000 = 10000 10001 | 10000 = 10001 10001 ^ 10000 = 00001 ~1000 = 0111 example: user/usertests.c kernel/fcntl.h kernel/sysfile.c more interesting examples later keywords: static: to make a variable's visibility limited to the file it is declared but global within the file void: "no type", "no value", "no parameters" common C bugs use after free double free uninitialized memory memory on stack or returned by malloc are not zero buffer overflow write beyond end of array memory leak type confusion wrong type cast References: https://blog.regehr.org/archives/1393