File systems

Required reading: Chapter 18, 19, and chapter 20 (from "Resource Allocation", pp. 20-4)

Overview

Users desire to store their data persistently so that data survives when the user turns of his computer. The primary media for doing so are: magnetic disks, flash memory, and tapes. We focus on magnetic disks (e.g., the RK in v6).

To help users to remember where they stored their data, most systems allow users to assign their own names to their data. Typically the data is organized in files and users assign names to files. To deal with many files, users can organize their files in directories, in a hierarchical manner. To avoid that users have to type long abolute names (e.g., in UNIX absolute names start with "/"), users can change their working directory and use relative names .

In v6, File namespace operations include create, link, unlink, and chdir; later UNIXes added rename, mkdir, etc. (How is "mv a b" implemented in v6? Answer: "link a b"; "unlink "a".) To be able to name the current directory and the parent directory every directory includes two entries "." and "..".

A user grafts new file systems on a name space using mount. Umount removes a file system from the name space. (In DOS, a file system is named by its device letter.)

The data in a file can be organized in a structured way or not. The structured variant is often called a database. UNIX uses the unstructured variant: files are streams of bytes. Any particular structure is likely to be useful to only a small class of applications, and other applications will have to work hard to fit their data into one of the pre-defined structures. Besides, if you want structure, you can easily write a user-mode library program that imposes that format on any file. The end-to-end argument in action.

A key issue is the semantics of an update. If a program updates a file, when are the effects of the update persistent? In particular, what happens if the logical update requires writing multiple disks blocks and the power fails during the update. Can it happen that an update happens only partially? To solve this problem correctly one needs atomic operations with respect to failures.

Another key issue is what the semantics are of concurrent writes to the same data item. What is the order of two updates that happen at the same time? To solve this problem correctly one needs atomic operations with respect to concurrency.

In v6, a file is a byte stream. The v6 file calls are open, read, write, seek, close, and stat. Dup duplicates a file descriptor. Data for a file is reclaimed if it doesn't appear in the name space anymore. Each file descriptor has a file offset. Read/write move it implicitly, seek moves it explicitly. Two processes that share a file because one process is the child (and inherited an open file descriptor from its parent) will share an offset pointer into the file: their bytes will be ordered serially in some order; in practice, one write after the other. Two processes that open the same file will each have their own offset into the file and may overwrite each other's data.

A final key issue is performance, because writing to magnetic disk is relatively expensive compared to computing. (This is true for the PDP11/RK6 too.) Two primary ways to improve performance are: careful file system layout that induces few seeks and overlap I/O with computation so that file operations don't have to wait until their completion and so that that the disk driver has more data to write, which allows disk scheduling etc. An additional method is to cheat on the semantics of modifications, and give up on atomicity.

V6 follows the last two methods. It doesn't pay attention to file system layout. It overlaps computation and I/O, but doesn't do any disk scheduling. It cheats on semantics: when one closes a file, some of its data may not have been written to disk yet, while other blocks may have been written---v6 is more careful with i-nodes (probably, mostly accidently). Later in some of the papers, we will see how one can do better.

V6 code examples

On disk files are represented by an inode (ino.h), and blocks. Small files have 8 direct pointers; in large files the eight pointers are pointers to a block of pointers (see bmap, 6415). Files are limited to 2^15 blocks. Directories are files with structure: 14-byte name plus 2-byte inode number. In memory files are represented by a file structure (struct file) and an in-core inode (inode.h).