Required reading: nami() and associated functions, sysfile.c.
why do you want names as part of your storage system?
hard for users/apps to keep track of disk addrs
directories aid scaling, avoid name conflicts
security / consistency are easier with an abstract interface
interface
pathnames: x, 828/x, /usr/rtm/828/x
directory tree with root at /
special entries in each directory: ., ..
current working directory (cwd)
cat 828/x
cd 828 ; cat x
per process
name ops: open(), chdir(), creat(), mkdir(), unlink(), link()
some implications of interface
could I implement by putting file info in directory entry?
this is how DOS / early Windows FAT file system worked
ln a b ; rm a ; cat b
fd = open("x"); unlink("x"); write(fd, "z", 1);
mkdir d ; cd d ; rmdir ../d
so underlying file has existence independent of name
thus i-nodes
a dirent contains a name and an i-number
dirent does *not* contain the file info
removing a directory
only allowed if you first remove contents
otherwise allocated but un-nameable i-nodes
can't remove a file with no name!
thus recursive rm -rf
remove directory contents (recursively)
then remove empty directory
hard links
mkdir d
ln d d/x
rm -rf d
will the rm ever finish?
how can we deal with this?
you are not allowed to hard-link to a directory
thus there can be no cycles, so rm -rf will finish
symbolic links are allowed for directories
ln -s d d/x
the symbolic link doesn't prevent removal of pointed-to file/dir
and symbolic link can be removed w/o worrying about what it points to
consequence: symlink lookups can fail
disk structure (sheet 31)
dirent contains name/i-number pairs
IBLOCK(inum) turns i-num into disk block of i-node
on-disk i-node
type is DIR, FILE, or 0 (free)
nlink is link count -- # of dir entries pointing to this i-num
this is for "ln a b ; rm a"
link count can be zero for non-free file! why?
in-core i-node structure (sheet 32)
ref is # in-core references
FDs, CWDs, some transient references in namei()
iget() / iput() (sheet 38)
I_BUSY ilock() / iunlock()
I_VALID
namei() looks up names (sheet 43)
turns a pathname into an in-core inode pointer
three uses:
read-only (open, chdir)
create -- you really want the parent directory
delete -- ""
aux functions: dirlookup() and dirlink()
used by callers during create and delete
what makes name lookup tricky?
locking
create/create race
lookup/delete race
either unlink or rmdir
is someone freeing/reallocating an inode we're using / about to use?
particularly while looking up i-number from dirent
some obvious locking schemes don't work:
namei(d1/d2/f) -> ilock(d1) ; ilock(d2) ; iunlock(d1)
no! deadlock.
namei(d/..)
or namei(./d) and namei(./..) (where CWD is d)
Plenty of complexity here just to keep naming data structures internally consistent in the face of concurrency.
Much that is useful for programs also: concurrent creates do something reasonable, your open file doesn't die just because someone renames the file, etc.
Sadly, the design is not perfect if you want to write secure privileged code.
a name can change meaning underfoot
suppose root periodically cleans up /tmp
rm /tmp/* /tmp/*/*
that's really two steps:
generate a list of all files
unlink() each one
attacker could:
mkdir /tmp/d
cp /dev/null /tmp/d/passwd
wait until root about to rm /etc/d/passwd...
rm -rf /tmp/d
ln -s /etc /tmp/d
called "time of check to time of use" (TOCTTOU) bug
they are fairly common and could be reduced by a better API
Link, unlink, chdir, mount, umount could have taken file
descriptors instead of their path argument. In fact, this would get
rid of some possible race conditions (some of which have security
implications, TOCTTOU). However, this would require that the current
working directory be remembered by the process.
Next lecture we'll think about how to maintain (really regain) consistency if there's a power failure or crash in the middle of the FS code updating the disk!