Naming in file systems

Required reading: nami() and associated functions, sysfile.c.

why do you want names as part of your storage system?
  hard for users/apps to keep track of disk addrs
  directories aid scaling, avoid name conflicts
  security / consistency are easier with an abstract interface
interface
  pathnames: x, 828/x, /usr/rtm/828/x
  directory tree with root at /
  special entries in each directory: ., ..
  current working directory (cwd)
    cat 828/x
    cd 828 ; cat x
    per process
  name ops: open(), chdir(), creat(), mkdir(), unlink(), link()
some implications of interface
  could I implement by putting file info in directory entry?
    this is how DOS / early Windows FAT file system worked
  ln a b ; rm a ; cat b
  fd = open("x"); unlink("x"); write(fd, "z", 1);
  mkdir d ; cd d ; rmdir ../d
  so underlying file has existence independent of name
  thus i-nodes
  a dirent contains a name and an i-number
    dirent does *not* contain the file info
removing a directory
  only allowed if you first remove contents
    otherwise allocated but un-nameable i-nodes
    can't remove a file with no name!
  thus recursive rm -rf
    remove directory contents (recursively)
    then remove empty directory
hard links
  mkdir d
  ln d d/x
  rm -rf d
  will the rm ever finish?
  how can we deal with this?
    you are not allowed to hard-link to a directory
    thus there can be no cycles, so rm -rf will finish
symbolic links are allowed for directories
  ln -s d d/x
  the symbolic link doesn't prevent removal of pointed-to file/dir
  and symbolic link can be removed w/o worrying about what it points to
  consequence: symlink lookups can fail

xv6 file system naming implementation

disk structure (sheet 31)
  dirent contains name/i-number pairs
  IBLOCK(inum) turns i-num into disk block of i-node
on-disk i-node
  type is DIR, FILE, or 0 (free)
  nlink is link count -- # of dir entries pointing to this i-num
    this is for "ln a b ; rm a" 
    link count can be zero for non-free file! why?
in-core i-node structure (sheet 32)
  ref is # in-core references
    FDs, CWDs, some transient references in namei()
    iget() / iput() (sheet 38)
  I_BUSY ilock() / iunlock()
  I_VALID
namei() looks up names (sheet 43)
  turns a pathname into an in-core inode pointer
  three uses:
    read-only (open, chdir)
    create -- you really want the parent directory
    delete -- ""
  aux functions: dirlookup() and dirlink()
    used by callers during create and delete
what makes name lookup tricky?
  locking
  create/create race
  lookup/delete race
    either unlink or rmdir
    is someone freeing/reallocating an inode we're using / about to use?
    particularly while looking up i-number from dirent
  some obvious locking schemes don't work:
    namei(d1/d2/f) -> ilock(d1) ; ilock(d2) ; iunlock(d1)
    no! deadlock.
    namei(d/..)
    or namei(./d) and namei(./..) (where CWD is d)

V6 code examples

Questions:

Plenty of complexity here just to keep naming data structures internally consistent in the face of concurrency.

Much that is useful for programs also: concurrent creates do something reasonable, your open file doesn't die just because someone renames the file, etc.

Sadly, the design is not perfect if you want to write secure privileged code.

  a name can change meaning underfoot
  suppose root periodically cleans up /tmp
    rm /tmp/* /tmp/*/*
  that's really two steps:
    generate a list of all files
    unlink() each one
  attacker could:
    mkdir /tmp/d
    cp /dev/null /tmp/d/passwd
    wait until root about to rm /etc/d/passwd...
    rm -rf /tmp/d
    ln -s /etc /tmp/d
  called "time of check to time of use" (TOCTTOU) bug
    they are fairly common and could be reduced by a better API
Link, unlink, chdir, mount, umount could have taken file descriptors instead of their path argument. In fact, this would get rid of some possible race conditions (some of which have security implications, TOCTTOU). However, this would require that the current working directory be remembered by the process.

Next lecture we'll think about how to maintain (really regain) consistency if there's a power failure or crash in the middle of the FS code updating the disk!