10/26 --- File Systems: network file systems

- Goal: allow users on different computers to share files, like
  multiple users on a single computer can share files.

- Approach: make file access location transparent:

	    /afs/u/kaashoek/README.txt
 
  means the same thing on every machine.

- Making file access location transparent involves:

  - Adding names of remote files to client's name space:

    Import (Plan 9)
    Attach (AFS)
    Mounting (NFS)

    NFS:
        - two files:
           1. on the server: the exports file (lists the exported
              files and their names)
           2. on the local machine (the client): a file recording which
              file systems are stored where
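
    A minimal sketch of how a client-side table of this kind can make
    names location transparent: the longest matching mount point
    decides which server holds a path.  The table contents, host names,
    and the function name resolve are hypothetical, not taken from NFS.

```python
# Hypothetical client-side mount table: mount point -> (server, exported path).
MOUNT_TABLE = {
    "/home": ("server1.example.org", "/export/home"),
    "/home/kaashoek": ("server2.example.org", "/export/kaashoek"),
}


def resolve(path):
    """Return (server, remote_path) for path, or (None, path) if it is local."""
    best = None
    for mount_point in MOUNT_TABLE:
        # Match whole path components only, so /homework does not match /home.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return (None, path)
    server, export = MOUNT_TABLE[best]
    return (server, export + path[len(best):])


print(resolve("/home/kaashoek/README.txt"))
# ('server2.example.org', '/export/kaashoek/README.txt')
```

    Every client that holds the same table resolves the same name to
    the same remote file, which is what location transparency asks for.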

  - Remote file access (e.g., NFS3 and 9P):
    - Retrieve directories from remote location
    - Retrieve inodes from remote location
    - Retrieve file blocks from remote location

- Organization:
  - client/server
    - multiple clients, one server (e.g., NFS, AFS)
      - cache consistency problem:
        a client reads a remote block x from server s and modifies it;
        another client also reads x from server s. What does it see?
      - crash consistency problem:
        a client reads a remote block x from server s, modifies it, and
        crashes; another client also reads x from server s. What does
        it see?
    - multiple clients, multiple servers
      - load balancing problem:
        which client talks to which server?
  - peer-to-peer (e.g., xFS and Bayou)
    - every client is a server

- Cache consistency
  
  - The network file system should behave the same as when multiple
    users share a local file system:
     - read returns the result of the last write
    
  - What does that mean in a file system?

     Case 1:
	    C1			C2
				ls -l x
	    chmod +x x
	    ls -l x		ls -l x	
		
        what does "ls -l x" on C2 return?  Ideal: same as the second ls on C1

     Case 2:
	    C1			C2:
	    open f		
	    read f		
	    write f
	    close f		
	    open f		open f
	    read f		read f

         what does "read f" on C2 return? Ideal: same as the second read
         on C1.

      Case 3:
	   C1			C2:
				open f
				read f
				close f
	   open f		
	   read f		open f
	   write f		
				read f

          what does "read f" on C2 return?  Ideal: result of last
          write to f on C1.

      Case 4:
	   C1			C2:
	   open f		
	   read f1,f2		open f
	   write f1		read f1,f2
	   write f2		

          what does "read f1,f2" on C2 return?  Ideal: both values from
          before the writes on C1 or both values from after them (but
          not mixed).
          (This follows directly from the rule that a read returns the
          result of the last write.)

   - Consistency is achieved through a cache consistency protocol
   
    - Example 1: NFSv2

       data structures:
         - attribute cache
         - in-memory buffer cache
	 - name cache  (in vnode layer)

       when opening file f, fetch inode from server (with getattr RPC)
         - if mod time on server is more recent, delete f's cached
	   blocks locally
	 - insert attributes in attribute cache (timeout of 30 seconds)

       when reading a block of f, check local buffer cache:
         - if present, use it
	 - otherwise, fetch block and store in cache
	   (replace some other block)

       after modifying a block of f (or creating a new block), write it
       through to server

       after modifying attributes of f (or creating a new file), write it
       through to server
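
       A minimal sketch of the open/read/write rules above, as an
       in-process simulation.  The class and method names are
       hypothetical, the server is a plain object rather than an RPC
       endpoint, and the 30-second attribute cache is left out.  It
       replays Case 2 and Case 3.

```python
class Server:
    """Hypothetical in-process stand-in for the file server."""

    def __init__(self):
        self.blocks = {}  # (name, block_no) -> data
        self.mtime = {}   # name -> logical modification time

    def getattr(self, name):
        return self.mtime.get(name, 0)

    def read(self, name, block_no):
        return self.blocks.get((name, block_no))

    def write(self, name, block_no, data):
        self.blocks[(name, block_no)] = data
        self.mtime[name] = self.mtime.get(name, 0) + 1


class Client:
    def __init__(self, server):
        self.server = server
        self.seen_mtime = {}   # name -> mod time seen at the last open
        self.block_cache = {}  # (name, block_no) -> data

    def open(self, name):
        mtime = self.server.getattr(name)
        if mtime > self.seen_mtime.get(name, -1):
            # Server copy is newer: delete the file's cached blocks.
            for key in [k for k in self.block_cache if k[0] == name]:
                del self.block_cache[key]
        self.seen_mtime[name] = mtime

    def read(self, name, block_no):
        key = (name, block_no)
        if key not in self.block_cache:
            self.block_cache[key] = self.server.read(name, block_no)
        return self.block_cache[key]

    def write(self, name, block_no, data):
        self.block_cache[(name, block_no)] = data
        self.server.write(name, block_no, data)  # write through


def case2():
    s = Server()
    s.write("f", 0, "old")
    c1, c2 = Client(s), Client(s)
    c2.open("f")
    c2.read("f", 0)          # C2 holds "old" from an earlier session
    c1.open("f")
    c1.read("f", 0)
    c1.write("f", 0, "new")  # reaches the server before C1 closes
    c2.open("f")             # newer mod time: C2 drops its cached block
    return c2.read("f", 0)


def case3():
    s = Server()
    s.write("f", 0, "old")
    c1, c2 = Client(s), Client(s)
    c2.open("f")
    c2.read("f", 0)          # C2 caches "old" (close is a no-op here)
    c1.open("f")
    c1.read("f", 0)
    c2.open("f")             # server copy unchanged: C2 keeps its cache
    c1.write("f", 0, "new")
    return c2.read("f", 0)   # served from C2's cache


print(case2())  # new
print(case3())  # old (stale)
```

       In this sketch the only consistency check happens at open, so a
       write that lands after another client's open goes unnoticed
       until that client opens the file again.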

     - Example 2: AFS

       data structures:
         - in-memory buffer cache
	 - on-disk file cache

       when opening file f, fetch inode from server
         server:
	   - if opening for reading, add client to read list
	   - if opening for reading and there is a writer,
	     invalidate writer (writer flushes modified buffers)
	   - if opening for writing, invalidate the caches
	     of the clients on the read list
	   - server responds with file

	 on response:
	   - stick file into on-disk cache
	   
       when reading a block of f, check local buffer and disk cache:
           - if present use it
	   - otherwise, fetch it

       when writing a block of f, write it to local buffer and disk
       cache
       
       when closing file f, write f (inodes and blocks) to server (and
       keep a local copy) asynchronously
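
       A minimal sketch of the server-driven invalidation described
       above, again as an in-process simulation.  All names are
       hypothetical; whole files stand in for inodes and blocks, a
       dictionary stands in for the on-disk cache, the flush on close
       is synchronous rather than asynchronous, and write/write
       conflicts (which the notes do not cover) are ignored.

```python
class Server:
    def __init__(self):
        self.files = {}    # name -> contents
        self.readers = {}  # name -> set of clients on the read list
        self.writer = {}   # name -> client that opened the file for writing

    def open(self, client, name, mode):
        if mode == "r":
            w = self.writer.get(name)
            if w is not None and w is not client:
                w.invalidate(name)  # writer flushes its modified copy
                del self.writer[name]
            self.readers.setdefault(name, set()).add(client)
        else:
            # Opening for writing: invalidate the caches on the read list.
            for r in self.readers.pop(name, set()):
                if r is not client:
                    r.invalidate(name)
            self.writer[name] = client
        return self.files.get(name, "")

    def store(self, name, data):
        self.files[name] = data


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> contents (stands in for the disk cache)
        self.dirty = set()  # names modified locally but not yet stored

    def open(self, name, mode="r"):
        data = self.server.open(self, name, mode)
        if name not in self.dirty:
            self.cache[name] = data

    def read(self, name):
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data  # local only until close or invalidation
        self.dirty.add(name)

    def close(self, name):
        self._flush(name)

    def invalidate(self, name):
        self._flush(name)
        self.cache.pop(name, None)

    def _flush(self, name):
        if name in self.dirty:
            self.server.store(name, self.cache[name])
            self.dirty.discard(name)


def demo():
    s = Server()
    s.store("f", "old")
    c1, c2 = Client(s), Client(s)
    c2.open("f", "r")
    first = c2.read("f")
    c2.close("f")
    c1.open("f", "w")          # server invalidates C2's cached copy
    c1.write("f", "new")       # stays in C1's cache
    at_server = s.files["f"]   # server still holds the old contents
    c2.open("f", "r")          # server makes C1 flush before replying
    return first, at_server, c2.read("f")


print(demo())  # ('old', 'old', 'new')
```

       The point of the sketch is that the server, not the client,
       decides when a cached copy stops being valid.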


     - Cases:
       case    NFS           AFS
       1       might fail    ok
       2       ok            ok
       3       fails         ok
       4       ??            ??

- Crash consistency

  - Ideal: once a system call returns, the written value is stable

  - Practice: too expensive to write everything to disk, so relax:
    - some ops are atomic, so that the application writer can ensure
      that data is stable by using the appropriate sequence of ops.
   
  - What is the local behavior after reboot?
                            FFS               Linux Ext2    LFS
    1. write                no                no            no
    2. close                no                no            no
    3. fsync                yes               yes           yes
    4. rename               yes               yes           no
    5. create               yes               no            no
    6. unlink               yes               no            no
    7. creat f, creat g     f and g stable    anything      if g stable, f stable

    "no" means there is a window: if the client stays up long enough,
    the data will become stable.
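
    One "appropriate sequence of ops" an application can use is the
    common write-to-temporary, fsync, rename pattern.  A minimal
    sketch, assuming a POSIX system; the function name stable_replace
    and the ".tmp" suffix are hypothetical.  The final directory fsync
    is a hedge for file systems where rename is not stable on return.

```python
import os


def stable_replace(path, data):
    """Replace the contents of path with data (bytes).

    The intent is that after a crash, path holds either the old
    contents or the new contents, never a partial write.
    """
    tmp = path + ".tmp"  # hypothetical temporary name, same directory
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # force the file data to disk (op 3)
    os.replace(tmp, path)     # rename over the old file (op 4)

    # Also sync the directory so the new name itself reaches disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

    Usage: stable_replace("config.txt", b"new contents").  Plain write
    followed by close would leave a window, per rows 1 and 2 above.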

  - What is ideal in a network file system:
    - once a system call returns on the client, the written data is
      stable at the server

  - Practice: too expensive
    NFS: 
         - Goal: server reboots are not visible to client
           - when a write system call returns and the client stays up,
             the data will be stable
           - when a write system call returns and the client fails, the
             data might become stable
           - order of writes is not guaranteed
           - when any of the system calls close through unlink returns,
             its effect is stable

    AFS: - server reboots are visible to client