"Wide-area cooperative with CFS" --- Dabek, Kaashoek, Karger, Morris, Stoica Let's use the efficient location primitives (Chord, CAN, PAST) to build a file system to distribute a large, popular software distribution. Assume this API for the lookup primitives: IP_adddress lookup (key k); where key is a large integer What are we trying to accomplish? -------------------------------- - load balance Q: Why distribute? Why not just buy a single machine? A: Single machine must have the resources (bw/cpu) to serve peak load. As a result, average utilization will be low. Most investment wasted. Q: What can we do to improve the utilization of this machines resources? A: Serve multiple content sources w/ independent load (in time) + each source creator pays for the resources to serve fractional load + problem.... content creators must have administrative connections We want: many machines, each serving some fraction of the _average_ load. also: what about balancing space? between hosts w/ different capacities? - performance Same as TCP to (median server? mean?) - reliability - Tolerate node failures (at what rate?) - lookup failures (handled by primitive) - data loss --> arrange for 'good enough' reliability at reasonable cost - what is enough: at least as good as single server? better? + power outage at LCS gives server: 99 percent uptime + 'five nines (99.999%)' -> five minutes down per year - anonymity/authentication? - anonymity ignored by CFS - conflicts with performance / reliability goals of primitives - indexing/keyword search? - different class of application ==> Load balance is main force in designing this system How will we achieve these goals? ------------------------------- - Need a different API - CFS provides DHASH layer: void insert (key k, void *data); void *data = lookup (key k); - really a layer? Example of why it isn't a layer: successor placement, lookup really performed in DHASH --> as we design CFS we are really building DHASH: CFS is really all about DHASH. Chord's main job is maintaining finger tables. - basic DHASH operation * to insert data d under key k 1. get IP = lookup (k); 2. contact IP, send d,k * to fetch 1. get IP = lookup (k); 2. contact IP, send k, get d - we will use this interface and extend it to meet our design goals - load balance: - striping + employed by CFS/Mojonation. Large files distributed across large number of nodes + imposes cost of one lookup per block (i.e. many lookups per file) + insert is: for (all blocks) insert (NAME (block[i]), block[i]); + how to track stripes? akin to forming files from disk sectors --- SFSRO file system layout: merkle tree + can add block caching for small files - how to modify DHASH to do this? --> along path, DHASH must do lookups/be informed of path to cache files - recursive v. iterative operation? - (whole file) caching + employed by PAST/Freenet/Napster + distributes files around the network as they are requested + only one lookup per file + insert is: insert (NAME (file), file); ---> now we just need to get DHASH right (add reliability) and we have a file system --> which is better (whole file/block oriented)? - striping imposes more lookups (requires prefetch) uses less disk space to achieve same level of load balance - whole file caching reduces lookup overhead - dealing w/ heterogeneous node capacities: - virtual servers (CFS) - redirection (PASTRY) - reliability - need replication if we will tolerate failures - where to place replicas? 
- reliability
  - need replication if we are to tolerate failures
  - where to place replicas?
    - need: replica failures must be independent
    - successors (Chord-specific)
    - rehashing: key of replica 2 = SHA(SHA(key))
  - how to modify DHASH to provide reliability?
    - needs to place replicas
    - needs to recognize failure and know how to recover
    --> must be involved in the lookup (DHASH knows where the replicas
        are located)
    --> Chord only provides routing-table information to DHASH, which
        does the lookups
  - how does striping affect reliability? (numeric check in the sketch
    at the end of these notes)
    - all blocks of a file must be found to reconstruct the file
      [assume no lookup failures]

      p(finding a block) = 1 - p(all its servers are down)
                         = 1 - l^r
        where l is the prob. that a server is down and r is the
        replication degree

      p(finding a file) = p(finding a block)^B   where B is the # of blocks
                        = (1 - l^r)^B
                       ~= 1 - B*(l^r)

    ==> easy to drive this to one by increasing r

- performance
  - how does performance depend on the load-balance scheme?
    - whole file: speed will be equal to that of TCP between client
      and server
    - striping: problem: lookup latency leads to poor performance;
      w/ prefetch, lookup latency is hidden by the block fetches
  - interaction w/ the network protocol?
  - what about server selection? choose 'close' servers to fetch blocks
    from (see replication above)
    - explain Chord server selection?

- what about authentication/naming?
  - possible primitives for authentication:
    - public key (PK)
    - MAC
    - content hash
  - CFS uses SFSRO-style authentication: a Merkle tree
    [picture of SFSRO auth tree]
  - nit: how to handle ".." (show the circular dependency)

- anonymity?
  - open question
  - is the system useful w/o anonymity?
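A quick numeric check of the p(finding a file) formula above. This is a
sketch with invented example numbers: l = 0.05 (a 5% chance any given
server is down) and B = 1000 blocks (~8 MB of 8 KB blocks).

    /* Evaluate p(file) = (1 - l^r)^B and the 1 - B*l^r approximation
     * for a range of replication degrees r.  Compile: cc rel.c -lm */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double l = 0.05;    /* prob. a given server is down (assumed) */
        int    B = 1000;    /* blocks per file (assumed) */

        for (int r = 1; r <= 5; r++) {
            double p_block = 1.0 - pow(l, r);        /* 1 - l^r     */
            double exact   = pow(p_block, B);        /* (1 - l^r)^B */
            double approx  = 1.0 - B * pow(l, r);    /* ok if B*l^r << 1 */
            printf("r=%d  p(file)=%.6f  1-B*l^r=%.6f\n", r, exact, approx);
        }
        return 0;
    }

With these numbers, r = 1 or 2 loses most files, r = 4 finds the file
with probability ~0.994, and r = 5 with ~0.9997: a small constant
replication degree drives p(file) toward one, as claimed. (The linear
approximation is only meaningful once B*l^r is well below 1.)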