Optimistic Replication Using Vector Time Pairs
Russ Cox and William Josephson

what's the overall goal?
  flexible file sync

what are the implied problems w/ existing file synchronizers?
  they need to garbage collect file deletion notices
  they can't sync sub-trees
  must look at every file in order to sync
  VT per file requires a lot of storage
  can be held up by dead replicas

what are the claimed novel properties?
  simplicity, flexibility
  easier to compress (section 3.2)
    in particular, singleton mod times!
  no explicit adding or removing of hosts
  can quickly find data that needs to be transferred (section 3.1)
  partial sync is easy

how did previous systems work?
  one vector timestamp per file, reflects modifications
  synchronize a disk: compare every file's VT
    need to transfer every VT, and store them
  need to keep VT for deleted file, treat deletion as modification
    otherwise it will re-appear on next sync
  GC deleted file's VT after everyone has seen it
    so maintain, for every other node, highest VT it is known to know
      really min VT for which it knows every VT before that
    gossip; one slow node can prevent GC

what's the key technique?
  two vector times: mod time and sync time
    what you have vs. what you know

what does this mean? intuition?
  sync time implies times at which modifications *didn't* occur
  in what sense does a single VT combine mod and sync times?

any advantages for a single file?
  maybe not

advantages for syncing a whole directory tree?
  dir sync time = elementwise min of child sync times
  dir mod time = elementwise max of child mod times
  can skip whole dir if src dir's mod time <= dst dir's sync time
  why does this make sense?
    higher sync time proves you had (maybe indirectly) synced w/ the
    relevant node, and he was required to tell you if he changed anything

does dir sync/mod sound like an important optimization?

why couldn't we do this w/ a single VT?
  skip if src dir's mod time <= min(dst tree's mod times)?
  skip if src dir's mod time <= max(dst tree's mod times)?
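the directory-skip test above can be sketched in a few lines; this is a minimal illustration, not the paper's implementation, and all names (vt_max, vt_min, can_skip, replicas "A"/"B") are my own. a vector time is modeled as a dict from replica name to logical clock, with missing entries meaning 0.

```python
# Hypothetical sketch of vector-time-pair comparisons (names are mine,
# not from the paper). Vector time = {replica: logical clock}.

def vt_max(a, b):
    """Elementwise max: combines child mod times into a dir mod time."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def vt_min(a, b):
    """Elementwise min: combines child sync times into a dir sync time."""
    return {r: min(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}

def vt_leq(a, b):
    """True if vector time a <= b elementwise (missing entries are 0)."""
    return all(v <= b.get(r, 0) for r, v in a.items())

def can_skip(src_dir_mod, dst_dir_sync):
    """Skip syncing the whole directory: every modification the source
    tree contains is already known (synced) at the destination."""
    return vt_leq(src_dir_mod, dst_dir_sync)

# source dir last changed at A:3, B:1; destination has synced
# through A:5, B:2 -- nothing new, so the whole subtree is skipped
src_mod = {"A": 3, "B": 1}
dst_sync = {"A": 5, "B": 2}
assert can_skip(src_mod, dst_sync)
```

note the asymmetry: mod times aggregate by max (a dir is "as modified as" its most-modified child) while sync times aggregate by min (a dir is only "as synced as" its least-synced child), which is what makes the skip test conservative.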
what if I sync just a sub-tree?
  will that mess up optimized syncs of the parent?
  won't mess up sync time, since min only changes parent sync time
    if this subdir was the laggard
  hmm, may increase mod time
    but that's conservative: will cause syncer to consider this directory

why can they get away with singleton file mod times?
  i.e. suppose we have A:1, B sees it, then B:2
  how does A know B:2 supersedes A:1?
    B's sync time tells it!
  so haven't we just moved the complexity to the sync time?
    no: need all elements of sync VT, but all files usually have
    the same sync time

how do they garbage collect deletion notices?
  dir sync time proves no file existed as of that time!!!
  so if during sync the other guy sends an earlier create time,
    he should delete
  and if he has a later create time, you must create
  if his file mod time > your dir sync time, conflict
    you can tell because create < dirsync and you don't have the file...

it's a disk file system: how do they detect the user's changes?
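the create-time vs. dir-sync-time reasoning above can be sketched as a small decision procedure; this is my own illustrative sketch (function names and replica names assumed, not from the paper). the key fact it encodes: my directory's sync time proves no file I'm missing existed as of that time.

```python
# Hypothetical sketch (my names) of deciding what a locally missing
# file means, given the (create, mod) vector times the peer sent and
# my directory's sync time. Vector time = {replica: logical clock}.

def vt_leq(a, b):
    """True if vector time a <= b elementwise (missing entries are 0)."""
    return all(v <= b.get(r, 0) for r, v in a.items())

def decide_missing_file(their_create, their_mod, my_dir_sync):
    """I don't have the file but the peer does; classify the absence."""
    if vt_leq(their_mod, my_dir_sync):
        # I had seen every change to this file, yet it's gone locally:
        # I must have deleted it after syncing -- the peer should delete.
        return "they should delete"
    if vt_leq(their_create, my_dir_sync):
        # I knew the file was created, but not its latest modification:
        # they modified a file I deleted -- a real conflict.
        return "conflict"
    # The creation itself postdates everything I've synced: new file.
    return "I should create"

# I synced through A:2; peer's file was created and last modified at A:1
print(decide_missing_file({"A": 1}, {"A": 1}, {"A": 2}))
```

this is why no per-file deletion notices need to be stored or garbage collected: the directory sync time alone distinguishes "deleted after seeing it" from "never saw it".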