6.824 Lecture 8: Vector Timestamps Topic: Consistency with Timestamps / Version Vectors Consistency so far Replication/caching and concurrent writers -> consistency problems Consistentency models sequential consistency --- IVY, read must see most recent write release consistency --- TreadMarks, acquire must see releaser's writes Both are "pessimistic": before you make a change, make sure you can (ask for permission) IVY: each page has at most one writer TreadMarks: put locks+release in the right places Nice semantics, but sometimes awkward Requires lots of chatter between all machines participating What if a client is disconnected? What if machines are scattered all over the world? Can you sensibly operate out of your cache without seeing other people's updates? Scenario 1: disconnected operation with laptops source-code repository, cooperating programmers each programmer's laptop has a complete copy of the files want to be able to modify files while disconnected programmers synchronize files pair-wise: "file synchronizer" how to decide which files to copy, and in which direction? only one changed: take latest version both changed: report an error, ask humans to resolve "optimistic concurrency" --- don't ask for permission; apologize later CVS, RCS, GIT, &c solves this problem Scenario 2: single user has many machines, each with a copy of the user's file system user works on one machine for a while (perhaps disconnected) user works on second machine for a while perhaps forgot to sync machine 1 with machine 2 before starting to work user syncs 2 -> 3, works on third machine back on to the first machine. want to ensure that the first machine has all the changes from 2 and 3 Example 1: Focus on a modifications to a single file H1: w1 ->H2 ->H3 H2: w2 H3: ->H2 What is the right thing to do? Is it enough to simply take file with latest modification time? Yes in this case, as long as you carry them along correctly. I.e. H3 remembers mtime assigned by H1, not mtime of sync. Example 2: H1: w1 ->H2 w2 H2: w0 ->H1 H2's mtime will be bigger. Should the file synchronizer use "0" and discard "2"? No! They were conflicting changes. We need to detect this case. Modification times are not enough by themselves What is the goal? No Lost Updates Only OK for sync to copy version x2 over version x1 if x2 includes all updates that are in x1. i.e. we desire a sequential update history But update history might not be sequential -- concurrent updates A sync would then lose one of the updates So we must forbid sync in this case and declare a conflict Why is this called an optimistic strategy? Any host can modify any file We detect concurrent updates rather than forbidding them Why not use wall-clock time of last modification? Suppose T(x1) < T(x2). Certainly x1 does not superseded x2. But does x2 superseded x1? We cannot tell: maybe both x1 and x2 derive from a common x0 but x1 and x2 changes are independent, on different hosts and x1 happened before x2 x1 precedes x2 does *not* mean x2 incorporates x1. How can we decide whether we're looking at a sequential history? We could record each file's entire modification history. List of hostname/localtime pairs. And carry history along when synchronizing between hosts. For example 1: H2: H1/T1,H2/T2 H3: H1/T1 For example 2: H1: H1/T1,H1/T2 H2: H1/T1,H2/T3 Then its easy to decide if version X supersedes version Y: If Y's history is a prefix of X's history. This exactly captures our desire for an *ordered* sequence of updates. Note we're not comparing times, so this works w/ out-of-sync host clocks. Why is complete history not such a great solution? The histories grow without bound if we modify a file a lot. Can we compress the histories? Note that if a version history contains H1/T5, the file must also reflect all updates H1/T1..4. Since only H1 can produce H1/Tx history entries, and it does so by modifying the local file copy. And H1 won't replace its copy with a version that has a smaller version history. So we can omit all but the last entry for a given host in the version history. So we only need one timestamp per host. Proposal: Vector Timestamps. A VT says "this file contains all updates from each host up to the indicated time." A vector of times, one entry per host. The times are (as in the histories) local wall-clock times. Though they could be any monotonic count: local version number. If a and b are vector time-stamps: a = b if they agree at every element. a <= b if a[i] <= b[i] for every i. a >= b a || b if a[i] < b[i], a[j] > b[j], for some i,j. i.e. a and b conflict Examples: <1, 3, 2> <= <1, 2, 2> (H2 made a change) <1, 3, 2> || <2, 3, 1> (H3 and H1 made concurrent changes) [diagram of boundary implied by VT] So now we can perform file synchronization. Laptops carry along a version vector with each file, rather than just the single timestamp of last change. When a pair of laptops syncs a file, they accept the file with the higher vector timestamp. And signal an error if the VTs conflict. Illustrate with the two examples. Ex1. H1: <1,0,0> H3: <1,0,0> H2: <1,1,0> H3: take update from H2 Ex2. H1: <1,0,0> H2: <1,0,0> H1: <2,0,0> H2: <1,1,0> conflict w. H1 What if there *are* conflicting updates? VTs can detect them, but then what? Depends on the application. Easy: mailbox file with distinct messages, just merge. Medium: changes to different lines of a C source file. Hard: changes to the same line of C source. Reconciliation must be done manually for the hard cases. CVS keeps extra information, equiv to complete history. So it can generate diffs from common ancestor. Much better than diffing the two final versions against each other. Which is newer? Or did they both change? What about file deletion? Can H1 just forget a file's VT if it deletes the file? No: when H1 syncs w/ H2, it will look like H2 has a new file. H1 must remember all deleted files and their VTs. Treat delete like a file modification. H1: w1 ->H2 H2: del ->H1 second sync sees H1:<1,0> H2<1,1>, so delete wins at H1 There can be delete/write conflicts H1: w1 ->H2 w2 H2: del ->H1 H1:<2,0> vs H2:<1,1> -- conflict A real system needs to GC the delete records. Once all hosts have deleted the file. A slow/disconnected host can hold up the GC. Step back: what do file syncronizers use VTs for? -> does one version include all info from another version? a way to track dependency: what inputs did this data derive from? or causality: what events might have caused this event to occur? What does a VT *mean*? marks a point in time on each host <3,...> implies inclusion of all updates up to 3 from H1 Remember TreadMarks uses VTs -- for what? to help ensure consistency CPU0: a1 x=1 r1 CPU1: a1 y=x r1 CPU2: a1 print x, y r1 CPU2 acquires lock from CPU1 CPU1 asks CPU1 "what did you know as of release?" CPU1's answer is a VT, e.g. <1,2,0> VT indicates CPU1 know about CPU0's write to x CPU2 knows to ask CPU0 for that write ------- Optimistic Replication Using Vector Time Pairs Russ Cox and William Josephson what's the overall goal? flexible file sync what are the problems w/ existing file synchronizers? must exchange VT for every file during sync must store VT for every file must store VT for every deleted file (note VTs can be big if many hosts involved) what's the key idea? two vector times: mod time (mt) and sync time (st) what you have vs what you know sync time each file has a sync time when H1 syncs to H2 sync times of files set of elementwise max sync time example for one file 1 2 3 4 H1: x1 ->H2 ->H3 H2: ->H3 H3: when H2 syncs to H3: file mt at H2 is <1,0,0> file st at H3 is <3,0,0> thus H3 knows it has synced since last file change so far, not a win over single VT per file suppose we have a tree of many files and directories common case: almost all files are up to date syncs usually only xfer a few files how to detect "no sync required" w/o exchanging all files' VTs? if all H2's files have been synced since H1's most recent change, don't need to copy anything from H1 to H2 so H2 needs min(st), H1 needs max(mt) Tra uses this observation for each directory dir mt = elementwise max over all descendents' mt dir st = elementwise max over all descendents' st Example: H1 and H2 sync at time 5 all files' st set to <5,5> so dir st = <5,5> H1 and H2 sync again at time 6 no recent changes, so dir mt < <5,5> so quick compare shows dir mt < dir st one file changed on H1 so dir mt=<7,7>, dir st = <6,6> sync at time 8 H1 mt=<7,7> > H2 st=<6,6> so they must compare per-file VTs does dir sync/mod sound like an important optimization? yes, for big directories millions of files -> vast VT traffic Tra can reduce that to a just two VTs I have this problem! why couldn't we do this w/ single VT? e.g. compare H1's max(file VTs) vs H2's min(file VTs) max(file VTs) is a good conservative estimate of latest update but min(file VTs) is likely to be <0,0,...> it has to be elementwise (min not generally defined on VTs) and there will usually be some file that wasn't modified on each host result: it will always seem like there is new information how does Tra handle deletion? does not have delete notices; host just deletes the file and its VTs dir sync time proves no file existed as of that time!!! during sync if other file mt < dir st, he should delete if other file mt > dir st? delete/write conflict? or new file? look at file creation time vs st to decide