6.824 Lecture 8: Vector Timestamps

Topic: Consistency with Timestamps / Version Vectors

Consistency so far
  Replication/caching and concurrent writers -> consistency problems
  Consistentency models
    sequential consistency --- IVY, read must see most recent write
    release consistency --- TreadMarks, acquire must see releaser's writes
  Both are "pessimistic":
    before you make a change, make sure you can (ask for permission)
    IVY: each page has at most one writer
    TreadMarks: put locks+release in the  right places

Nice semantics, but sometimes awkward
  Requires lots of chatter between all machines participating
  What if a client is disconnected?
  What if machines are scattered all over the world?

Can you sensibly operate out of your cache
  without seeing other people's updates?

Scenario 1: disconnected operation with laptops
  source-code repository, cooperating programmers
  each programmer's laptop has a complete copy of the files
    want to be able to modify files while disconnected
  programmers synchronize files pair-wise: "file synchronizer"
  how to decide which files to copy, and in which direction?
    only one changed: take latest version
    both changed: report an error, ask humans to resolve
  "optimistic concurrency"  --- don't ask for permission; apologize later
  CVS, RCS, GIT, &c solves this problem

Scenario 2:
  single user has many machines, each with a copy of the user's file system
  user works on one machine for a while (perhaps disconnected)
  user works on second machine for a while
     perhaps forgot to sync machine 1 with machine 2 before starting to work
  user syncs 2 -> 3, works on third machine
  back on to the first machine.
    want to ensure that the first machine has all the changes from 2 and 3

Example 1:
  Focus on a modifications to a single file
  H1: w1 ->H2       ->H3
  H2:            w2
  H3:                       ->H2
  What is the right thing to do?
  Is it enough to simply take file with latest modification time?
  Yes in this case, as long as you carry them along correctly.
    I.e. H3 remembers mtime assigned by H1, not mtime of sync.

Example 2:
  H1: w1 ->H2 w2
  H2:                  w0 ->H1
  H2's mtime will be bigger.
  Should the file synchronizer use "0" and discard "2"?
    No! They were conflicting changes. We need to detect this case.
    Modification times are not enough by themselves

What is the goal?
  No Lost Updates
    Only OK for sync to copy version x2 over version x1 if
    x2 includes all updates that are in x1.
    i.e. we desire a sequential update history
  But update history might not be sequential -- concurrent updates
    A sync would then lose one of the updates
    So we must forbid sync in this case
    and declare a conflict

Why is this called an optimistic strategy?
  Any host can modify any file
  We detect concurrent updates rather than forbidding them

Why not use wall-clock time of last modification?
  Suppose T(x1) < T(x2).
  Certainly x1 does not superseded x2.
  But does x2 superseded x1?
  We cannot tell:
    maybe both x1 and x2 derive from a common x0
    but x1 and x2 changes are independent, on different hosts
    and x1 happened before x2
  x1 precedes x2 does *not* mean x2 incorporates x1.

How can we decide whether we're looking at a sequential history?
  We could record each file's entire modification history.
  List of hostname/localtime pairs.
  And carry history along when synchronizing between hosts.
  For example 1:   H2: H1/T1,H2/T2   H3: H1/T1
  For example 2:   H1: H1/T1,H1/T2   H2: H1/T1,H2/T3
  Then its easy to decide if version X supersedes version Y:
    If Y's history is a prefix of X's history.
    This exactly captures our desire for an *ordered* sequence of updates.
  Note we're not comparing times, so this works w/ out-of-sync host clocks.

Why is complete history not such a great solution?
  The histories grow without bound if we modify a file a lot.

Can we compress the histories?
  Note that if a version history contains H1/T5, the file must
    also reflect all updates H1/T1..4.
    Since only H1 can produce H1/Tx history entries, and it
      does so by modifying the local file copy.
    And H1 won't replace its copy with a version that has
      a smaller version history.
  So we can omit all but the last entry for a given host in
    the version history.
  So we only need one timestamp per host.

Proposal: Vector Timestamps.
  A VT says "this file contains all updates from each
    host up to the indicated time."
  A vector of times, one entry per host.
  The times are (as in the histories) local wall-clock times.
    Though they could be any monotonic count: local version number.
  If a and b are vector time-stamps:
  a = b if they agree at every element.
  a <= b if a[i] <= b[i] for every i.
  a >= b
  a || b if a[i] < b[i], a[j] > b[j], for some i,j.
    i.e. a and b conflict

Examples:
  <1, 3, 2> <= <1, 2, 2>
    (H2 made a change)
  <1, 3, 2> || <2, 3, 1>
    (H3 and H1 made concurrent changes)

[diagram of boundary implied by VT]

So now we can perform file synchronization.
  Laptops carry along a version vector with each file, rather
    than just the single timestamp of last change.
  When a pair of laptops syncs a file, they accept the file with
    the higher vector timestamp.
  And signal an error if the VTs conflict.
  Illustrate with the two examples.
    Ex1. H1: <1,0,0>  H3: <1,0,0> H2: <1,1,0>  H3: take update from H2
    Ex2. H1: <1,0,0> H2: <1,0,0>  H1: <2,0,0> H2: <1,1,0>  conflict w. H1

What if there *are* conflicting updates?
  VTs can detect them, but then what?
  Depends on the application.
  Easy: mailbox file with distinct messages, just merge.
  Medium: changes to different lines of a C source file.
  Hard: changes to the same line of C source.
  Reconciliation must be done manually for the hard cases.
  CVS keeps extra information, equiv to complete history.
    So it can generate diffs from common ancestor.
    Much better than diffing the two final versions against each other.
      Which is newer? Or did they both change?

What about file deletion?
  Can H1 just forget a file's VT if it deletes the file?
    No: when H1 syncs w/ H2, it will look like H2 has a new file.
  H1 must remember all deleted files and their VTs.
  Treat delete like a file modification.
    H1: w1 ->H2 
    H2:         del ->H1
    second sync sees H1:<1,0> H2<1,1>, so delete wins at H1
  There can be delete/write conflicts
    H1: w1 ->H2 w2
    H2:         del ->H1
    H1:<2,0> vs H2:<1,1> -- conflict
  A real system needs to GC the delete records.
    Once all hosts have deleted the file.
    A slow/disconnected host can hold up the GC.

Step back: what do file syncronizers use VTs for?
  -> does one version include all info from another version?
  a way to track dependency: what inputs did this data derive from?
  or causality: what events might have caused this event to occur?

What does a VT *mean*?
  marks a point in time on each host
  <3,...> implies inclusion of all updates up to 3 from H1

Remember TreadMarks uses VTs -- for what?
  to help ensure consistency
  CPU0: a1 x=1 r1
  CPU1:             a1 y=x r1
  CPU2:                       a1 print x, y r1
  CPU2 acquires lock from CPU1
  CPU1 asks CPU1 "what did you know as of release?"
  CPU1's answer is a VT, e.g. <1,2,0>
  VT indicates CPU1 know about CPU0's write to x
  CPU2 knows to ask CPU0 for that write

-------

Optimistic Replication Using Vector Time Pairs
Russ Cox and William Josephson

what's the overall goal?
  flexible file sync

what are the problems w/ existing file synchronizers?
  must exchange VT for every file during sync
  must store VT for every file
  must store VT for every deleted file
  (note VTs can be big if many hosts involved)

what's the key idea?
  two vector times: mod time (mt) and sync time (st)
  what you have vs what you know

sync time
  each file has a sync time
  when H1 syncs to H2
    sync times of files set of elementwise max

sync time example for one file
      1    2    3    4
  H1: x1 ->H2 ->H3
  H2:              ->H3
  H3:
  when H2 syncs to H3:
    file mt at H2 is <1,0,0>
    file st at H3 is <3,0,0>
  thus H3 knows it has synced since last file change
  so far, not a win over single VT per file

suppose we have a tree of many files and directories
  common case: almost all files are up to date
    syncs usually only xfer a few files
  how to detect "no sync required" w/o exchanging all files' VTs?
  if all H2's files have been synced since H1's most recent change,
    don't need to copy anything from H1 to H2
  so H2 needs min(st), H1 needs max(mt)

Tra uses this observation for each directory
  dir mt = elementwise max over all descendents' mt
  dir st = elementwise max over all descendents' st

Example:
  H1 and H2 sync at time 5
    all files' st set to <5,5>
    so dir st = <5,5>
  H1 and H2 sync again at time 6
    no recent changes, so dir mt < <5,5>
    so quick compare shows dir mt < dir st
  one file changed on H1
    so dir mt=<7,7>, dir st = <6,6>
  sync at time 8
    H1 mt=<7,7> > H2 st=<6,6>
    so they must compare per-file VTs

does dir sync/mod sound like an important optimization?
  yes, for big directories
  millions of files -> vast VT traffic
  Tra can reduce that to a just two VTs
  I have this problem!

why couldn't we do this w/ single VT?
  e.g. compare H1's max(file VTs) vs H2's min(file VTs)
  max(file VTs) is a good conservative estimate of latest update
  but min(file VTs) is likely to be <0,0,...>
    it has to be elementwise (min not generally defined on VTs)
    and there will usually be some file that wasn't modified on each host
  result: it will always seem like there is new information

how does Tra handle deletion?
  does not have delete notices; host just deletes the file and its VTs
  dir sync time proves no file existed as of that time!!!
  during sync
    if other file mt < dir st, he should delete
    if other file mt > dir st?
      delete/write conflict?
      or new file?
      look at file creation time vs st to decide