6.824 2002 Lecture 18: Vector Timestamps

So far we've talked about enforcing some order on operations,
  either implicitly, because they go thru the file server,
  or because of e.g. Ivy DSM sequential consistency.
  Nice semantics, but sometimes awkward:
    requires lots of chatter w/ the file server or other clients.
  Can you sensibly operate out of your cache without seeing other people's updates?

Example: disconnected operation with laptops.
  Source-code repository, cooperating programmers.
  Pair-wise file synchronization between programmer laptops.
  Want to be able to modify files while disconnected.
  "Optimistic replication."

Example 1: just focus on modifications to a single file.
  H1 writes "a"
  H1 -> H2
  H2 writes "b"
  H1 -> H3
  H3 -> H2
  What is the right thing to do?
  Is it enough to compare modification times?
    Yes in this case, as long as you carry them along correctly.

Example 2:
  H1 writes "a"
  H1 -> H2
  H1 writes "b"
  H2 writes "c"
  H2 -> H1
  H2's mtime will be bigger.
  Does that mean we should use "c" and discard "b"?

What is the principle here?
  We're not *enforcing* a sequential update history.
    No locks, no Ivy-style owners.
  But *if* updates were actually sequential, we want to detect that
    and carry along the most recent copy.
  And if updates were not sequential, we want to detect the problem
    (after the fact).
  This is where the "optimism" comes in.

So what is the problem with using wall-clock time?
  Certainly if T1 < T2, the version written at T1 is not the latest.
  *But* T2 > T1 doesn't imply that the T2 version supersedes the T1 version.
  The underlying problem is that time doesn't reflect causality:
    we care whether the write at T2 was on a host that had seen the write at T1.
  Also you cannot rely on computers having synchronized clocks.

How can we decide more precisely whether we're looking at a sequential history?
  We could record each file's entire modification history:
    a list of hostname/localtime pairs,
    and carry the history along when synchronizing between hosts.
  For example 1:
    H2: H1/T1, H2/T2
    H3: H1/T1
  For example 2:
    H1: H1/T1, H1/T2
    H2: H1/T1, H2/T3
  Then it's easy to decide if version X supersedes version Y:
    check whether Y's history is a prefix of X's history.
  This exactly captures our desire for an *ordered* sequence of updates.
  Note we're not comparing times, so this works w/o synchronized clocks.

Why is complete history not such a great solution?
  The histories grow without bound if we modify a file a lot.

Can we compress the histories?  Proposal: vector timestamps.
  Just record the latest time for each host,
    i.e. throw away all but the last history record for each host.
  All we care about is whether there's a *new* update by some host
    that we hadn't heard about when we changed the file.
  We assume that such an update must be *after* the last time we heard
    from that host: the host isn't going backwards in time.

A vector timestamp is a vector of times, one entry per host.
  The times are (as in the histories) local wall-clock times,
    though they could be any monotonic count.
  a = b if they agree at every element.
  a < b if a[i] <= b[i] for every i, but !(a = b).
  a > b similarly.
  a || b if a[i] < b[i] and a[j] > b[j] for some i, j;
    i.e. a and b conflict.
  If one history was a prefix of the other,
    then one vector timestamp will be less than the other.
  If one history is not a prefix of the other,
    then (at least by example) the VTs will not be comparable.

So now we can perform file synchronization.
  When a pair of laptops has two versions of a file,
    keep the one with the higher vector timestamp,
    and signal an error if the VTs conflict.
  Illustrate with the two examples.
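
A minimal sketch of the comparison rules and the sync decision, in Python.
(The dict representation and the names compare() and pick_version() are
made up for illustration; they aren't from any particular sync tool.)

  # A VT is a dict mapping host name -> that host's last update time.
  # A missing host is treated as time 0, i.e. "never heard from it".

  def compare(a, b):
      """Return 'equal', 'less', 'greater', or 'concurrent' for VTs a and b."""
      hosts = set(a) | set(b)
      some_less = any(a.get(h, 0) < b.get(h, 0) for h in hosts)
      some_more = any(a.get(h, 0) > b.get(h, 0) for h in hosts)
      if not some_less and not some_more:
          return "equal"
      if some_less and not some_more:
          return "less"        # b supersedes a
      if some_more and not some_less:
          return "greater"     # a supersedes b
      return "concurrent"      # a || b: conflicting updates

  def pick_version(mine, theirs):
      """mine/theirs are (vt, data) pairs; return the copy to keep."""
      order = compare(mine[0], theirs[0])
      if order == "less":
          return theirs
      if order == "concurrent":
          raise Exception("conflicting updates; reconcile by hand")
      return mine              # equal or greater: we already have the latest

  # Example 1: H3's history is a prefix of H2's, so H2's copy wins.
  assert compare({"H1": 1}, {"H1": 1, "H2": 2}) == "less"

  # Example 2: H1 and H2 both wrote after the first sync, so the VTs conflict.
  assert compare({"H1": 2}, {"H1": 1, "H2": 3}) == "concurrent"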

What if there *are* conflicting updates?
  VTs can detect them, but then what?  Depends on the application.
  Might be easy: a mailbox file with distinct messages; just merge.
  Might be hard: changes to the same line of C source.
  Reconciliation must be done manually for the hard cases.
  CVS keeps extra information, equivalent to a complete history,
    so it can generate diffs from the common ancestor.
    Much better than diffing the two final versions against each other:
      which is newer?  or did they both change?

What about file deletion?
  Deletes have to have VTs so you can detect delete/write conflicts.
  Could treat a delete as a special kind of write.
  But then you have to remember the deletion timestamp indefinitely,
    in case some host out there hasn't heard about it.
  Can we garbage collect deletes?

Let's step back.  What are vector timestamps really doing?
  They capture inter-system causality:
    what information had I heard from each other system?
  They are helpful when we care about "causal consistency":
    "Writes that are causally related must be seen by all processes
     in the same order."

How is causal consistency related to our file sync example?
  In general, each version of a file depends on the previous version,
    i.e. it won't make sense to an observer who didn't agree on the previous version.
    Example: each update increments a number in the file.
  Each version helps cause the next version.
  If updates aren't ordered, causal relationships won't make sense.
  So we've used VTs to *detect* violations of causal consistency.

We can also use VTs to help *ensure* causal consistency.
  Suppose we're all working on a s/w project, with our file sync.
  We have two files: math.c and app.c.
  Alice adds sqrt() to math.c.
  Bob syncs with Alice, gets the new math.c.
  Bob looks at math.c, changes app.c to call sqrt().
  Bob syncs with Charles.
  Bob thinks "I've only changed app.c."
  Charles won't be able to compile without the new math.c,
    i.e. app.c depends on math.c.

We could fix this with a second application of vector timestamps:
  Tag each file with a second "dependency" VT.
  The D-VT summarizes the information available at the time of the file's last update,
    i.e. a summary of the history of inputs to the file.
  Each entry in the vector is the MAX known time from that host,
    as of when the file was updated.
  Then sync should xfer *all* files in D-VT order (sketch below).
    Thus Bob will send math.c to Charles before app.c.

TreadMarks uses a similar technique...
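
A rough sketch of the D-VT ordering, in the same style as the earlier Python
sketch (dvt_order() and the dict representation are again made up for
illustration, not any particular system's protocol).  Any send order
consistent with the D-VT partial order preserves causality; sorting by the
sum of the entries is one easy linearization, since dvt_a < dvt_b implies
sum(dvt_a) < sum(dvt_b).

  def dvt_order(files):
      """files: list of (name, dvt) pairs; return a causally safe send order."""
      return sorted(files, key=lambda f: sum(f[1].values()))

  # The math.c / app.c scenario:
  math_c = ("math.c", {"Alice": 1})             # Alice's sqrt() update
  app_c  = ("app.c",  {"Alice": 1, "Bob": 2})   # Bob's change, made after seeing Alice's update
  assert [n for n, _ in dvt_order([app_c, math_c])] == ["math.c", "app.c"]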