6.5840 2025 Lecture 4: Consistency, Linearizability

today's topic: consistency models, specifically linearizability

it's super-common to have storage as a distinct service
  so that computation and storage are separate, talk via RPC
  [simple diagram]
  e.g. web site application logic vs database
  e.g. MapReduce vs GFS

we need to be able to reason about correct behavior for distributed storage
  e.g. what application programmers can expect from GFS or Lab 2
  one part is what the individual requests should do
  today: how concurrent clients should interact --> consistency model

what's a consistency model?
  a specification for the relationship of different clients' views of a service
  I'll focus on key/value storage with network clients
    put(k, v) -> true
    get(k) -> v
  given some put/get calls, what outcome(s) are valid?

in ordinary sequential programming, there's nothing to talk about:
  we expect a read to yield the last written value

when might there be any question about what's correct?
  [simple client/server diagrams]
  read concurrent with write
  replicas
  caches
  failure, recovery
  lost messages
  retransmission

why does a storage system need an explicit consistency model?
  for applications, hard to be correct w/o guarantees from storage
    e.g. producer computes, then executes
      put("result", 27)
      put("done", true)
    consumer executes
      while get("done") == false:
        pause
      v = get("result")
    is v guaranteed to be 27?
  for services, hard to design/implement/optimize w/o a specification
    e.g. is it OK for clients to read from GFS replicas (rather than primary)?

there are lots of consistency models
  sometimes driven by desire to simplify application programmers' lives
  sometimes driven by desire for storage performance
  sometimes describing behavior that was convenient for implementors
  lots of overlapping definitions from different fields
    e.g. file-systems, databases, CPU memory
  today: linearizability
  but we'll also see:
    eventual consistency
    causal consistency
    fork consistency
    serializability
  driving force: performance / convenience / fault-tolerance tradeoffs

linearizability
  it's a specification -- a requirement for how a service must behave
  from clients' point of view: from outside the service
  it's usually what people mean by "strong consistency"
  linearizability matches programmer intuitions reasonably well
    but rules out many optimizations
  you'll implement a linearizable key/value store in Lab 2
    and again in Lab 4, this time with fault tolerance

starting point: we assume that there's a serial spec
  for what individual operations do
  serial = a single server executing operations one at a time
    db[]
    put(k, v):
      db[k] = v
      return true
    get(k):
      return db[k]
  no surprises here
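
the same spec as a tiny Go sketch, for concreteness -- my own rendering,
with made-up names; the labs' actual interfaces differ:

  package kv

  // the entire state: one map, one operation at a time
  var db = map[string]string{}

  // put stores v under k; the serial spec says it always succeeds
  func put(k, v string) bool {
      db[k] = v
      return true
  }

  // get yields the last value written to k ("" if k was never written)
  func get(k string) string {
      return db[k]
  }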
what about concurrent client operations?
  a client sends a request; takes some time crossing the network;
    server computes, talks to replicas, &c; reply moves through network;
    client receives reply
  other clients may send/receive/wait during that time!
  so the serial spec can't directly be applied
  we need a way to describe concurrent scenarios,
    so we can talk about which results are/aren't allowed

definition: a history describes a time-line of possibly-concurrent operations
  each operation has client start and finish times
    (time client sent RPC request, and received reply)
    as well as argument and return values
  example:
    C1: |-Wx1-| |-Wx2-|
    C2:    |---Rx2---|
  the x-axis is real time
  |- indicates the time at which client sent request
  -| indicates the time at which the client received the reply
  "Wx1" means put(x, 1)
  "Rx2" means get(x) -> 2
  C1 sent put(x, 1), got reply, sent put(x, 2), got reply
    writes have responses, signifying completion
  C2 sent get(x), got reply=2
  a history is a trace of what clients saw in an actual execution
    used to check whether the execution was linearizable
    used by designers in "would this be OK" thought experiments

definition: a history is linearizable if
  * you can find a point in time for each operation
    between its start and finish, and
  * the history's result values are the same as
    serial execution in point order

example history 1:
  |--Wx1--| |--Wx2--|
      |----Rx2----|
         |--Rx1--|
is this history linearizable?
  can we find a linearization point for each operation?
  we may need to try a few different point assignments.
  this order of points satisfies the rules:
    Wx1 Rx1 Wx2 Rx2
    1. each point lies between start and finish.
    2. the sequence satisfies the serial put/get spec.
  note: either read could have returned either 1 or 2.
    so linearizability often allows multiple different outcomes.
    so we often can't predict in advance, but we can check afterwards.
  note: the service probably didn't execute the operations at those points!
    we're not concerned here with how the service operated internally
    we only care that the client-visible results could have
      resulted from execution in some point order

what can we do with the linearizability definition?
  for designer: could this optimization result in non-linearizable results?
  for programmer: what can I assume / expect as a client?
  for testing: generate requests, check observed history (see sketch below)

why is it called "linearizability"?
  the linearization points turn concurrent operations into a serial
    execution -- "linear". thus "linearizable" in the sense that the
    results are the same as some linear execution of the operations.

example 2:
  |-Wx1-| |----Wx2----|
     |---Rx2---|
                 |-Rx1-|
we can try a few assignments of linearization points.
  how about Wx1 Wx2 Rx2 Rx1?
  not valid because "Wx2 Rx1" doesn't conform to serial spec.

how to show something *isn't* linearizable?
  show that no assignment of points works.
  i.e. breaks either time rule or value rule.
  often you can take shortcuts to eliminate lots of assignments
    e.g. time says that either Wx1 or Rx2 must come first

no assignment works for example 2!
  Wx2's point must be before Rx2's point
  so Wx2's point is also before Rx1's point
  so the second read got an impossible value
thus, if a system could produce this history, we would know that
  the system wasn't linearizable:
  has a bug, or never promised linearizability.

the Rx1 *would* have been legal if there had been no Rx2
  reads (as well as writes) can affect what's legal in future
  so, if we want linearizability:
    once any read sees a write, all strictly-subsequent reads must also see it.
  rules out split-brain
  can't forget a revealed write
    rules out e.g. forgetting data due to a crash

GFS is not linearizable: it can produce the example 2 history
  since the Rx1 could come from a replica that hasn't yet been updated.
  if we wanted GFS to be linearizable, one approach is to have
    client reads go through the primary too. would be slower!
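
making "check observed history" concrete: here is a brute-force checker
for single-key put/get histories. it's a toy of my own devising, not the
course's test code, and the search is exponential -- fine for a handful
of operations, hopeless beyond that:

  package main

  import "fmt"

  // op is one client-observed operation on the single key x.
  type op struct {
      start, end int  // real time: request sent / reply received
      isPut      bool // put(x, val) if true; get(x) -> val if false
      val        int
  }

  // lin reports whether the remaining ops can be assigned linearization
  // points that satisfy both rules, given that x currently holds cur.
  // brute force with backtracking.
  func lin(rem []op, cur int) bool {
      if len(rem) == 0 {
          return true
      }
      for i, o := range rem {
          // time rule: o can come next only if no remaining op
          // finished strictly before o started.
          ok := true
          for j, p := range rem {
              if j != i && p.end < o.start {
                  ok = false
              }
          }
          // value rule: a get must return x's current value.
          if !ok || (!o.isPut && o.val != cur) {
              continue
          }
          rest := append(append([]op{}, rem[:i]...), rem[i+1:]...)
          next := cur
          if o.isPut {
              next = o.val
          }
          if lin(rest, next) {
              return true
          }
      }
      return false
  }

  func main() {
      // example 2's history, with made-up timestamps matching the
      // picture above; x starts out as 0.
      h := []op{
          {3, 9, true, 1},    // |-Wx1-|
          {11, 23, true, 2},  // |----Wx2----|
          {6, 16, false, 2},  // |---Rx2---|
          {17, 23, false, 1}, // |-Rx1-|
      }
      fmt.Println(lin(h, 0)) // false -- no assignment of points works
  }

it tries every order consistent with the time rule, replaying the serial
spec to enforce the value rule; real checkers do the same thing much more
cleverly.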
example 3:
  |--Wx0--| |--Wx1--|
            |--Wx2--|
              |-Rx2-| |-Rx1-|
this may look non-linearizable because it might seem
  that the Rx2 should force the 2nd read to see 2 as well.
but this order shows it's linearizable:
  Wx0 Wx2 Rx2 Wx1 Rx1
so: the service can pick either order for concurrent writes.
  the linear order can be different from start-time or end-time order!

example 4:
      |--Wx0--| |--Wx1--|
                |--Wx2--|
  C1:             |-Rx2-| |-Rx1-|
  C2:             |-Rx1-| |-Rx2-|
can there be a serial order?
  C1 needs Wx2 Rx2 Wx1 Rx1
  C2 needs Wx1 Rx1 Wx2 Rx2
  we can't have both Wx2 before Wx1, and Wx2 after Wx1.
  so not linearizable.
so: service can choose either order for concurrent writes
  but all clients must see the writes in the same order
  this is important when there are replicas or caches
    they all have to execute operations in the same order

example 5:
  |-Wx1-|
          |-Wx2-|
                  |-Rx1-|
no order is possible -- not linearizable
so: reads must return fresh data: linearizability rules out stale reads
  even if the reader doesn't know about the write
  the time rule requires reads to yield the latest data
  again, affects use of caching and replication

linearizability outlaws many attractive design possibilities / mistakes:
  split brain (two active leaders)
  forgetting completed writes after a crash+reboot
  reading from lagging replicas or out-of-date caches

example 6:
  [client / network / server diagram]
  C1 sends put(x, 1)
  C2 sends put(x, 2)
  service receives C1's request;
    network drops response;
    C1's RPC library re-sends request
  is it legal for service to execute *both* of C1's request messages?
  we then might see this if C3 reads three times:
    C1: |-----------Wx1-----------|   (due to retransmission)
    C2:       |-Wx2-|
    C3:   |-Rx1-| |-Rx2-| |-Rx1-|
  assume x starts out as zero
  this history is not linearizable!
  so, if we want linearizability:
    duplicate requests from retransmissions must be suppressed!
    Lab 2...

linearizable systems are not limited to just read and write operations
  increment
  append
  test-and-set (to implement locks)
  any operation on server state

application programmers like linearizability -- it's relatively easy to use:
  * reads see fresh data -- not stale
  * all clients see the same data (when there aren't writes)
  * all clients see data changes in the same order
    so my put("result", 27); put("done", true); example works
these benefits will be clearer when we look at weaker consistencies.

how can we implement linearizability?
  depends on how much replication, caching, and fault-tolerance we want.

single serial server that doesn't crash
  [diagram: clients, server, op queue, state]
  server picks an order for concurrently arriving client requests
    executes them in that order, one at a time
    replies to each before starting the next
  plus duplicate request suppression (see sketch below)
  note: server does not have to reason about histories,
    linearization points, or concurrency
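
here's that design as a Go sketch, with example 6's duplicate suppression
folded in. the shapes are invented for illustration (Lab 2's real RPC
plumbing differs), and it assumes each client has at most one
outstanding request at a time:

  package kv

  import "sync"

  // every RPC carries these so the server can recognize retransmissions.
  type PutArgs struct {
      Client     int64 // unique client id
      Seq        int64 // per-client request sequence number
      Key, Value string
  }

  type KVServer struct {
      mu   sync.Mutex
      db   map[string]string
      last map[int64]int64 // highest Seq already executed, per client
  }

  func MakeKVServer() *KVServer {
      return &KVServer{db: map[string]string{}, last: map[int64]int64{}}
  }

  // the lock makes the server execute one operation at a time, which
  // picks the serial order; the last table suppresses duplicates, so
  // each client request executes at most once.
  func (kv *KVServer) Put(args *PutArgs) {
      kv.mu.Lock()
      defer kv.mu.Unlock()
      if args.Seq <= kv.last[args.Client] {
          return // retransmission of an already-executed request
      }
      kv.last[args.Client] = args.Seq
      kv.db[args.Key] = args.Value
  }

  func (kv *KVServer) Get(key string) string {
      kv.mu.Lock()
      defer kv.mu.Unlock()
      return kv.db[key]
  }

for operations whose reply carries information (e.g. get), the server
would also have to remember each client's most recent reply, so that a
suppressed duplicate can be answered with the original result.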
can we have stronger consistency than linearizability?
  how about: get sees most recently *completed* put?
  so here we'd guarantee Rx2, never Rx1:
    C1: |---Wx1---|
    C2:     |---Wx2---|
    C3:                  |--Rx2--|
  and here we'd guarantee Rx1, never Rx2:
    C1:     |---Wx1---|
    C2: |---Wx2---|
    C3:                  |--Rx1--|
  such a guarantee would be difficult:
    server cannot easily know when operations complete at clients
  linearizability is nice for the server b/c it allows
    freedom in ordering of concurrent operations.

what if we want high availability?

primary/backup replication
  [diagram: primary, two backups]
  all requests go to the primary
    picks a serial order
    forwards to backups
    backups execute in the same order
    primary replies to client only after both backups have executed
  so, if client saw response, all backups guaranteed to have executed
    important if primary fails
    to avoid forgetting completed requests
  clients cannot send reads directly to a backup (as they can in GFS)
    C1 might see new value, subsequently C1 might see old value
  need an external party to decide when backup should take over
    to avoid split brain
    e.g. atomic test-and-set in shared disk in VMware FT, or GFS coordinator

what about the performance of linearizable systems?
  bad news: serial aspect may make it hard to get parallel speedup
  bad news: if replication, then lots of communication and waiting
  bad news: if replication, replicas must be reachable, limiting fault tolerance
  good news: you can shard by key

what about other consistency models?
  can they allow better performance?
  do they have intuitive semantics?

example: eventual consistency -- a weak model
  multiple copies of the data (e.g. in different datacenters, for speed)
  a read consults any replica (e.g. closest)
  a write updates any replica (e.g. closest)
    replica sends response when that one update is done
    replicas synchronize updates in the background
    eventually, other replicas will see my update

eventual consistency is pretty popular
  faster than linearizability
    especially if replicas are in different cities for fault-tolerance
  and more available -- any one replica will do
    no waiting for primary/backup communication
  Amazon's Dynamo; Cassandra; GFS

but eventual consistency exposes some anomalies to the application programmer:
  * a read may not see the most recent write -- reads can see stale data
    a problem for password change, ACL change
  * writes may appear out of order
    breaks my result/done example
  * different clients may see different data
  * concurrent writes to the same item need to be resolved somehow!
      C1: put(x, 1)
      C2: put(x, 2)
    may initially be applied at different replicas
    only later will they be pushed to other replicas
    how to merge concurrent new values?
    how to ensure all replicas choose the same final value,
      so that, eventually, they are identical?
      (one scheme sketched at the end of these notes)
  * eventual consistency cannot support e.g. test-and-set

A general pattern: you can usually choose only one of these:
  strong consistency
  maximum availability
but not both.
  strong consistency makes you wait to update replicas,
    and can't proceed if too many replicas are unavailable.
    thus poor availability.
  eventual consistency can proceed even if no other replicas
    are reachable. but has poor consistency.

---
https://jepsen.io/consistency/models
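
an aside on the merge question above: one common scheme is last-writer-wins.
the Go sketch below is hypothetical (not taken from any real system), though
Cassandra resolves conflicts in roughly this way; Dynamo instead keeps the
conflicting versions and asks the application to reconcile them:

  package kv

  // each write is tagged so all replicas can pick the same winner.
  type version struct {
      time    int64 // writer's (loosely synchronized) clock
      replica int   // tie-breaker, makes the order total
  }

  func (a version) newer(b version) bool {
      if a.time != b.time {
          return a.time > b.time
      }
      return a.replica > b.replica
  }

  type entry struct {
      ver version
      val string
  }

  // merge applies an update received during background synchronization.
  // every replica applies the same deterministic rule, so once all
  // updates have propagated, all replicas store the same value.
  func merge(db map[string]entry, key string, e entry) {
      if cur, ok := db[key]; !ok || e.ver.newer(cur.ver) {
          db[key] = e
      }
  }

note what's given up: one of the concurrent puts simply disappears, and
the "winner" is chosen by clock values, which may not match what any
user observed -- eventually identical replicas, but weak guarantees.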