TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
Keleher, Cox, Dwarkadas, Zwaenepoel
USENIX 1994 Winter

High level goals?
  Better DSM performance.
  Run existing sequential consistent parallel code.

What is lazy release consistency?

Why do they need vector timestamps?

Example for why you need the vector timestamps.
Assume each variable on its own page.
  Or that granularity is a single word...
CPU0: al1 x=1 rl1
CPU1:             al1 y=x rl1
CPU2:                         al1 print x, y rl1
How would an eager release consistent system handle this?
How does lazy release consistency handle this?
How does TreadMarks know what to do?

Example of why LRC might speed up your program.
CPU0: al1 x=1 rl1  al2 z=99 rl2
CPU1:              al1 y=x rl1
CPU2:                            al1 print x, y, z rl1
How would an eager release consistent system handle this?
What does Treadmarks do?
  And why is that faster?
Why is it legal for z to be out-of-date at CPU2?
How does Treadmarks *know* it is legal?

What if you get different values from different sources?
CPU0: al1 x=1 rl1  al2 y=9 rl2
CPU1:              al1 x=2 rl1
CPU2:                           al1 al2 z = x + y rl2 rl1
CPU2 is going to hear "x=1" from CPU0, and "x=2" from CPU1.
  How does CPU2 know what to do?

What if the VTs for the two values are not ordered?
  Could this happen?
CPU0: al1 x=1 rl1
CPU1:              al2 x=2 rl2
CPU2:                           al1 al2 print x rl2 rl1

What model of consistency does the programmer need to have in mind?
  What rules does the programmer have to follow?
  What can the programmer expect from the memory system?

Example of when LRC might still do too much work.
CPU0: al2 z=99 rl2  al1 x=1 rl1
CPU1:                            al1 y=x rl1
In this case, CPU1 didn't really need z.
This suggests that further improvements might be possible,
  at what expense?
  Compiler or programmer notices dependencies?

What happens in this case?
CPU0: al1 x=1 rl1                      al1 x=2 rl1
CPU1:              al1 al2 y=x rl2 rl1
CPU2:                                              al2 z=y rl2
Does CPU2 get the x=2 update? Should it? Does it make any difference?
  In most cases TreadMarks could avoid sending x=2 if it wanted to.
  But there might have been a GC, forcing CPU2 to know about x=2.

Are there programs that work under RC that break under LRC?
  RC is blind to which lock variable was involved.
CPU0: al1 x=1 rl1  al1 if(y==0) ... rl1
CPU1: al2 y=1 rl2  al2 if(x==0) ... rl2

TreadMarks keeps complete history of write notices.
  I.e. accurate record of what changed in each interval.
  Write notice and interval record lists in Figure 2.
  It does not just keep the latest data.
  Why? Is the history needed for LRC?

What's the point of the diffs?

Why do they send write notices separately from the diffs?
  They can delay the diffs until you actually use the page: you might not.
  Maybe you already have the write notice? But VT indicates this?

When you ask for diffs, how far back in the past do you ask for them?
  Does this imply unbounded amount of history?

Is LRC actually a good idea?