Q: Does Linux use scalable locks?

A: Yes. Linux uses MCS locks in its mutex-lock implementation, and qspinlocks (which also use MCS underneath) to implement spinlocks. See the followup discussion below for more details.

Q: Does performance collapse occur in real life?

A: Yes. Scalability bottlenecks have been demonstrated in real-life deployments of Linux, and the Linux community has worked hard for years and fixed many of them.

Q: Given that ticket and spin locks can hurt performance so much, why does anyone use them at all?

A: If a particular lock rarely experiences contention, it won't hurt performance much, and under low contention ticket locks are faster than scalable locks and have a simpler API. If a lock is contended, that typically signals a deeper problem in the kernel subsystem that uses the lock: the subsystem has a scalability problem, and kernel developers should redesign it to allow more parallelism.

Q: Why are they called "scalable" locks?

A: They are scalable in the sense that the number of cache-coherence messages required for a single release of the lock stays constant even as the number of contending cores grows. In contrast, the cost of a single release of a spinlock or ticket lock grows linearly with the number of contending cores. "Scalable" does not, however, imply overall high performance, since a scalable lock still enforces serial execution of the critical section. A better name would have been "non-collapsing" locks.

Q: At what level is cache coherence implemented?

A: Processors implement cache coherence in hardware. Each cache has a cache controller implemented in hardware, which sends and receives cache-coherence messages over the interconnect that connects the cache controllers. Cache coherence is transparent to software.

Q: Multi-processor computers have been around for 50 years. Why were scalable locks still a point of debate as recently as 2012, when this paper was published?

A: Until about 10 years ago, parallelism in mainstream kernels wasn't very important, because most machines had only one processor. Around 2001 it became impossible to keep increasing uniprocessor clock rates, but transistors per chip kept growing, so manufacturers started putting multiple cores into most CPU chips. At that point, parallelism became important for most computers. Initially these machines had a small number of cores, so coarse-grained parallelism was sufficient; today many machines have a dozen or several dozen cores, and fine-grained parallelism matters.

Q: How do MCS locks work?

A: The key idea is that each core spins on its own cache line, so a lock release takes O(1) time: only one core is spinning on the line that the releasing core modifies. You can find MCS code here (towards the end of the file): https://pdos.csail.mit.edu/6.828/2016/lec/scalable-lock-code.c (A simplified sketch also appears below, after the next two questions.)

Q: Are there properties of locks other than scalability that are important?

A: Other important metrics are memory use (scalable locks typically use more memory than spinlocks), the cost of acquiring and releasing when there is no contention (scalable locks are a bit more expensive), and the simplicity of the API (scalable locks have a more complicated API).

Q: Why does the paper say that it's hard to change the Linux directory cache code to use MCS locks?

A: The main reason is that the APIs for ticket locks and MCS locks differ. The MCS acquire and release take an extra argument: a qnode (a small struct) on which the acquirer waits. To avoid the expense of allocating the qnode with malloc/free, the MCS modifications to the directory cache try to allocate the qnode on the stack. This makes the modifications a bit tricky: in the directory cache code, the acquire and the release call may not be in the same function. The modifications resolve this by stack-allocating the qnode in the function that calls the functions that call acquire and release, and by passing the qnode along. Since qnodes don't exist in the ticket-lock API, no such changes are needed for ticket locks. The sketch below illustrates both the MCS algorithm and this qnode-passing pattern.
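To make the two answers above concrete, here is a minimal MCS sketch using C11 atomics. It is a simplified illustration, not the Linux implementation or the exact code from the link above; the names (mcs_lock, mcs_qnode, mcs_acquire, mcs_release, dcache_op) are made up for this sketch.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct mcs_qnode {
        _Atomic(struct mcs_qnode *) next;   // successor in the wait queue
        atomic_bool locked;                 // each waiter spins on its own flag
    };

    struct mcs_lock {
        _Atomic(struct mcs_qnode *) tail;   // last waiter, or NULL if queue empty
    };

    void mcs_acquire(struct mcs_lock *lk, struct mcs_qnode *qn) {
        atomic_store(&qn->next, NULL);
        atomic_store(&qn->locked, true);
        // Atomically append ourselves to the queue of waiters.
        struct mcs_qnode *pred = atomic_exchange(&lk->tail, qn);
        if (pred == NULL)
            return;                          // queue was empty: we hold the lock
        atomic_store(&pred->next, qn);       // link in behind our predecessor
        while (atomic_load(&qn->locked))     // spin on our own cache line
            ;
    }

    void mcs_release(struct mcs_lock *lk, struct mcs_qnode *qn) {
        struct mcs_qnode *succ = atomic_load(&qn->next);
        if (succ == NULL) {
            // No visible successor: if the tail is still us, the queue is empty.
            struct mcs_qnode *expected = qn;
            if (atomic_compare_exchange_strong(&lk->tail, &expected, NULL))
                return;
            // A successor is midway through linking in; wait for it to appear.
            while ((succ = atomic_load(&qn->next)) == NULL)
                ;
        }
        atomic_store(&succ->locked, false);  // hand off: wake exactly one waiter
    }

    // The directory-cache pattern: a common caller stack-allocates the qnode
    // and threads it through the helpers that acquire and release separately.
    void dcache_op(struct mcs_lock *lk) {
        struct mcs_qnode qn;                 // stack allocation, no malloc/free
        mcs_acquire(lk, &qn);
        /* ... critical section, possibly spanning helper functions
           that receive &qn as an argument ... */
        mcs_release(lk, &qn);
    }

Note how the release writes only the successor's locked flag: the releasing core touches one cache line no matter how many cores are queued, which is exactly the O(1) property discussed above.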
Followup discussion (Srivatsa S. Bhat):

Update: It turns out that the Linux kernel has been using scalable (or non-collapsing) locks for quite some time now. One of the earliest efforts to fix the performance of ticket-based spinlocks dates back to 2013 [1], and appears to have used the paper we read today as motivation (but that particular patchset never made it into the mainline kernel). MCS locks found their way into the mutex-lock implementation at around the same time [2]. However, replacing ticket spinlocks with scalable locks turned out to be a much harder problem, because locking schemes such as MCS would bloat each spinlock, which was undesirable given the constraints on spinlock size in the Linux kernel. A few years later, the Linux developers came up with "qspinlocks", which use MCS underneath with special tricks to avoid bloating the spinlock (a simplified sketch of this size trick follows the references); qspinlocks have been the default spinlock implementation in the Linux kernel since 2015 [3][4]. You may also find this article [5] (written by one of the contributors to the Linux locking subsystem) quite interesting in this context. Finally, the old and unused ticket-spinlock implementation was deleted from the codebase in 2016 [6].

[1] Fixing ticket spinlocks: https://lwn.net/Articles/531254/ and https://lwn.net/Articles/530458/
[2] MCS locks in the mutex-lock implementation: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2bd2c92cf07cc4a
[3] MCS locks and qspinlocks: https://lwn.net/Articles/590243/
[4] qspinlock (using MCS underneath) as the default spinlock implementation: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a33fda35e3a765
[5] Context and motivation for various locking schemes: http://queue.acm.org/detail.cfm?id=2698990
[6] Removal of the unused ticket-spinlock code: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=cfd8983f03c7b2
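As a rough illustration of the qspinlock size trick mentioned above, here is a hedged sketch of the data layout only (the names qspinlock, qnodes, encode_tail, and decode_tail are assumptions for this sketch; the real Linux qspinlock additionally has a pending bit, supports nesting contexts such as interrupts, and is far more careful about memory ordering). The idea: qnodes are statically allocated per CPU, and the lock word stores a small CPU index rather than a pointer, so the spinlock itself stays a single 32-bit word.

    #include <stdatomic.h>

    #define NCPU 64

    struct qnode {
        _Atomic(struct qnode *) next;
        atomic_int locked;
    };

    // One static qnode per CPU replaces per-acquire allocation, so the
    // lock word never needs to hold a full qnode pointer.
    static struct qnode qnodes[NCPU];

    struct qspinlock {
        atomic_uint val;   // bit 0: locked; upper bits: tail = (cpu index + 1)
    };

    // Encode/decode the queue tail as a small integer instead of a pointer.
    static inline unsigned encode_tail(int cpu) {
        return (unsigned)(cpu + 1) << 1;   // +1 so that 0 means "no waiters"
    }

    static inline struct qnode *decode_tail(unsigned val) {
        return &qnodes[(val >> 1) - 1];
    }

The compact encoding is what lets the kernel keep spinlocks at their original small size while still getting MCS-style queueing under contention.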