XXX did we do something special to qemu/xv6 configuration to get contention. Athena?

Lab: locks

In this lab you'll gain experience in re-designing code to increase parallelism. A common symptom of poor parallelism on multi-core machines is high lock contention. Improving parallelism often involves changing both data structures and locking strategies in order to reduce contention. You'll do this for the xv6 memory allocator and block cache.

Before writing code, you should make sure you have read "Chapter 4: Locking" from the xv6 book and studied the corresponding code.

  $ git fetch
  $ git checkout lock
  $ make clean

Memory allocator

The program user/kalloctest stresses xv6's memory allocator: three processes grow and shrink their address spaces, resulting in many calls to kalloc and kfree. kalloc and kfree obtain kmem.lock. kalloctest prints the number of test-and-sets that did not succeed in acquiring the kmem lock (and some other locks), which is a rough measure of contention:

$ kalloctest
start test
test results:
--- lock kmem/bcache stats
lock: kmem: #fetch-and-add 68573 #acquire() 433012
lock: bcache: #fetch-and-add 0 #acquire() 1246
--- top 5 contended locks:
lock: kmem: #fetch-and-add 68573 #acquire() 433012
lock: proc: #fetch-and-add 23529 #acquire() 214537
lock: virtio_disk: #fetch-and-add 22673 #acquire() 111
lock: proc: #fetch-and-add 6594 #acquire() 214541
lock: proc: #fetch-and-add 4936 #acquire() 214596
tot= 68573
test FAIL

acquire maintains, for each lock, the count of calls to acquire for that lock, and the count of test-and-sets that didn't manage to acquire the lock. kalloctest calls a system call that causes the kernel to print those counts for the kmem and bcache locks (which are the focus of this lab) and for the 5 most contended locks. If there is lock contention the number of test-and-sets will be high because it takes many loop iterations before acquire obtains the lock. The system call returns the sum of the #test-and-sets for the kmem and bcache locks.

For this lab, you must use a dedicated unloaded machine with multiple cores. If you use a machine that is doing other things, the test-and-set counts that kalloctest prints will be nonsense. You can use a dedicated Athena workstation, or your own laptop, but don't use a dialup machine.

The root cause of lock contention in kalloctest is that kalloc() has a single free list, protected by a single lock. To remove lock contention, you will have to redesign the memory allocator to avoid a single lock and list. The basic idea is to maintain a free list per CPU, each list with its own lock. Allocations and frees on different CPUs can run in parallel, because each CPU will operate on a different list. The main challenge will be to deal with the case in which one CPU's free list is empty, but another CPU's list has free memory; in that case, the one CPU must "steal" part of the other CPU's free list. Stealing may introduce lock contention, but that will hopefully be infrequent.

Your job is to implement per-CPU freelists and stealing when one CPU's list is empty. Run kalloctest to see if your implementation has reduced lock contention, and to check that it can still allocate all of memory run usertests sbrkmuch. Your output will look similar to that shown below, although the specific numbers will differ. Also run usertests to make sure all tests in usertests still pass.

$ kalloctest
start test1
test1 results:
--- lock kmem/bcache stats
lock: kmem: #fetch-and-add 0 #acquire() 45384
lock: kmem: #fetch-and-add 0 #acquire() 190932
lock: kmem: #fetch-and-add 0 #acquire() 196754
lock: bcache: #fetch-and-add 0 #acquire() 18
lock: bcache.bucket: #fetch-and-add 0 #acquire() 22
lock: bcache.bucket: #fetch-and-add 0 #acquire() 10
lock: bcache.bucket: #fetch-and-add 0 #acquire() 10
lock: bcache.bucket: #fetch-and-add 0 #acquire() 18
lock: bcache.bucket: #fetch-and-add 0 #acquire() 14
lock: bcache.bucket: #fetch-and-add 0 #acquire() 280
lock: bcache.bucket: #fetch-and-add 0 #acquire() 10
lock: bcache.bucket: #fetch-and-add 0 #acquire() 14
lock: bcache.bucket: #fetch-and-add 0 #acquire() 8
lock: bcache.bucket: #fetch-and-add 0 #acquire() 8
--- top 5 contended locks:
lock: proc: #fetch-and-add 28445 #acquire() 241837
lock: virtio_disk: #fetch-and-add 9710 #acquire() 57
lock: proc: #fetch-and-add 6785 #acquire() 241840
lock: proc: #fetch-and-add 5418 #acquire() 241896
lock: proc: #fetch-and-add 4093 #acquire() 241900
tot= 0
test1 OK
start test2
total free number of pages: 32496 (out of 32768)
.....
test2 OK
$ usertests sbrkmuch
usertests starting
test sbrkmuch: OK
ALL TESTS PASSED
$ usertests
...
ALL TESTS PASSED
$

Please give all of your locks initlock names that start with "kmem".

Some hints:

Buffer cache

If multiple processes use the file system intensively, they will likely contend for bcache.lock, which protects the disk block cache in kernel/bio.c. bcachetest creates several processes that repeatedly read different files in order to generate contention on bcache.lock; its output looks like this (before you complete this lab):

$ bcachetest
start test0
test0 results:
--- lock kmem/bcache stats
lock: kmem: #fetch-and-add 0 #acquire() 32969
lock: kmem: #fetch-and-add 0 #acquire() 60
lock: kmem: #fetch-and-add 0 #acquire() 69
lock: bcache: #fetch-and-add 11734 #acquire() 65936
--- top 5 contended locks:
lock: virtio_disk: #fetch-and-add 156038 #acquire() 1185
lock: proc: #fetch-and-add 50371 #acquire() 217814
lock: bcache: #fetch-and-add 11734 #acquire() 65936
lock: proc: #fetch-and-add 6660 #acquire() 217482
lock: proc: #fetch-and-add 5218 #acquire() 217494
tot= 11734
test0: FAIL
start test1
test1 OK
You will likely see different output, but the number of test-and-sets for the bcache lock will be high. If you look at the code in kernel/bio.c, you'll see that bcache.lock protects the list of cached block buffers, the reference count (b->refcnt) in each block buffer, and the identities of the cached blocks (b->dev and b->blockno).

Modify the block cache so that the number of test-and-sets for all locks in the bcache is close to zero when running bcachetest. Ideally the sum of test-and-sets of all locks involved in the block cache should be zero, but it's OK if the sum is less than 500. Modify bget and brelse so that concurrent lookups and releases for different blocks that are in the bcache are unlikely to conflict on locks (e.g., don't all have to wait for bcache.lock). You must maintain the invariant that at most one copy of a block is cached. When you are done, your output should be similar to that shown below (though not identical). Make sure usertests still passes.

$ bcachetest
start test0
test0 results:
--- lock kmem/bcache stats
lock: kmem: #fetch-and-add 0 #acquire() 32954
lock: kmem: #fetch-and-add 0 #acquire() 75
lock: kmem: #fetch-and-add 0 #acquire() 73
lock: bcache: #fetch-and-add 0 #acquire() 85
lock: bcache.bucket: #fetch-and-add 0 #acquire() 4159
lock: bcache.bucket: #fetch-and-add 0 #acquire() 2118
lock: bcache.bucket: #fetch-and-add 0 #acquire() 4274
lock: bcache.bucket: #fetch-and-add 0 #acquire() 4326
lock: bcache.bucket: #fetch-and-add 0 #acquire() 6334
lock: bcache.bucket: #fetch-and-add 0 #acquire() 6321
lock: bcache.bucket: #fetch-and-add 0 #acquire() 6704
lock: bcache.bucket: #fetch-and-add 0 #acquire() 6696
lock: bcache.bucket: #fetch-and-add 0 #acquire() 7757
lock: bcache.bucket: #fetch-and-add 0 #acquire() 6199
lock: bcache.bucket: #fetch-and-add 0 #acquire() 4136
lock: bcache.bucket: #fetch-and-add 0 #acquire() 4136
lock: bcache.bucket: #fetch-and-add 0 #acquire() 2123
--- top 5 contended locks:
lock: virtio_disk: #fetch-and-add 173423 #acquire() 1140
lock: proc: #fetch-and-add 50586 #acquire() 325153
lock: proc: #fetch-and-add 11590 #acquire() 324797
lock: proc: #fetch-and-add 8551 #acquire() 324797
lock: proc: #fetch-and-add 7897 #acquire() 324797
tot= 0
test0: OK
start test1
test1 OK
$ usertests
  ...
ALL TESTS PASSED
$

Please give all of your locks names that start with "bcache". That is, you should call initlock for each of your locks, and pass a name that starts with "bcache".

Reducing contention in the block cache is more tricky than for kalloc, because bcache buffers are truly shared among processes (and thus CPUs). For kalloc, one could eliminate most contention by giving each CPU its own allocator; that won't work for the block cache. We suggest you look up block numbers in the cache with a hash table that has a lock per hash bucket.

There are some circumstances in which it's OK if your solution has lock conflicts:

bcachetest's test1 uses more distinct blocks than there are buffers, and exercises lots of file system code paths.

Here are some hints:

Submit the lab

This completes the lab. Make sure you pass all of the make grade tests. If this lab had questions, don't forget to write up your answers to the questions in answers-lab-name.txt. Commit your changes (including adding answers-lab-name.txt) and type make handin in the lab directory to hand in your lab.

Time spent

Create a new file, time.txt, and put in it a single integer, the number of hours you spent on the lab. Don't forget to git add and git commit the file.

Submit

You will turn in your assignments using the
submission website. You need to request once an API key from the submission website before you can turn in any assignments or labs.

After committing your final changes to the lab, type make handin to submit your lab.

$ git commit -am "ready to submit my lab"
[util c2e3c8b] ready to submit my lab
 2 files changed, 18 insertions(+), 2 deletions(-)

$ make handin
tar: Removing leading `/' from member names
Get an API key for yourself by visiting https://6828.scripts.mit.edu/2020/handin.py/
Please enter your API key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 79258  100   239  100 79019    853   275k --:--:-- --:--:-- --:--:--  276k
$
make handin will store your API key in myapi.key. If you need to change your API key, just remove this file and let make handin generate it again (myapi.key must not include newline characters).

If you run make handin and you have either uncomitted changes or untracked files, you will see output similar to the following:

 M hello.c
?? bar.c
?? foo.pyc
Untracked files will not be handed in.  Continue? [y/N]
Inspect the above lines and make sure all files that your lab solution needs are tracked i.e. not listed in a line that begins with ??. You can cause git to track a new file that you create using git add filename.

If make handin does not work properly, try fixing the problem with the curl or Git commands. Or you can run make tarball. This will make a tar file for you, which you can then upload via our web interface.

Optional challenge exercises