This lab will familiarize you with multithreading. You will implement switching between threads in a user-level threads package, use multiple threads to speedup a program, and implement a barrier.
Before writing code, you should make sure you have read "Chapter 6: Scheduling" from the xv6 book and studied the corresponding code.
To start the lab, switch to the trap branch:
$ git fetch $ git checkout thread $ make clean
In this exercise you will design the context switch mechanism for a user-level threading system, and then implement it. To get you started, your xv6 has two files user/uthread.c and user/uthread_switch.S, and a rule in the Makefile to build a uthread program. uthread.c contains most of a user-level threading package, and code for three simple test threads. The threading package is missing some of the code to create a thread and to switch between threads.
Your job is to come up with a plan to create threads and save/restore registers to switch between threads, and implement that plan.
$ make qemu ... $ uthread thread_a started thread_b started thread_c started thread_c 0 thread_a 0 thread_b 0 thread_c 1 thread_a 1 thread_b 1 ... thread_c 99 thread_a 99 thread_b 99 thread_c: exit after 100 thread_a: exit after 100 thread_b: exit after 100 thread_schedule: no runnable threads $
This output comes from the three test threads, each of which has a loop that prints a line and then yields the CPU to the other threads.
At this point, however, with no context switch code, you'll see no output.
You should complete thread_create to create a properly initialized thread so that when the scheduler switches to that thread for the first time, thread_switch returns to the function passed as argument func, running on the thread's stack. You will have to decide where to save/restore registers. Several solutions are possible. You are allowed to modify struct thread. You'll need to add a call to thread_switch in thread_schedule; you can pass whatever arguments you need to thread_switch, but the intent is to switch from thread t to the next_thread.
(gdb) file user/_uthread Reading symbols from user/_uthread... (gdb) b thread.c:60
This sets a breakpoint at a specified line in thread.c. The breakpoint may (or may not) be triggered before you even run uthread. How could that happen?
Once your xv6 shell runs, type "uthread", and gdb will break at line thread_switch. Now you can type commands like the following to inspect the state of uthread:
(gdb) p/x *next_threadWith "x", you can examine the content of a memory location:
(gdb) x/x next_thread->stack
You can single step assembly instructions using:
On-line documentation for gdb is here.
In this assignment you will explore parallel programming with threads and locks using a hash table. You should do this assignment on a real computer (not xv6, not qemu) that has multiple cores. Most recent laptops have multicore processors.
The file notxv6/ph.c contains a simple, incorrect parallel hash table. Compile this file on your machine and run it:
$ make ph $ ./ph 2Note that to build ph the Makefile use your OS' gcc; not the 6.S081 tools. The argument 2 to ph specifies the number of threads that execute put and get operations on the the hash table.
After running for a little while, the program will produce output similar to this:
1: put time = 0.003338 0: put time = 0.003389 0: get time = 7.684335 0: 17480 keys missing 1: get time = 7.684335 1: 17480 keys missing completion time = 7.687856
Each thread runs in two phases. In the first phase each thread puts NKEYS/nthread keys into the hash table. In the second phase, each thread gets NKEYS from the hash table. The print statements tell you how long each phase took for each thread. The completion time at the bottom tells you the total runtime for the application. In the output above, the completion time for the application is about 7.7 seconds. Each thread computed for about 7.7 seconds (~0.0 for put + ~7.7 for get).
To see if using two threads improved performance, let's compare against a single thread:
$ ./ph 1 0: put time = 0.004073 0: get time = 6.929189 0: 0 keys missing completion time = 6.933433The completion time for the 1 thread case (~7.0s) is slightly less than for the 2 thread case (~7.7s), but the two-thread case did twice as much total work during the get phase. Thus the two-thread case achieved nearly 2x parallel speedup for the get phase on two cores, which is very good.
When you run this application, you may see no parallelism if you are running on a machine with only one core or if the machine is busy running other applications.
Two points: 1) The completion time is about the same as for 2 threads, but this run did twice as many gets as with 2 threads; we are achieving good parallelism. 2) The output for 2 threads says that many keys are missing. In your runs, there may be more or fewer keys missing. If you run with 1 thread, there will never be any keys missing.
Why are there missing keys with 2 or more threads, but not with 1 thread? Identify a sequence of events that can lead to keys missing for 2 threads.
To avoid this sequence of events, insert lock and unlock statements in put and get so that the number of keys missing is always 0. The relevant pthread calls are (for more see the manual pages, man pthread):
pthread_mutex_t lock; // declare a lock pthread_mutex_init(&lock, NULL); // initialize the lock pthread_mutex_lock(&lock); // acquire lock pthread_mutex_unlock(&lock); // release lock
Test your code first with 1 thread, then test it with 2 threads. Is it correct (i.e. have you eliminated missing keys?)? Is the two-threaded version faster than the single-threaded version?
Modify your code so that get operations run in parallel while maintaining correctness. (Hint: are the locks in get necessary for correctness in this application?) (You can ignore memory-ordering issues.)
Modify your code so that some put operations run in parallel while maintaining correctness. (Hint: would a lock per bucket work?)
In this assignment you implement a barrier using condition variables provided by the pthread library. A barrier is a point in an application at which all threads must wait until all other threads reach that point too. Condition variables are a sequence coordination technique similar to xv6's sleep and wakeup.
You should do this assignment on a real computer (not xv6, not qemu).
The file notxv6/barrier.c contains a broken barrier.
$ make barrier $ ./barrier 2 barrier: notxv6/barrier.c:42: thread: Assertion `i == t' failed.The 2 specifies the number of threads that synchronize on the barrier ( nthread in barrier.c). Each thread sits in a tight loop. In each loop iteration a thread calls barrier() and then sleeps for some random number of microseconds. The assert triggers, because one thread leaves the barrier before the other thread has reached the barrier. The desired behavior is that all threads should block until nthreads have called barrier.
Your goal is to achieve the desired behavior. In addition to the lock primitives that you have seen in the ph assignment, you will need the following new pthread primitives (see man pthreads for more detail):
pthread_cond_wait(&cond, &mutex); // go to sleep on cond, releasing lock mutex pthread_cond_broadcast(&cond); // wake up every thread sleeping on cond
We have given you barrier_init(). Your job is to implement barrier() so that the panic won't occur. We've defined struct barrier for you; its fields are for your use.
There are two issues that complicate your task:
Test your code with one, two, and more than two threads.
This completes the lab. Make sure you pass all of the make
grade tests. If this lab had questions, don't forget to write up your
answers to the questions in answers-lab-name.txt. Commit your changes
(including adding answers-lab-name.txt) and type make handin in the lab
directory to hand in your lab.
Create a new file, time.txt, and put in it a single integer, the
number of hours you spent on the lab. Don't forget to git add and
git commit the file.
Submit the lab
You will turn in your assignments using
website. You need to request once an API key from the submission
website before you can turn in any assignments or labs.
This completes the lab. Make sure you pass all of the make grade tests. If this lab had questions, don't forget to write up your answers to the questions in answers-lab-name.txt. Commit your changes (including adding answers-lab-name.txt) and type make handin in the lab directory to hand in your lab.
Create a new file, time.txt, and put in it a single integer, the number of hours you spent on the lab. Don't forget to git add and git commit the file.
After committing your final changes to the lab, type make handin to submit your lab.
$ git commit -am "ready to submit my lab" [util c2e3c8b] ready to submit my lab 2 files changed, 18 insertions(+), 2 deletions(-) $ make handin tar: Removing leading `/' from member names Get an API key for yourself by visiting https://6828.scripts.mit.edu/2020/handin.py/ Please enter your API key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 79258 100 239 100 79019 853 275k --:--:-- --:--:-- --:--:-- 276k $make handin will store your API key in myapi.key. If you need to change your API key, just remove this file and let make handin generate it again (myapi.key must not include newline characters).
If you run make handin and you have either uncomitted changes or untracked files, you will see output similar to the following:
M hello.c ?? bar.c ?? foo.pyc Untracked files will not be handed in. Continue? [y/N]Inspect the above lines and make sure all files that your lab solution needs are tracked i.e. not listed in a line that begins with ??. You can cause git to track a new file that you create using git add filename.
If make handin does not work properly, try fixing the problem with the curl or Git commands. Or you can run make tarball. This will make a tar file for you, which you can then upload via our web interface.
The user-level thread package interacts badly with the operating system in several ways. For example, if one user-level thread blocks in a system call, another user-level thread won't run, because the user-level threads scheduler doesn't know that one of its threads has been descheduled by the xv6 scheduler. As another example, two user-level threads will not run concurrently on different cores, because the xv6 scheduler isn't aware that there are multiple threads that could run in parallel. Note that if two user-level threads were to run truly in parallel, this implementation won't work because of several races (e.g., two threads on different processors could call thread_schedule concurrently, select the same runnable thread, and both run it on different processors.)
There are several ways of addressing these problems. One is using scheduler activations and another is to use one kernel thread per user-level thread (as Linux kernels do). Implement one of these ways in xv6. This is not easy to get right; for example, you will need to implement TLB shootdown when updating a page table for a multithreaded user process.
Add locks, condition variables, barriers, etc. to your thread package.