Lab guidance

Hardness of assignments

Each lab task is tagged to indicate roughly how long we expect the task to take:

Most of the labs require only a modest amount of code (perhaps a few hundred lines per lab part), but can be conceptually difficult and may require a good deal of thought and debugging. Some of the tests are difficult to pass.

Don't start a lab the night before it is due; it's more efficient to do the labs in several sessions spread over multiple days. Tracking down bugs in distributed systems is difficult because of concurrency, crashes, and an unreliable network.

Tips

Debugging

Efficient debugging takes experience. It helps to be systematic: form a hypothesis about a possible cause of the problem; collect evidence that might be relevant; think about the information you've gathered; repeat as needed. For extended debugging sessions it helps to keep notes, both to accumulate evidence and to remind yourself why you've discarded specific earlier hypotheses.

One approach is to progressively narrow down the specific point in time at which things start to go wrong. You could add code at various points in the execution that tests whether the system has reached the bad state. Or your code could print messages with relevant state at various points; collect the output in a file, and look through the file for the first point where things look wrong.

The Raft labs involve events, such as RPCs arriving or timeouts expiring or peers failing, that may occur at times you don't expect, or may be interleaved in unexpected orders. For example, one peer may decide to become a candidate while another peer thinks it is already the leader. It's worth thinking through the "what can happen next" possibilities. For example, when your Raft code releases a mutex, the very next thing that happens (before the next line of code is executed!) might be the delivery (and processing) of an RPC request, or a timeout going off. Add print statements to find out the actual order of events during execution.
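A common consequence: a candidate sends RequestVote RPCs without holding the lock, and by the time a reply arrives the peer may have moved to a newer term or stepped down. The sketch below uses a hypothetical, trimmed-down Raft struct (the RPC sending itself is omitted) to show the pattern of re-checking state after re-acquiring the lock, so a stale reply is ignored instead of counted.

```go
package main

import (
	"fmt"
	"sync"
)

type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// Raft holds only the fields this sketch needs (hypothetical layout).
type Raft struct {
	mu          sync.Mutex
	currentTerm int
	state       State
	votes       int // votes received in the current election, including our own
	npeers      int
}

// handleVoteReply runs when a RequestVote reply arrives. electionTerm is the
// term in which the request was sent; the lock was not held while waiting.
func (rf *Raft) handleVoteReply(electionTerm, replyTerm int, granted bool) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	if replyTerm > rf.currentTerm {
		// Someone is in a newer term: adopt it and step down.
		rf.currentTerm = replyTerm
		rf.state = Follower
		return
	}
	// Anything may have happened while the lock was released, so confirm we
	// are still a candidate in the same term before using the reply.
	if rf.state != Candidate || rf.currentTerm != electionTerm {
		return // stale reply: ignore it
	}
	if granted {
		rf.votes++
		if rf.votes > rf.npeers/2 {
			rf.state = Leader
		}
	}
}

func main() {
	rf := &Raft{currentTerm: 5, state: Candidate, votes: 1, npeers: 3}
	// While waiting for replies, a heartbeat from a term-6 leader is processed.
	rf.mu.Lock()
	rf.currentTerm, rf.state = 6, Follower
	rf.mu.Unlock()
	// The late term-5 vote must not turn this follower into a leader.
	rf.handleVoteReply(5, 5, true)
	fmt.Println(rf.state == Follower, rf.votes)
}
```

Without the second check, the late vote would push the count to a majority and this peer would become leader in term 6 without having won an election in that term.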

The Raft paper's Figure 2 must be followed fairly exactly. It is easy to miss a condition that Figure 2 says must be checked, or a state change that it says must be made. If you have a bug, re-check that all of your code adheres closely to Figure 2.
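As an illustration of how many separate conditions one Figure 2 box contains, here is a sketch of a RequestVote handler. The field names follow Figure 2, but the struct layout, the dummy entry at log index 0, and the use of -1 for a "null" votedFor are assumptions of this sketch. Persisting state and resetting the election timer, both of which Figure 2 also requires, are only marked by comments.

```go
package main

import (
	"fmt"
	"sync"
)

type State int

const (
	Follower State = iota
	Candidate
	Leader
)

type LogEntry struct {
	Term    int
	Command interface{}
}

type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	LastLogIndex int
	LastLogTerm  int
}

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

// Raft is a hypothetical, trimmed struct. log[0] is a dummy entry so the log
// is never empty; votedFor == -1 stands for "null" in Figure 2.
type Raft struct {
	mu          sync.Mutex
	currentTerm int
	votedFor    int
	state       State
	log         []LogEntry
}

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	// Rules for Servers: a higher term means adopt it and become follower.
	// votedFor records the vote in the current term, so it starts over.
	if args.Term > rf.currentTerm {
		rf.currentTerm = args.Term
		rf.votedFor = -1
		rf.state = Follower
	}
	reply.Term = rf.currentTerm
	// Receiver step 1: reply false if term < currentTerm.
	if args.Term < rf.currentTerm {
		reply.VoteGranted = false
		return
	}
	// Receiver step 2: votedFor is null or candidateId, AND the candidate's
	// log is at least as up-to-date as ours (compare last term, then index).
	last := len(rf.log) - 1
	lastTerm := rf.log[last].Term
	upToDate := args.LastLogTerm > lastTerm ||
		(args.LastLogTerm == lastTerm && args.LastLogIndex >= last)
	if (rf.votedFor == -1 || rf.votedFor == args.CandidateId) && upToDate {
		rf.votedFor = args.CandidateId
		reply.VoteGranted = true
		// Also required but omitted here: persist state before replying,
		// and reset the election timer since a vote was granted.
	}
}

func main() {
	rf := &Raft{currentTerm: 2, votedFor: -1, state: Follower,
		log: []LogEntry{{Term: 0}, {Term: 1}, {Term: 2}}}
	var r1, r2 RequestVoteReply
	// S1's log ends in term 1, older than ours: refused, but term 3 adopted.
	rf.RequestVote(&RequestVoteArgs{Term: 3, CandidateId: 1, LastLogIndex: 1, LastLogTerm: 1}, &r1)
	// S2's log is as up-to-date as ours: vote granted.
	rf.RequestVote(&RequestVoteArgs{Term: 3, CandidateId: 2, LastLogIndex: 2, LastLogTerm: 2}, &r2)
	fmt.Println(r1, r2, rf.votedFor)
}
```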

As you're writing code (i.e., before you have a bug), it may be worth adding explicit checks for conditions that the code assumes to be true, perhaps using Go's panic. Such checks may help detect situations where later code unwittingly violates the assumptions.
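A minimal sketch of such checks, assuming a hypothetical assert helper and a cut-down Raft struct whose log stores only entry terms: checkInvariants encodes a few properties that should always hold (lastApplied never exceeds commitIndex, commitIndex stays inside the log, and terms in the log never decrease). Calling it at the end of each handler while still holding the lock is one possible placement.

```go
package main

import "fmt"

// assert panics with a formatted message when cond is false.
// Hypothetical helper; the lab code does not provide one.
func assert(cond bool, format string, args ...interface{}) {
	if !cond {
		panic(fmt.Sprintf("assertion failed: "+format, args...))
	}
}

// Raft keeps only what the checks need; log[i] is the term of entry i,
// and index 0 is a dummy entry.
type Raft struct {
	me          int
	log         []int
	commitIndex int
	lastApplied int
}

// checkInvariants panics as soon as an assumption the code relies on breaks.
func (rf *Raft) checkInvariants() {
	assert(rf.lastApplied <= rf.commitIndex,
		"S%d lastApplied %d > commitIndex %d", rf.me, rf.lastApplied, rf.commitIndex)
	assert(rf.commitIndex < len(rf.log),
		"S%d commitIndex %d beyond log length %d", rf.me, rf.commitIndex, len(rf.log))
	for i := 1; i < len(rf.log); i++ {
		assert(rf.log[i-1] <= rf.log[i], "S%d log terms decrease at index %d", rf.me, i)
	}
}

func main() {
	rf := &Raft{me: 0, log: []int{0, 1, 1, 2}, commitIndex: 2, lastApplied: 1}
	rf.checkInvariants() // all assumptions hold
	rf.lastApplied = 3   // simulate a bug elsewhere in the code
	defer func() { fmt.Println("caught:", recover()) }()
	rf.checkInvariants()
}
```

The panic message names the peer and the values involved, which points at the moment the assumption first broke rather than at a later, confusing symptom.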

If code used to work but now doesn't, a change you made recently may be at fault.

The bug is often in the very last place you think to look, so be sure to look even at code you feel certain is correct.

The TAs are happy to help you think about your code during office hours, but you're likely to get the most mileage out of limited office hour time if you've already dug as deep as you can into the situation.