6.824 Lecture 2: RPC and threads

Outline
  RPC
  Threads
  RPC in yfs
  RPC semantics

RPC
  a stylized version of client/server communication that attempts to make
    remote procedure calls look like ordinary procedure calls
  draw picture with stub, request, stub on server, reply
  key properties:
    easy to write programs with
    a model programmers are familiar with
    good match for many distributed applications (client/server)
    hides details (e.g., marshaling/unmarshaling)
  alternatives?
    directly programming with sockets
    distributed shared memory (later in the class)
    map/reduce
    dryad
    MPI
    ...
  RPC seems to have found usages
    XML RPC
    Java RMI
    Sun RPC
    map/reduce + dryad implemented using RPC?
  Key challenges: the semantics in the face of:
    communication failures (messages may be delayed, have variable round-trip times, or never arrive)
    machine failures
      did the server fail just before processing the request or just after?
    sometimes impossible to tell the difference between communication failures and machine failures

Threads
  Thread is short for thread of control: a running program with its own
    program counter, stack pointer, etc. (For this class a process is one or
    more threads executing in a single address space.)
  Primary purpose: a way of running code concurrently within a single process.
    For example, in lab 1, if one client is waiting for lock a, the server may
    want to process requests from other clients, in particular ones for
    different locks.
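The idea of several threads of control in one address space can be sketched with the pthread calls used in the labs. This is only an illustration: the `work` struct and the `square` and `run_two_threads` functions are hypothetical names, not part of the lab code. Each thread computes on its own stack and writes its result into memory that the creating thread can read after joining.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical work item; both threads live in the same address space,
// so the creator can read `output` once the thread has been joined.
struct work {
  int input;
  int output;
};

// Thread body: `local` lives on this thread's private stack,
// while `w` points into memory shared with the creating thread.
static void *square(void *arg) {
  work *w = static_cast<work *>(arg);
  int local = w->input * w->input;
  w->output = local;
  return nullptr;
}

// Starts two threads, waits for both, and returns the sum of their results.
// Returns -1 if a thread could not be created.
int run_two_threads() {
  work a = {3, 0};
  work b = {4, 0};
  pthread_t ta, tb;
  if (pthread_create(&ta, nullptr, square, &a) != 0) return -1;
  if (pthread_create(&tb, nullptr, square, &b) != 0) {
    pthread_join(ta, nullptr);
    return -1;
  }
  // join blocks until the thread has finished, so the outputs are safe to read.
  pthread_join(ta, nullptr);
  pthread_join(tb, nullptr);
  return a.output + b.output;
}
```

No locking is needed here because the two threads touch disjoint data and the creator reads the results only after join.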
  The primary reasons to use concurrent programming with threads:
    exploit several processors to run an application faster
    hide long delays (e.g., while waiting for a disk, do something else on the processor)
    run long-running ops concurrently with short ones in user interfaces
    network servers and RPC
  At a minimum, a thread interface must support:
    creating and managing threads
    ways of avoiding race conditions for updates to shared variables
      assume each thread runs on its own processor, sharing a memory
      instructions that appear to be atomic might not be (e.g., x = x + 1)
    ways of coordinating different threads
  Pthread interface
    standard interface
    we use it in the labs
  Interface
    threads
      create
      join
    mutex
    condition variables
    More in next lecture.
  Pitfalls of multithreaded programming
    race condition; can you give me an example?
      may be difficult to reproduce
    deadlock; can you give me an example?
      better bug to have than a race; your program stops when it happens
    wrong lock granularity; can you give me an example?
    starvation; can you give me an example?

YFS RPC library
  how to use: lock_demo.cc, lock_client.cc, lock_smain.cc, and lock_server.cc
    lock_client.cc calls bind() first; why? (we will see)
  rpc.h: the interface to the RPC system; let's look at it briefly.
    rpcc (+caller) and rpcs
    marshaling
  rpc.cc: the implementation.
    rpcc creates two threads
      clock_loop: retransmissions (you will have to do this)
      chan_loop: waiting for replies on a channel
    rpcc::bind
      a remote procedure to get a unique ID from server
    rpcc::call1: an RPC!
      must bind first
      why lock m? why unlock m before call1 has completed?
      lock(&ca.m); why?
      lock(_timeout_lock); why?
      what is "pthread_cond_signal(&_time_cond)"?
      what is "pthread_cond_wait(&ca.c, &ca.m)"?
    rpcc::got_reply: a reply
      pthread_cond_broadcast(&ca->c)? how many threads?
      what is the documented race about?
    rpcc::clock_loop
      what is going on here?
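The questions above about `pthread_cond_wait(&ca.c, &ca.m)` and `pthread_cond_broadcast(&ca->c)` revolve around one pattern: a caller sleeps on a per-call condition variable until another thread records the reply. The sketch below shows that pattern in isolation. It is not the code in rpc.cc; `pending_call`, `deliver_reply`, and `call_and_wait` are hypothetical names, and the reply value is made up.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical per-call record, loosely modeled on the notes' "ca".
struct pending_call {
  pthread_mutex_t m;  // protects done and reply
  pthread_cond_t c;   // signaled when the reply has arrived
  bool done;
  int reply;
};

// Stand-in for the thread that receives replies from the network.
static void *deliver_reply(void *arg) {
  pending_call *ca = static_cast<pending_call *>(arg);
  pthread_mutex_lock(&ca->m);
  ca->reply = 42;  // made-up reply value
  ca->done = true;
  pthread_cond_broadcast(&ca->c);  // wake every thread waiting on this call
  pthread_mutex_unlock(&ca->m);
  return nullptr;
}

// Issues one "call" and blocks until the reply thread has filled it in.
int call_and_wait() {
  pending_call ca;
  pthread_mutex_init(&ca.m, nullptr);
  pthread_cond_init(&ca.c, nullptr);
  ca.done = false;
  ca.reply = 0;

  pthread_mutex_lock(&ca.m);
  pthread_t t;
  if (pthread_create(&t, nullptr, deliver_reply, &ca) != 0) {
    pthread_mutex_unlock(&ca.m);
    return -1;
  }
  // cond_wait atomically releases m while sleeping and re-acquires it on
  // wakeup; the loop re-checks the predicate to tolerate spurious wakeups.
  while (!ca.done) pthread_cond_wait(&ca.c, &ca.m);
  int r = ca.reply;
  pthread_mutex_unlock(&ca.m);

  pthread_join(t, nullptr);
  pthread_cond_destroy(&ca.c);
  pthread_mutex_destroy(&ca.m);
  return r;
}
```

Holding the mutex from before the predicate check until the wait is what prevents a lost wakeup: the reply thread cannot set `done` and broadcast in the gap between the check and the sleep.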
    rpcs::rpcs
      another thread, loop, which gets messages
    rpcs::loop
      a new thread for each request
      can we call pthread methods in RPC handlers? (yes, they are threads)
    rpcs::dispatch:
      why can it unlock before the end? is it safe?
      h->fn(args, fn) invokes the requested procedure
    checkduplicate_and_update: you must implement this
      switch statement: how can each case happen?
      what should checkduplicate_and_update do?
  chan.cc
    uses tcp; why?

Possible RPC semantics:
  At-least-once (last year's 6.824 lab)
  At-most-once (this year in 6.824 and RPC paper)
    how does the paper achieve at-most-once?

At-least-once versus at-most-once?
  let's take an example: acquiring a lock
    if client and server stay up, client receives lock
    if client fails, it may have the lock or not (server needs a plan!)
    if server fails, client may have lock or not
      at-least-once: client keeps trying
      at-most-once: client will receive an exception
    what does a client do in the case of an exception?
      need to implement some application-specific protocol
        ask server, do i have the lock?
        server needs to have a plan for remembering state across reboots
          e.g., store locks on disk.
    at-least-once (if we never give up)
      clients keep trying. server may run procedure several times
      server must use application state to handle duplicates
        if requests are not idempotent
          but difficult to make all requests idempotent
        e.g., server could store on disk who has lock and req id
          check table for each request
          even if server fails and reboots, we get correct semantics
  What is right?
    depends where RPC is used.
      simple applications: at-most-once is cool (more like procedure calls)
      more sophisticated applications: need an application-level plan in both cases
        not clear at-most-once gives you a leg up
    => Handling machine failures makes RPC different than procedure calls

YFS RPC versus RPC in paper
  Both at-most-once
    Using the same technique (bind and exchange a nonce)
  Protocols differ
    YFS runs on reliable transport
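The duplicate handling discussed under RPC semantics (check a table for each request so a retransmission does not run the procedure again) can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab's checkduplicate_and_update: the `dedup_server` class, its `handle` method, and the "add to a counter" operation are all hypothetical. It keeps the table in memory only, so unlike the on-disk plan in the notes it does not survive a server reboot.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical server that applies a non-idempotent operation (adding to a
// counter) at most once per (client id, request id) pair.
class dedup_server {
 public:
  // Executes the request unless it was seen before; a duplicate gets the
  // remembered reply and the operation is not run a second time.
  int handle(unsigned client, unsigned req_id, int amount) {
    std::pair<unsigned, unsigned> key(client, req_id);
    auto it = replies_.find(key);
    if (it != replies_.end()) return it->second;  // retransmission
    counter_ += amount;                            // run the procedure once
    replies_[key] = counter_;                      // remember the reply
    return counter_;
  }

  int counter() const { return counter_; }

 private:
  int counter_ = 0;
  // Remembered replies, keyed by (client id, request id). In this sketch the
  // table grows without bound; a real server needs a way to forget entries.
  std::map<std::pair<unsigned, unsigned>, int> replies_;
};
```

A possible use: a client that times out resends the same (client id, request id), gets back the original reply, and the counter changes only once.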