6.824 Lecture 2: RPC and threads

Outline
  RPC
  Two RPC implementations:
  - Birrell RPC
  - RPC in yfs
  RPC semantics
  RPC differences

RPC Structure
  client program
  client stub --- marshal
  RPC comm package --- deliver packets over network
  server stub --- unmarshal
  server program

RPC
  a stylized version of client/server communication that attempts to
    make remote procedure calls look like ordinary procedure calls.
  draw picture with stub, request, stub on server, reply
  key properties:
    easy to write programs with
    model programmers are familiar with
    good match for many distributed applications (client/server)
    hides details (e.g., marshaling/unmarshaling)
  alternatives?
    directly programming with sockets
    distributed-shared memory (later in the class)
    map/reduce
    dryad
    MPI
    ...
  RPC seems to have found usages
    XML RPC
    Java RMI
    Sun RPC
    map/reduce + dryad implemented using RPC?

Key challenges: the semantics in the face of:
  communication failures (messages may be delayed, variable round trip, never arrive)
  machine failures
    did server fail just before processing the request or just after?
  sometimes impossible to tell the difference between communication
    failures and machine failures

Birrell RPC
  we will see many Birrell papers
  3 Mbit/s ethernet
  hardware is quite slow
  goals:
    make RPC == procedure call
      e.g., no timeout on calls
      but server failures break RPC == procedure call
    highly efficient

Possible RPC semantics:
  At-least-once (2008 6.824 lab)
  At-most-once (2009 and 2010 6.824 lab and RPC paper)
    how does the paper achieve at-most-once? (bind is important!)
      UID on export
  Exactly-once?
    See Argus paper later in semester

Naming servers
  What is Grapevine?
    (precursor to DNS)
  Export
  Import
    Returns unique ID

Birrell RPC protocol for at-most-once semantics
  Simple calls
    two packets
    call identifier: machine ID, process, and seqno
    unique ID returned by bind/export
    one outstanding request
    timeout: resend request
      long-running procedures on server cause timeout
    no tear down
  Separate protocol for complicated calls
    see figure 4
  What semantics:
    comm failure?
    machine failure?
      could a restarted server export the same unique ID?

At-least-once versus at-most-once?
  let's take an example: acquiring a lock
    if client and server stay up, client receives lock
    if client fails, it may have the lock or not (server needs a plan!)
    if server fails, client may have lock or not
      at-least-once: client keeps trying
      at-most-once: client will receive an exception
    what does a client do in the case of an exception?
      need to implement some application-specific protocol
        ask server: do I have the lock?
        server needs to have a plan for remembering state across reboots
          e.g., store locks on disk.
  at-least-once (if we never give up)
    clients keep trying. server may run procedure several times
    server must use application state to handle duplicates
      if requests are not idempotent
        but difficult to make all requests idempotent
      e.g., server could store on disk who has lock and req id
        check table for each request
        even if server fails and reboots, we get correct semantics
  What is right?
    depends where RPC is used.
    simple applications: at-most-once is cool (more like procedure calls)
    more sophisticated applications: need an application-level plan in both cases
      not clear at-most-once gives you a leg up
  => Handling machine failures makes RPC different from procedure calls

YFS RPC versus RPC in paper
  Both at-most-once
    Using the same technique (bind and exchange a nonce)
  Protocols differ
    YFS runs on reliable transport
    YFS has no cross-layer optimizations (e.g., piggybacking ACK)
    YFS allows multiple outstanding requests
  RPC not integrated with kernel and language
    Mesa is GC-ed
    Makes RPC implementation challenging
      Thread/processes
      Blocking kernel ops
      Thread cleanup difficult --- more next lec
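
Sketch: at-most-once with a bind-time unique ID plus per-client seqno
  A minimal Python sketch of the at-most-once idea discussed above, using the
  lock example. It is not the paper's protocol or the YFS code. The names
  LockServer, bind, handle and call_with_retries are hypothetical, and the
  sketch assumes one outstanding request per client, a network that only
  drops replies, and no state saved across server restarts.

```python
import uuid


class LockServer:
    """Toy lock server that runs each acquire request at most once."""

    def __init__(self):
        # Assumption: a fresh random ID per server incarnation stands in
        # for the unique ID returned by bind/export.
        self.server_uid = uuid.uuid4().hex
        self.holder = None      # client that currently holds the lock
        self.last_seqno = {}    # client_id -> seqno of last executed call
        self.last_reply = {}    # client_id -> reply saved for that call
        self.executions = 0     # how many times the procedure body ran

    def bind(self):
        return self.server_uid

    def handle(self, server_uid, client_id, seqno):
        if server_uid != self.server_uid:
            # Binding is from an earlier incarnation; the duplicate table
            # is gone, so refuse instead of risking a second execution.
            raise RuntimeError("stale binding")
        last = self.last_seqno.get(client_id, -1)
        if seqno == last:
            return self.last_reply[client_id]   # duplicate: replay reply
        if seqno < last:
            return None                         # old request: ignore
        # New request: run the procedure body exactly once.
        self.executions += 1
        if self.holder is None:
            self.holder = client_id
        granted = self.holder == client_id
        self.last_seqno[client_id] = seqno
        self.last_reply[client_id] = granted
        return granted


def call_with_retries(server, uid, client_id, seqno, lost_replies):
    """Resend the same (client_id, seqno) until one reply gets through."""
    for attempt in range(lost_replies + 1):
        reply = server.handle(uid, client_id, seqno)
        if attempt < lost_replies:
            continue    # pretend this reply was dropped: timeout, resend
        return reply


server = LockServer()
uid = server.bind()
granted = call_with_retries(server, uid, "c1", seqno=0, lost_replies=2)
print(granted, server.executions)   # True 1
```

  Three copies of the request arrive, but the body runs once. A restarted
  server gets a new unique ID, so a client holding the old binding sees an
  exception and has to fall back on an application-level plan.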