The SETI@Home Protocol
March 6, 2000
Speaker: David Molnar
Scribe: David Andersen
- Server: The SETI@Home server doles out work units and keeps track
of who has done what work under which names.
- Client: The client performs work given to it by the server and
then sends back the work unit.
We assume we have a collision-resistant hash function, which we can
denote h(). We further assume that we have the ability to make
a transcript of a computation.
- Server sends work unit to client.
- Client performs work on the work unit, creating a transcript S.
- Client computes h(S).
- Client sends result of work unit and h(S) to the server.
- Server flips a coin at random to decide whether to audit
client. Alternatively, the server selects a random subset of clients
to audit and checks whether this client is in that set.
- If the server decides to audit the client, the server re-does the
computation on the work unit, creates a transcript S', and then computes
h(S'). Alternatively, the server asks the client to send a transcript S'
for the specified work unit. The client sends a transcript, which the
server checks for correctness, and then computes h(S').
- Server checks whether h(S) = h(S'). If they match, the
client is considered compliant. If they do not, the client is
flagged as non-compliant.
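The steps above can be sketched in code. This is only an illustrative model, not the actual SETI@Home implementation: do_work is a toy placeholder computation, and the transcript is simply the sequence of intermediate states.

```python
import hashlib
import random

def do_work(unit):
    """Toy stand-in for the real signal processing.  Returns a
    (result, transcript) pair; the transcript records the sequence of
    intermediate states, enough to replay the computation."""
    transcript = []
    acc = 0
    for x in unit:
        acc = (acc + x * x) % 2**32   # placeholder for the real work
        transcript.append(acc)
    return acc, transcript

def h(transcript):
    """Collision-resistant hash of a transcript (SHA-256 here)."""
    m = hashlib.sha256()
    for v in transcript:
        m.update(v.to_bytes(4, "big"))
    return m.hexdigest()

def audit(unit, claimed_hash, audit_prob=0.1):
    """Server side: with probability audit_prob, redo the work and
    compare hashes.  Returns True (compliant), False (caught), or
    None (not audited this time)."""
    if random.random() >= audit_prob:
        return None
    _, S2 = do_work(unit)
    return h(S2) == claimed_hash

# Client side: do the work and report (result, h(S)) to the server.
unit = [3, 1, 4, 1, 5, 9]
result, S = do_work(unit)
report = (result, h(S))
```

The variant where the client sends S' itself differs only in that the server verifies the submitted transcript step by step instead of recomputing it from scratch.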
Questions I have
What are the exact requirements for the transcript? Paul Crowley on
sci.crypt suggested that producing a fake transcript must be at least
as expensive as producing a real one. In addition, we also want that
a fake transcript can be identified as fake on inspection. What else?
Why do we have the hashes at all? Why doesn't the server just check
the work directly?
What other kinds of problems will this approach help? What problems
seem similar but will not be helped?
With the questions raised by the handout in mind, David Molnar from
Harvard summarized the SETI problem:
- SETI is a distributed system to process radiotelescope data
- Several modified clients exist, which are faster but less
accurate. The authors of the patches have not released
their source code, and the patched clients may produce
numerically inconsistent results.
- One patch floating around the net
- Microsoft released an NT version to improve their ranking.
- SETI asked both parties to stop distributing their modified clients.
One much-discussed solution is to remove the incentive for
modified clients by eliminating rankings. Some other
projects have taken this approach, but rankings are also a useful
tool for building and maintaining user interest. Unfortunately,
removing rankings does nothing to combat actively malicious
users who wish to forge or hide data.
What kind of transcript can be used?
- A stack dump every time things change
This is certainly a complete proof, but it's infeasible.
- What is the smallest possible proof?
(Modular exponentiation as an example)
- FFT question - is there a hunk of intermediate state in the FFT
computation which would be a good transcript?
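For the modular-exponentiation example, one natural transcript is the sequence of partial results from square-and-multiply: it is small, and a verifier can spot-check any single step without redoing the whole computation. A sketch (the function names are illustrative, not from any particular codebase):

```python
def pow_with_transcript(base, exp, mod):
    """Square-and-multiply, recording each intermediate value.
    The sequence of partial results serves as the transcript."""
    transcript = []
    acc = 1
    for bit in bin(exp)[2:]:          # most-significant bit first
        acc = (acc * acc) % mod       # square
        if bit == "1":
            acc = (acc * base) % mod  # multiply
        transcript.append(acc)
    return acc, transcript

def check_step(base, mod, prev, cur, bit):
    """Verify one transcript step in O(1) multiplications,
    without redoing the whole exponentiation."""
    expected = (prev * prev) % mod
    if bit == "1":
        expected = (expected * base) % mod
    return expected == cur
```

Because each entry is checkable from its predecessor, faking a consistent transcript seems to require doing essentially the same work as the honest computation, which matches the requirement suggested on sci.crypt. Whether the FFT admits a similarly compact, step-checkable intermediate state is the open question above.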
Can we use a client's long-term history as a weight when deciding
whom to trust or audit?
Can we perform early detection, and what do we do if too many
(2/3s or more) of the clients are taken over?
- Many clients can verify the operation
- We're doing OK if we can force the clients to behave
- Auditing: Can we find an appropriate level?
- Can we find a work unit which allows the server to distinguish
between good and bad clients? None found so far.
- Axe the rankings
- Just farm out duplicate blocks
- There are some fundamental design issues about which people disagree:
- Deluding people about the success probability (will it ever succeed?)
- Searching in the wrong place for aliens?
- Trusting people for the computation
- Lack of data doesn't say much, because they might have
missed it or been deceived
- But it's a crap-shoot - there are Tb of data flowing
in, so you might as well do something with it. If you
win, you win big.
- Any relationship to QuakeWorld? Not really - there's no time for
auditing in Quake.
- RC5 and DES cracking projects have similar problems.
- Online banking possibly does as well
- Possible to get probabilistically high security by farming out
duplicates to people (or transcript checks - same idea)
and making sure you'll detect at least some of the duplicity.
Then go back and filter results from evil clients.
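A rough calculation makes "probabilistically high security" concrete. Assuming each work unit is independently audited (or duplicated) with probability p, a client who cheats on k units escapes detection only if every one of those units goes unchecked:

```python
def detection_probability(p, k):
    """Probability of catching a client who cheats on k work units
    when each unit is independently audited or duplicated with
    probability p: 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

# Even a small audit rate catches a persistent cheater quickly:
# with p = 0.05, cheating on 100 units is detected with
# probability 1 - 0.95**100, about 0.994.
```

This is why a low audit rate suffices against systematic cheating, though a client who cheats on only a handful of units still has good odds of slipping through.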
- Possibly related:
- Any relationship to private information retrieval? Ostrovsky has
shown private queries logarithmic in the size of the DB (!).
- Hash cash (from 857) - can throw a big computer at it and make
the currency for the month.
- Chameleon hash functions - a secret key allows you to find a collision.
Brought to you by the MIT LCS Applied Security Reading Group.