The SETI@Home Protocol
March 6, 2000
Speaker: David Molnar
Scribe: David Andersen
- Server: The SETI@Home server doles out work units and keeps track
of who has done what work under which names.
- Client: The client performs work given to it by the server and
then sends back the work unit.
We assume we have a collision-resistant hash function, which we can
denote h(). We further assume that we have the ability to make
a transcript of a computation.
- Server sends work unit to client.
- Client performs work on the work unit, creating a transcript S.
- Client computes h(S).
- Client sends result of work unit and h(S) to the server.
- Server flips a coin at random to decide whether to audit
client. Alternatively, the server selects a random subset of clients
to audit and checks whether this client is in that set.
- If the server decides to audit the client, the server re-does the
computation on the work unit, creates a transcript S', and then computes
h(S'). Alternatively, the server asks the client to send a transcript S'
for the specified work unit. The client sends a transcript, which the
server checks for correctness, and then computes h(S').
- Server checks whether h(S) = h(S'). If they match, the
client is considered compliant. If they do not, the client is
flagged as non-compliant.
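The steps above can be sketched in code. This is only an illustrative model, not the actual SETI@Home implementation: do_work is a toy placeholder computation, and the transcript is simply the sequence of intermediate states.

```python
import hashlib
import random

def do_work(unit):
    """Toy stand-in for the real signal processing.  Returns a
    (result, transcript) pair; the transcript records the sequence of
    intermediate states, enough to replay the computation."""
    transcript = []
    acc = 0
    for x in unit:
        acc = (acc + x * x) % 2**32   # placeholder for the real work
        transcript.append(acc)
    return acc, transcript

def h(transcript):
    """Collision-resistant hash of a transcript (SHA-256 here)."""
    m = hashlib.sha256()
    for v in transcript:
        m.update(v.to_bytes(4, "big"))
    return m.hexdigest()

def audit(unit, claimed_hash, audit_prob=0.1):
    """Server side: with probability audit_prob, redo the work and
    compare hashes.  Returns True (compliant), False (caught), or
    None (not audited this time)."""
    if random.random() >= audit_prob:
        return None
    _, S2 = do_work(unit)
    return h(S2) == claimed_hash

# Client side: do the work and report (result, h(S)) to the server.
unit = [3, 1, 4, 1, 5, 9]
result, S = do_work(unit)
report = (result, h(S))
```

The variant where the client sends S' itself differs only in that the server verifies the submitted transcript step by step instead of recomputing it from scratch.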
Questions I have
What are the exact requirements for the transcript? Paul Crowley on
sci.crypt suggested that producing a fake transcript must be at least
as expensive as producing a real one. In addition, we also want that
a fake transcript can be identified as fake on inspection. What else?
Why do we have the hashes at all? Why doesn't the server just check
the work directly?
What other kinds of problems will this approach help? What problems
seem similar but will not be helped?
With the questions raised by the handout in mind, David Molnar from
Harvard summarized the SETI problem:
- SETI is a distributed system to process radiotelescope data
- Several modified clients exist, which are faster but less
accurate. The authors of the patches have not released
their source code, and the patched clients may produce
numerically inconsistent results.
- One patch floating around the net
- Microsoft released an NT version to improve their ranking.
- SETI asked both parties to stop distributing their modified clients.
One much-discussed solution is to remove the incentive for
modified clients by eliminating rankings. Some other
projects have taken this approach, but rankings are also a useful
tool for building and maintaining user interest. Unfortunately,
removing rankings does nothing to combat actively malicious
users who wish to forge or hide data.
What kind of transcript can be used?
- A stack dump every time things change
This is certainly a complete proof, but it's infeasible.
- What is the smallest possible proof?
(Modular exponentiation as an example)
- FFT question - is there a hunk of intermediate state in the FFT
computation which would be a good transcript?
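For the modular-exponentiation example, one natural transcript is the sequence of partial results from square-and-multiply: it is small, and a verifier can spot-check any single step without redoing the whole computation. A sketch (the function names are illustrative, not from any particular codebase):

```python
def pow_with_transcript(base, exp, mod):
    """Square-and-multiply, recording each intermediate value.
    The sequence of partial results serves as the transcript."""
    transcript = []
    acc = 1
    for bit in bin(exp)[2:]:          # most-significant bit first
        acc = (acc * acc) % mod       # square
        if bit == "1":
            acc = (acc * base) % mod  # multiply
        transcript.append(acc)
    return acc, transcript

def check_step(base, mod, prev, cur, bit):
    """Verify one transcript step in O(1) multiplications,
    without redoing the whole exponentiation."""
    expected = (prev * prev) % mod
    if bit == "1":
        expected = (expected * base) % mod
    return expected == cur
```

Because each entry is checkable from its predecessor, faking a consistent transcript seems to require doing essentially the same work as the honest computation, which matches the requirement suggested on sci.crypt. Whether the FFT admits a similarly compact, step-checkable intermediate state is the open question above.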
Can we use a client's long-term history as a weight when deciding
whom to trust or audit?
Can we perform early detection, and what do we do if too many
(2/3s or more) of the clients are taken over?
- Many clients can verify the operation
- We're doing OK if we can force the clients to behave
- Auditing: Can we find an appropriate level?
- Can we find a work unit which allows the server to distinguish
between good and bad clients? None found so far.
- Axe the rankings
- Just farm out duplicate blocks
- There are some fundamental design issues about which people disagree:
- Deluding people about the success probability (will it ever succeed?)
- Searching in the wrong place for aliens?
- Trusting people for the computation
- Lack of data doesn't say much, because they might have
missed it or been deceived
- But it's a crap-shoot - there are Tb of data flowing
in, so you might as well do something with it. If you
win, you win big.
- Any relationship to QuakeWorld? Not really - there's no time for
auditing in Quake.
- RC5 and DES cracking projects have similar problems.
- Online banking possibly does as well
- Possible to get probabilistically high security by farming out
duplicates to people (or transcript checks - same idea)
and making sure you'll detect at least some of the duplicity.
Then go back and filter results from evil clients.
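A rough calculation makes "probabilistically high security" concrete. Assuming each work unit is independently audited (or duplicated) with probability p, a client who cheats on k units escapes detection only if every one of those units goes unchecked:

```python
def detection_probability(p, k):
    """Probability of catching a client who cheats on k work units
    when each unit is independently audited or duplicated with
    probability p: 1 - (1 - p)^k."""
    return 1 - (1 - p) ** k

# Even a small audit rate catches a persistent cheater quickly:
# with p = 0.05, cheating on 100 units is detected with
# probability 1 - 0.95**100, about 0.994.
```

This is why a low audit rate suffices against systematic cheating, though a client who cheats on only a handful of units still has good odds of slipping through.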
- Possibly related:
- Any relationship to private information retrieval? Ostrovsky has
shown private queries logarithmic in the size of the DB (!).
- Hash cash (from 857) - can throw a big computer at it and make
the currency for the month.
- Chameleon hash functions - a secret key allows you to find a collision.
Brought to you by the MIT LCS Applied Security Reading Group.