Freenet: A Distributed Anonymous Information Storage and Retrieval System Clarke, Sandberg, Wiley, and Hong 2000 Brief history of recent P2P systems Napster nodes store their own files register keywords+files with central DB send keyword queries to central DB, get list of files+titles+host download selected file direct from host Napster's central DB was a single point of legal failure Gnutella flood keyword queries over all the nodes you respond if one of your files matches very robust (no central db, lots of paths) very expensive, no-one uses straight Gnutella easy for RIAA to see who is fetching/serving what How is Freenet an improvement over Napster/Gnutella? Efficient: no flooding Scalable: no central DB (and no flooding) Robust: can explore lots of paths, and replicates/caches popular data Anonymity for queriers Anonymity for document authors (inserters) Anonymity for document storers Freenet design overview You give it a unique key, it gives you a file. Does not perform keyword search... Allows insertion and retrieval and update. It stores your data, you don't have to serve it. Harness idle disk space and network b/w Files are distributed over all participants. A file may be widely replicated, or hard to find. So focus is really on locating a file, given the key. Who might use Freenet? Is it a drop-in replacement for Gnutella? Could we use Freenet to serve the web? How do you search or browse in Freenet? What is a key? Unstructured bits from FreeNet's point of view. Apps may know it's the hash of name, or content-hash, or pub key. No embedded hints to help locate files. Unlike, say, http://www.cnn.com/file/name.html Summarize data structures in a Freenet node: Documents, each with its key. This might be big! Routing table mapping keys to hosts. State for outstanding queries (and inserts). These are all caches! All are fixed size w/ LRU replacement. What does a Freenet query packet look like? Query ID. Key. TTL. Depth. [insert only?] How about a Freenet reply packet? Query ID. Status. Ultimate source. Data. What does a Freenet node do with a query? Maybe it has the file associated with that key -- so return it. Otherwise find entry closes to key in the node's "routing table." Forward the query to that node. What if a node has nothing in its routing table? Returns query to previous hop. What does a node do if a query is returned to it? Trys the next-closest routing table entry. This means that nodes keep state about pending queries. How does freenet prevent loops? Each query has a semi-unique ID. Key for pending query table. A node sends back a query if the ID is already in the query table. What about queries for non-existent keys? Will they visit every node in the universe? No: a query has a TTL, max # of nodes a query can visit. Guidelines for setting TTL? How does a Freenet node update its routing table? A successful reply retraces path back to originator. Reply contains IP address of the node that had the data. Each node on path adds doc key + node IP addr to routing table. How does a Freenet node add entries to its document cache? If it requested the document. If it was on the return path of someone else's successful query. How does an insert work? First phase: Insert message sent along same path as a query for same key. TTL really represents the number of copies inserter wants. Second phase: File sent along same path. Every node along the way stores it. Every node adds a routing table entry for originator. Why two phases? Finds collisions early? Allows intermediate nodes to verify there won't be a collision? Allows inserter to predict if it can find all TTL storers? What happens if that key already exists? Insert is rejected. Existing document propagated back towards inserter. What happens when a new node joins? Asks to add itself to a few other nodes' routing tables. Using a more or less random fake key. So it specializes in some part of the key space. How does the Freenet system find out about new nodes? I.e. new places to store data. Only by those nodes inserting documents. Lame; why not by those nodes querying? Why is a Freenet query likely to succeed quickly? I.e. take less than time linear in the number of nodes? Authors claim nodes will become expert at certain key ranges. And that popular documents will be cached in many places. Does the evaluation section support this? What can we say about how searches explore the key space? *Not* in order of best routing table entry seen so far. Really depth first in a random tree. Suppose looking for document 10 in this net: Src knows 11, 12. 11 knows 20, 30. 12 knows 10. Then search will visit 20 and 30 before 12. Even though search "knew about" 12 before it saw 20 or 30. Why is a Freenet query likely to succeed if the document exists? Well, maybe it isn't. It might if the document is globally popular. Depending on TTLs and access patterns, might not even be able to find a copy of a popular document. Will Freenet preserve documents? Popular documents will be cached along query paths. Unpopular documents may die. Does Freenet use the network efficiently? I.e. do queries travel more or less directly to the target? No; many global messages per lookup. Maybe not important. Suppose someone posts bomb construction secrets to Freenet. And DoE wants to delete them. What can it do? Can it decide reliably which nodes contain the document? And physically confiscate 100% of those nodes? Why is it hard to decide if a node stores a document? Can't just query it. Can't query with TTL = 1. Could inspect all packets in and out, perhaps. Requires physical access anyway. Can the DoE confiscate random nodes, check if their disks contain secrets, and put their owners in jail? Maybe. Could encrypt contents so storers could deny knowledge. Can we attack a file by being an "expert" node for its key? Join protocol tries to prevent this. just adds an entry to routing tables along random path so new node will see queries near that key from nodes on that path new node won't have docs, but will soon be expert but new node also may insert -- now has multiple keys in others' tables don't want to let node choose its own initial routing key! could use that to target a particular document why do this at all? why not wait for node to insert? node might still target document by inserting one with similar id How does a Freenet user know the data is authentic? Can't in the simplest file key scheme. Could encrypt file with name (rather than key). Then it's hard to make up reasonable fake contents. But could do it for specific target files w/ known names. Could name files with content-hash (CHK). Could name files with publisher's public key, and name. SSK: signed-subspace key. File key is function of text file name and public key. File encrypted with text file name. File signed with private key. How is the anonymity of queriers preserved? The origin of a query is not needed except at first hop. Even first hop doesn't know previous was originator. What about depth field? So can be faked or omitted from query packets. Why would we care? How is the anonymity of inserters preserved? Same deal. Why would we care? How is the anonymity of storage locations preserved? Nodes along return path need to update routing tables. With ID of original node. This sounds bad for anonymity. So nodes sometimes replace location in response with own IP address. Makes routing less efficient. How is deniability preserved? I.e. servers don't know what they are storing. Contents encrypted with key derived from user-visible name. File key derived by hash of user-visible name. So server cannot read contents w/ just file key. Do we really need storage anonymity? Is it enough to make sure copies are widely distributed? E.g. in other countries. Does Freenet provide as strong anonymity as, say, e-mail anonymizers? No -- a local eavesdropper will know it's me, and that I'm asking for a particular document. Eavesdropper can link together incoming and outgoing requests. Do we really need to intertwine routing and querier/inserter anonymity? No -- could probably separate them. Paper itself suggests a separate "pre-routing" phase. Using nested public key encryption of query. What if an attacker floods Freenet with junk? Exhaust node disk storage. Cause lots of routing table entries to point to attacker. How does Freenet support updates? Can't do it with simple keys -- open to bad guys replacing my files. SSK names files with a public key + text name. Sign file with private key. Then update supplies new file signed with same key. So nodes know they are allowed to accept it. Freshness is still hard: No way to locate all the replicas of a file! How could Freenet support directories? Easy if directory is write-once. Updates OK if using public keys. So how can people actually *use* Freenet? Weird names, awkward support for directories and updates. Could you emulate the Web with Freenet? Use Freenet keys as embedded links.