Freenet

nym.alias.net has a major centralized component.
  Makes it a magnet for attacks.
  Napster is similarly centralized and exposed.

What is Freenet's high-level API?
  You give it a key, it gives you a file.
  Allows insertion and retrieval.
  Probably doesn't allow update.
    Maybe by using public keys as keys and signing files.
    Certainly no Bayou-like conflict resolution.
    Focus is on write-once archiving.
  Files are distributed over all participants.
    A file may be widely replicated, or hard to find.
    So focus is really on locating a file, given the key.

Who might use Freenet?
  Just people who want anonymity?
  Might it provide benefits to ordinary folks too?
    Napster-like file sharing -- I can see your MP3 files.
    Allows you to use my huge empty disk drive -- global utility.
    Like running a web server -- but automatic replication,
      load balance, server selection.
      Participants needn't buy expensive server farms.
    As a user, like using a good web cache (maybe).

What are Freenet's detailed goals?
  Scale.
  Availability / robustness.
  Efficiency (find stuff w/o too much work).
  Anonymity for queriers.
  Anonymity for document authors.
  Anonymity for document storers.

What desirable goals does Freenet not support?
  Permanence.
  Conflict resolution (e.g. in index key files).
  Directories. So how do you find stuff?
  
What is a key?
  Unstructured bits from FreeNet's point of view.
    Apps may know it's the hash of name, or content-hash, or pub key.
  No embedded hints to help locate files.
    Unlike, say, http://www.cnn.com/file/name.html

What does a Freenet node do with a query?
  Maybe it has the file associated with that key -- so return it.
  Otherwise find entry closes to key in the node's "routing table."
    Forward the query to that node.

What if a node has nothing in its routing table?
  Returns query to previous hop.

What does a node do if a query is returned to it?
  Trys the next-closest routing table entry.
  This means that nodes keep state about pending queries.

How does freenet prevent loops?
  Each query has a semi-unique ID.
    Key for pending query table.
  A node sends back a query if the ID is already in the query table.

What about queries for non-existent keys?
  Will they visit every node in the universe?
  No: a query has a TTL, max # of nodes a query can visit.
  Guidelines for setting TTL?

How does a Freenet node update its routing table?
  A successful reply retraces path back to originator.
  Reply contains IP address of the node that had the data.
  Each node on path adds doc key + node IP addr to routing table.

How does a Freenet node add entries to its document cache?
  If it requested the document.
  If it was on the return path of someone else's successful query.

How does an insert work?
  First phase:
    Insert message sent along same path as a query for same key.
    TTL really represents the number of copies inserter wants.
  Second phase:
    File sent along same path.
    Every node along the way stores it.
    Every node adds a routing table entry for originator.
  Why two phases?
    Finds collisions early?
    Allows intermediate nodes to verify there won't be a collision?
    Allows inserter to predict if it can find all TTL storers?

What happens if that key already exists?
  Insert is rejected.
  Existing document propagated back towards inserter.

How does the Freenet system find out about new nodes?
  I.e. new places to store data.
  Only by those nodes inserting documents.
  Lame; why not by those nodes querying?

Summarize data structures in a Freenet node:
  Documents, each with its key.
    This might be big!
  Routing table mapping keys to hosts.
  State for outstanding queries (and inserts).
  These are all caches!
    All are fixed size w/ LRU replacement.

What does a Freenet query packet look like?
  Query ID.
  Key.
  TTL.
  Depth.

Why is a Freenet query likely to succeed quickly?
  I.e. take less than time linear in the number of nodes?
  Authors claim nodes will become expert at certain key ranges.
    And that popular documents will be cached in many places.
  Does the evaluation section have much to say about this?
    Figure 3: "All queries take 10 hops". Hmm.

What can we say about how searches explore the key space?
  *Not* in order of best routing table entry seen so far.
    Really depth first in a random tree.
  Suppose looking for document 10 in this net:
    Src knows 11, 12.
    11 knows 20, 30.
    12 knows 10.
  Then search will visit 20 and 30 before 12.
    Even though search "knew about" 12 before it saw 20 or 30.

Why is a Freenet query likely to succeed if the document exists?
  Well, maybe it isn't.
  It might if the document is globally popular.
  Depending on TTLs and access patterns,
    might not even be able to find a copy of a popular document.

Will Freenet preserve documents?
  No. Only if they are requested a lot.

How does Freenet replicate documents?
  Perhaps every node along a reply's path caches the document?

Does Freenet use the network efficiently?

How does a Freenet user know the data is authentic?
  Can't in the default key scheme.
  Could encrypt file with name (rather than key).
    Then it's hard to make up reasonable fake contents.
    But could do it for specific target files w/ known names.
  Easier with content-hash or public key.

How is the anonymity of queriers preserved?
  The origin of a query is not needed except at first hop.
    Even first hop doesn't know previous was originator.
    What about depth field?
  So can be faked or omitted from query packets.
  Why would we care?

How is the anonymity of inserters preserved?
  Same deal.
  Why would we care?

How is the anonymity of storage locations preserved?
  Nodes along return path need to update routing tables.
    With ID of original node.
    This sounds bad for anonymity.
  So nodes sometimes replace location in response with own IP address.
    Makes routing less efficient.

Suppose someone posts bomb construction secrets to Freenet.
  And DoE wants to delete them.
  What can it do?
  Can it decide reliably which nodes contain the document?
    And physically confiscate 100% of those nodes?
    Why is it hard to decide if a node stores a document?
      Can't just query it.
      Can't query with TTL = 1.
      Could inspect all packets in and out, perhaps.
        Requires physical access anyway.
  Can the DoE confiscate random nodes,
    check if their disks contain secrets,
    and put their owners in jail?
    Maybe. Could encrypt contents so storers could deny knowledge.

Do we really need storage anonymity?
  Is it enough to make sure copies are widely distributed?
    E.g. in other countries.

Does Freenet provide as strong anonymity as, say, e-mail anonymizers?
  No -- a local eavesdropper will know it's me, and that I'm
    asking for a particular document.
  Eavesdropper can link together incoming and outgoing requests.

Do we really need to intertwine routing and querier/inserter anonymity?
  No -- could probably separate them.
  Paper itself suggests a separate "pre-routing" phase.
    Using nested public key encryption of query.

What if an attacker floods Freenet with junk?
  Exhaust node disk storage.
  Cause lots of routing table entries to point to attacker.

How could Freenet support updates?
  Can't do it with simple keys -- open to bad guys replacing my files.
  Could name file with a public key (the *file*'s public key).
    Sign file with private key.
  Then update supplies new file signed with same key.
    So nodes know they are allowed to accept it.
  Freshness is still hard:
    No way to locate all the replicas of a file!

How could Freenet support directories?
  Easy if directory is write-once.
  Updates OK if using public keys.

So how can people actually *use* Freenet?
  Weird names, awkward support for directories and updates.

Could you emulate the Web with Freenet?
  Use Freenet keys as embedded links.