Final Project Presentation Schedule

Thursday, April 29, 2004

Note: Presentations will be strictly limited to 6.5 minutes each and should begin on time. There will be two projectors setup, so the next group should be done setting up well before their slot begins. We'll cut you off if you run over.

1:05 - 1:11 Prototyping a DHT Oriented Architecture
Maxwell Krohn, Jeremy Stribling and Michael Walfish

We attempt to decouple identity from location in Internet hosts. In our proposal, hosts receive flat identifiers in a large and sparse namespace, and an Internet-wide distributed hash table (DHT) acts as a resolver by mapping these flat identifiers to IP addresses in analogy with today's DNS. Unlike DNS names, the source and destination identifiers appear in packets, in a shim layer after the IP header. Our proposal would change all host software but leave the core routers untouched. The advantages of our proposal are: (1) the benefits of decoupling location from identity and (2) enhanced middlebox functions.

1:12 - 1:18 Automated Testing of Distributed Systems
Nathan Boy, Jared Casper, Carlos Pacheco, Amy Williams

We present a technique to test servers that interact with clients using the Sun RPC protocol. The technique requires the user to provide two things: a list of RPC calls for the server being tested, and a set of invariants that are required to hold over the RPC communications trace. The tester works by generating random sequences of RPC calls and checking that the invariants hold over the tests' traces. If an invariant is violated, the violating sequence of RPC calls is reported to the user. To illustrate the validity of our system, we use it to test a block server and a lock server.

1:19 - 1:25 Global Optimization by Building Better Peer Neighbourhoods in BitTorrent
Asfandyar Qureshi

BitTorrent [1] is popular file-sharing tool, accouting for a surprisingly large proportion of Internet traffic. Files are divided into fragments and transferred out of order among nodes trying to download them. Individual nodes penalize and reward other nodes, adjacent in the BitTorrent overlay, depending on how willing they are to share data. This incentives scheme and the subsequent enforcement of sharing is credited with making BitTorrent outperform other contemporary file-sharing systems.

Nonetheless, BitTorrent builds its overlays by selecting random peers for newly joining nodes, a fact that can seriously handicap both individual performance and waste global network resources. This paper investigates schemes which attempt to build a more intelligent overlay network, particularly using network delays to select overlay peers that are close by in the underlying network. We evaluate our techniques both from the network's perspective (resources used) and from the individual's perspective (average time to complete a download). In both cases, our results show that better peer selection can lead to improved performance with no other changes to the BitTorrent protocol.

1:26 - 1:32 Portable Reputations with EgoSphere
Keith Bonawitz, Chaitra Chandrasekhar, Rui Viana

Many online services require some form of trust between users trust that a seller will deliver for goods as advertised, trust that an author's thoughts are worth the time spent on reading them. To accommodate an internet community where users are constantly interacting with strangers, online services often construct proprietary reputation management systems for their community, with the side effect of locking users into that service if they wish to maintain their reputation. In contrast, this paper outlines EgoSphere, a system for portable Internet reputations, so that reputations built on one service can be used elsewhere. EgoSphere hinges on the use correlational statistics to automatically project reputations from one service to those services which have related reputations. To achieve these goals, EgoSphere must gather reputation information from webservices. EgoSphere avoids the unreasonable expection that all webservices to publish their reputation databases, while also avoiding violating robots.txt restrictions and incurring additional website load, by gathering its reputation data with using a distributed passive robot system which simulates the function of a standard webcrawling robot by using webproxies on users' computers to analyze the responses to standard webservice requests. This paper outlines the design and proof-of-concept implementation of EgoSphere, targeted specifically at providing portable reputations for bulletin-board style internet services.

1:33 - 1:39 DIBS: Distributed Backup for Local Area Networks
Eugene Hsu, Jeff Mellen and Pallavi Naresh

DIBS (Distributed Intranet Backup System) is a distributed file backup system for computers in local area networks that lack dedicated or secondary data storage. DIBS exploits the unused disk space among a set of connected machines and the high speed of local area networks to provide a transparent peer-to-peer backup solution. Each node in the DIBS system contains a module that manages its list of critical files and the files it stores for other nodes, a server that responds to requests from other nodes, and a client interface to perform backup operations. Each node is responsible for its own files, and, given adequate space on the network, that it is replicated on at least one additional machine. The system is designed to provide complete recovery for a crashed node, recovery from transient machine failures, protection for mistaken file overwrites and deletion, and flexibility in the face of changing network-wide storage conditions.

1:40 - 1:46 Peer to Peer Multicast for Live Streaming Video
Maya Dobuzhskaya, Rose Liu, Jim Roewe and Nidhi Sharma

The high bandwidth requirements necessary for serving live streaming video and audio content restrict normal Internet users on cable or dialup modems from distributing such content to multiple viewers. With the growing popularity of hosting live video content, this high bandwidth requirement is becoming a widespread problem. Unlike with pre-recorded video, people viewing live video all receive the same content at approximately the same time. Our system exploits this temporal locality in viewer streams to redistribute the bandwidth load on the server among the clients. We use a peer-to-peer multicast scheme of content distribution to dramatically reduce the server's bandwidth requirement, enabling low bandwidth home users to serve high quality live video to 50-100 clients.

1:47 - 1:53 Robust BitTorrent
Nirav Dave, Albert Huang, Yuan Shen, Edmund Wong

BitTorrent was designed to distribute a single set of data to a large set of nodes quickly. To accomplish this, a client downloads the file from both nodes that have the entire file (the seed) and other downloaders (peers) who have parts of the file. A centralized tracker is used to find other peers. As a result, this tracker is vulnerable to DDoS attacks, since this tracker is statically specified.

Our goal is to make the BitTorrent system more robust against trackerdirected attacks. We intend to accomplish this by adding the ability to dynamically designate new trackers should one fail, and by replicating the tracker k times in order to guarantee a graceful rollover.

Much like Gnutella, nodes connect to the network by means of searching a list of previously observed nodes on the network. Once on the general network, nodes can initiate a search, which indirectly queries a distributed hash table (DHT) to obtain a tracker for the desired file. Through this, a client can connect to the BitTorrent system to download the desired file.

1:54 - 2:00 A Library for Parallel Genetic Algorithms
Ariel Rideout, Ryan Williams and David Ziegler

Genetic Algorithms are favored for optimization problems because they lend themselves to both continuous and discrete combinatorial problems and are less susceptible to getting 'stuck' at local optima in comparison to gradient search methods. Unfortunately, GAs are typically computationally expensive. A distributed toolkit is offered which will allow GAs to be processed across multiple machines. The toolkit ensures an equivalent result as when the problem is approached locally, but significantly diminishes the required computation time. An analysis of the system's projected performance is described.

2:01 - 2:07 WebTorrent: a BitTorrent Extension for High Availability Servers
Gary Sivek, Steven Sivek, Jonathan Wolfe, Michael Zhivich

Achieving content high-availability is one of the most important goals of a webserver system.In order to achieve high-availability in the traditional server-client setting, the server must have the bandwidth and the hardware needed to handle any peak load that might occur. However, this is a very costly and rarely practical solution, especially for servers that are subjected to the Slashdotting effect. We propose a WebTorrent system that is based on BitTorrent and willleverage the resources of the clients to help the server make the content more available. Such a system will alleviate the load on the server and reduce the client download times.

2:08 - 2:14 P2P Objects
Wenyan Dong, Sonam Dorji, Sachin Katti, Dimitris Vyzovitis

In a peer to peer system autonomous systems pool their resources (e.g. files, storage, compute cycles) in order to inexpensively handle tasks that would normally require large costly servers. Distributed applications are then built on top of this which take advantage of the resource scaling (music file sharing, co-operative computation of complex tasks etc). In most such distributed systems communication is essentially one to many. Retrieving a particular object involves finding the set of nodes who hold that object, querying a set of nodes holding particular types of resources means communicating with all those nodes and so on. Thus the problems boil down to efficient object location and grouping of objects based on objects types. Thus a object based multicast primitive is a fundamental building block for almost every distributed application.

DHTs have recently emerged as an elegant and scalable building block for P2P systems. DHTs provide a get and put interface for keys, which provides applications with an efficient routing substrate. But they are designed for exact lookups i.e. given a key a DHT can efficiently find the node responsible for that key. Combining this with a mechanism to efficiently group nodes based on the type of resources they hold and enabling nodes to communicate with these groups using multicast, a powerful and flexible API for application developers can be provided.

In this paper we present an efficient way to build multicast trees based on object types and query them. The multicast group formation algorithm is scalable and has the inherent strengths a DHT has: low control overhead, logarithmic bound on routing etc. We intend to provide an implementation and use it to build a querying mechanism to locate files efficiently based on complex predicates and not just keywords.

2:15 - 2:21 Advanced Backup Service
Joseph Cooley, Alen Peacock, Chris Taylor

Many computer users maintain machines with no backup scheme for protecting their data in the event of a loss or failure. At the same time, many user's computers are likely to contain spare disk space and unused networking resources. We present the Apportioned Backup System (ABS), which provides a reliable collaborative backup resource by leveraging independent, distributed resources. With ABS, procuring and maintaining specialized backup hardware is unnecessary. ABS makes efficient use of network and storage resources through use of coding techniques, convergent encryption, and difference versioning. The system also painlessly accommodates dynamic expansion of system compute, storage, and network resources, and is tolerant of catastrophic node failures.

2:22 - 2:28 An Instant Messager Application based on OpenHash
Tamara Yu, Ryan Manuel, Joanna Liang, Timothy Chan

We propose an OpenHash-based architecture that avoids heavyweight web service tools and that is suitable for building a lightweight instant messaging (IM) application. OpenHash provides the backbone to a distributed storage medium using a simple get() / put() interface. Our design emphasizes scalability and reliability in order to achieve an application that is competitive with existing commercial IM applications.