6.894 Lab 3: HTTP/1.1 Web Proxy for those who have taken the 6.033 lab

Due date: Thursday, Oct 12.

Introduction

This document is ONLY for students who have done the 6.033 lab before.

You have already programmed an asynchronous TCP proxy. The first thing you should do is convert your program to use the new asynchronous library, which should not take you long. You should also make sure that your proxy satisfies all the other requirements we specified for the rest of the class.

To make this lab challenging for you, we expect you to implement persistent connections in your HTTP/1.0 proxy (yes, it's still a 1.0 proxy). Persistent connections are formally specified in HTTP/1.1, but many HTTP/1.0 clients (e.g., Netscape) already support them.


Why persistent connections?

Prior to persistent connections, a separate TCP connection was established to fetch each URL. In practice, a client usually makes multiple requests to the same server in a short period of time (e.g., to fetch multiple inline images). With persistent connections, the client can keep a small number of TCP connections (at most 2, according to the spec) open to the server. The advantages are that we save CPU time (because fewer TCP connections are opened and closed) and reduce network congestion (because a persistent connection stays open long enough to explore the state of the network), among others. Refer to the HTTP/1.1 specification, RFC 2616 (which obsoletes RFC 2068).


HTTP/1.0 Persistent Connection

% telnet web.mit.edu 80
Then type
GET /6.033/www/ HTTP/1.0
Connection: Keep-Alive
followed by two carriage returns. See what you get. Notice that the MIT web server does not close the connection immediately. You can continue with another request, e.g.,
GET / HTTP/1.0
but without the Connection: Keep-Alive header. See what you get.

If the HTTP/1.1 server honors the client's persistent connection request (signaled by the Connection: Keep-Alive header), it should include a Connection: Keep-Alive header in its HTTP response to the client. If, for some reason, the server decides not to keep the connection persistent but to close it immediately after the current response, it should respond with a Connection: close header.
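
The header check described above can be sketched in Python as follows. This is only an illustration, not part of the lab's asynchronous library: the function names parse_headers and wants_keep_alive are hypothetical, and the sketch assumes the HTTP/1.0 default that a connection is closed unless Keep-Alive is explicitly requested.

```python
def parse_headers(raw_head: str) -> dict:
    """Parse the header lines of a request or response head.

    raw_head is the text up to and including the blank line.
    Header names are lowercased because they are case-insensitive.
    (Hypothetical helper, for illustration only.)
    """
    headers = {}
    # Skip the first line (request line or status line).
    for line in raw_head.split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep:  # ignore blank or malformed lines
            headers[name.strip().lower()] = value.strip()
    return headers


def wants_keep_alive(raw_head: str) -> bool:
    """Return True if the message asks to keep the connection open.

    Assumes HTTP/1.0 semantics: no Connection header means "close".
    """
    value = parse_headers(raw_head).get("connection", "")
    tokens = [t.strip().lower() for t in value.split(",")]
    if "close" in tokens:
        return False
    return "keep-alive" in tokens
```

For the telnet session above, wants_keep_alive returns True for the first request (which carries Connection: Keep-Alive) and False for the second one (which carries no Connection header).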

Now configure your browser to use the proxy, abc.mit.edu:8888 (suppose abc is the machine you are currently using). Use nc to listen on localhost's port 8888.

% nc -l -v -p 8888
Point your browser to http://web.mit.edu and see what you get from nc. How does a client request a persistent connection from a proxy?

Now we will see how a real proxy handles the persistent connection option, using squid.lcs.mit.edu:3128 as our sample proxy. Use nc to listen on local port 8888:

% nc -l -v -p 8888
Now ask the squid proxy server for http://abc.mit.edu:8888 (suppose abc.mit.edu is your local machine):
% telnet squid.lcs.mit.edu 3128
GET http://abc.mit.edu:8888 HTTP/1.0
Proxy-Connection: Keep-Alive
Observe, in the nc output, the request that squid.lcs.mit.edu sends to abc.mit.edu.

Persistent connection is a "hop-by-hop" option rather than an "end-to-end" one. If the client communicates with the server via a proxy, the persistent connections client <=> proxy and proxy <=> server are negotiated separately between the corresponding pairs. For example, the client could "talk" to the proxy over a persistent connection while the proxy forwards each client request over a newly opened one-time connection.
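
As a rough sketch of what hop-by-hop means for a proxy, the Python function below drops the connection-management headers that arrived from the previous hop and adds the proxy's own choice for the next hop. The name rewrite_for_next_hop and the HOP_BY_HOP set are hypothetical, and the sketch leaves out everything else a proxy does when forwarding (such as rewriting the request line).

```python
# Headers that describe only the connection to the previous hop.
# (Assumed set for this sketch; a real proxy may need to drop more.)
HOP_BY_HOP = {"connection", "proxy-connection", "keep-alive"}


def rewrite_for_next_hop(head: str, keep_alive: bool) -> str:
    """Rebuild a message head for the next hop.

    head is the request or response head, including the blank line.
    keep_alive is the proxy's own decision for the next connection,
    independent of what the previous hop asked for.
    """
    lines = head.split("\r\n")
    start_line = lines[0]
    header_lines = [line for line in lines[1:] if line]

    # Keep every header except the hop-by-hop ones.
    kept = [
        line for line in header_lines
        if line.partition(":")[0].strip().lower() not in HOP_BY_HOP
    ]

    # Negotiate persistence separately for the next hop.
    kept.append("Connection: " + ("Keep-Alive" if keep_alive else "close"))
    return "\r\n".join([start_line] + kept) + "\r\n\r\n"
```

Calling rewrite_for_next_hop(head, keep_alive=False) on a request that arrived with Proxy-Connection: Keep-Alive corresponds to the scenario above: the client side stays persistent while the request goes out over a one-time connection.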

Requirements

Your proxy server should accept persistent connection requests from clients, make persistent connection requests to servers, and handle both correctly (e.g., by reading multiple HTTP requests/responses from the same connection). For performance reasons, it is more worthwhile to keep the connection between the client and the proxy persistent, because the client is highly likely to make multiple requests to the same proxy in a short period of time, but you are free to make your own decisions about when to close a persistent connection.
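
Reading multiple messages from one connection means the proxy can no longer treat end-of-file as the end of a message. One possible approach, sketched below in Python, is to buffer incoming bytes and cut the buffer into messages using Content-Length. The function name split_messages is hypothetical, and the sketch assumes that any message with a body carries a valid Content-Length header (a message without one is treated as having no body); it is not a complete treatment of HTTP framing.

```python
def split_messages(buffer: bytes):
    """Split buffered bytes into complete HTTP messages.

    Returns (messages, leftover), where leftover holds the bytes of a
    message that has not fully arrived yet.
    Assumption: a missing Content-Length means the body is empty.
    """
    messages = []
    while True:
        head_end = buffer.find(b"\r\n\r\n")
        if head_end == -1:
            break  # head not complete yet, wait for more data

        head = buffer[:head_end].decode("latin-1")
        length = 0
        for line in head.split("\r\n")[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())

        total = head_end + 4 + length  # head + blank line + body
        if len(buffer) < total:
            break  # body not complete yet, wait for more data

        messages.append(buffer[:total])
        buffer = buffer[total:]
    return messages, buffer
```

A caller would append each newly read chunk to the leftover bytes and call split_messages again, forwarding every complete message it returns.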