Required reading: I/O-lite.
The design of a buffer management system impacts the performance of servers. For example, consider the design of a Web server on V6 (assuming that V6 had networking support). An HTTP GET request passes through the following subsystems: the file system (and its cache), the Web server application (and its cache), and the network subsystem (and its buffers).
Looking at this data path, we see that the data lives in 3 places: the file system cache, the Web server's cache, and the network buffers. We also see that the data is copied 3-4 times: from disk into the kernel, from the kernel into the server's address space, back into the kernel, and onto the network (which may involve a copy to the network card). Only two of these copies are necessary: disk to kernel and kernel to network.
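To make the copies concrete, here is a minimal sketch of the conventional (non-IO-Lite) serving path, assuming a POSIX-style server that keeps the data in its own user-level buffer; the avoidable copies happen inside read() and write().

    /* Conventional path: data is copied kernel->user on read() and
       user->kernel on write(), and is buffered in both the file system
       cache and the server's own buffer. (Illustrative sketch; serve_file
       and its arguments are hypothetical.) */
    #include <fcntl.h>
    #include <unistd.h>

    static void serve_file(int sockfd, const char *path)
    {
        char buf[8192];                  /* server's private copy of the data */
        int fd = open(path, O_RDONLY);
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0)   /* copy: kernel -> user */
            write(sockfd, buf, n);                     /* copy: user -> kernel */
        close(fd);
        /* The disk -> kernel and kernel -> NIC copies happen inside the kernel. */
    }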
IO-Lite is a unified buffer and cache system. Its design allows applications to avoid redundant copies and multiple buffering, and enables cross-subsystem optimizations (e.g., caching the Internet checksum of a buffer so it need not be recomputed).
The key idea is to provide an IPC interface that scatter/gathers pages of memory and transfers them between address spaces by reference, and to show how that interface can be used to avoid redundant copies, avoid multiple buffering, and enable cross-subsystem optimizations. More specifically:
- Data is stored in immutable buffers, allocated from the appropriate pool and shared read-only among subsystems.
- Processes refer to data through mutable buffer aggregates: ordered lists of <pointer, length> pairs into those buffers. To "modify" data, a process writes new values into a new buffer and builds a new aggregate that combines old and new pieces.
Why do you need aggregates? Couldn't you just send a page mapping from the source to the destination address space and make the page COW? That is not general enough. For example, the destination may want to add a header (e.g., the HTTP header) and then forward the mappings to the network subsystem.
Together these ideas allow a buffer to be mapped safely, using the VM system, into multiple address spaces and shared efficiently among multiple subsystems.
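As a rough sketch (the names are illustrative, not the paper's exact declarations), the two central data structures might look like this in C:

    #include <stddef.h>

    /* Immutable buffer: a region of pages filled once and then only read. */
    struct iol_buffer {
        void   *pages;      /* start of the buffer's pages */
        size_t  size;       /* size in bytes */
        int     refcnt;     /* number of aggregates referring to this buffer */
    };

    /* A slice of an immutable buffer. */
    struct iol_slice {
        struct iol_buffer *buf;
        size_t             offset;  /* start of the slice within buf */
        size_t             len;     /* length of the slice */
    };

    /* Mutable buffer aggregate: an ordered list of slices; this is what is
       passed across address spaces by reference. */
    struct iol_agg {
        int                nslices;  /* number of slices in the list */
        struct iol_slice  *slices;   /* the slices, in logical order */
    };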
Pseudocode for read:
    read(fd, Agg **agg) {              // assume an 8K read from the file system
        (fileid, offset) = fd->(fileid, offset);
        cache_agg = lookup_cache(fileid, offset);
        if (!cache_agg) {
            allocate a cache_agg in the cache;
            allocate buffer(s) from the appropriate pool;
            read the data from disk into the buffer(s);
            insert cache_agg into the cache;
        }
        if (buffers not yet mapped into the process)
            map the buffers into the process's address space;
        copy cache_agg into *agg;
    }
Pseudocode for write:
    write(fd, Agg **agg) {             // assume an 8K write to the file system
        (fileid, offset) = fd->(fileid, offset);
        cache_agg = lookup_cache(fileid, offset);
        update(cache_agg, *agg);
    }

    update(cache_agg, agg) {
        There are three cases:
        (1) The write modifies every word in the file.
        (2) Only a subset of the file is modified by this write.
        (3) Enough words are modified that making a redundant copy of the
            file cache buffer is less expensive than the fragmentation
            overhead IO-Lite would otherwise incur.
        In (1) and (3), allocate new buffer(s) and write the changes into
        them; in (3) also copy over the unchanged data.
        In (2), store the new values in a newly allocated buffer and combine
        the new and old values by creating a new buffer aggregate that
        reflects the logical layout of the file.
        Decrease the refcnt of any buffers freed up by the update of
        cache_agg.
    }
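A minimal sketch of case (2), using the illustrative structures above: the unchanged prefix and suffix of the cached data are reused by reference, and only the newly written bytes live in a fresh buffer. (update_subrange is a hypothetical helper; it assumes the old data sits in a single buffer.)

    #include <stdlib.h>

    /* Build a new aggregate that overwrites bytes [off, off+len) of old_buf
       with the contents of new_buf, without copying the unchanged parts. */
    struct iol_agg *update_subrange(struct iol_buffer *old_buf,
                                    struct iol_buffer *new_buf,
                                    size_t off, size_t len)
    {
        struct iol_agg *agg = malloc(sizeof(*agg) + 3 * sizeof(struct iol_slice));
        agg->slices = (struct iol_slice *)(agg + 1);
        int n = 0;
        if (off > 0)                                  /* unchanged prefix, by reference */
            agg->slices[n++] = (struct iol_slice){ old_buf, 0, off };
        agg->slices[n++] = (struct iol_slice){ new_buf, 0, len };   /* new data */
        if (off + len < old_buf->size)                /* unchanged suffix, by reference */
            agg->slices[n++] = (struct iol_slice){ old_buf, off + len,
                                                   old_buf->size - (off + len) };
        agg->nslices = n;
        old_buf->refcnt++;      /* old buffer is now also referenced by this aggregate */
        new_buf->refcnt++;
        return agg;
    }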
How does this apply to the 6.828 kernel and operating system? The file server already has the buffer cache mapped in its memory. When a client opens a file, the server already sends a mapping to the client, which maps a page from the buffer cache into the client's address space. (If we had a network server, the client could send a mapping to the network server.) Reference counting works out, because the kernel maintains a ref count for each physical page; if the ref count is 1 for a page in the buffer cache, then only the file server has it mapped.
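For instance, the file server can hand a client a buffer-cache page by reference using the existing page-mapping IPC; here is a minimal sketch, assuming the JOS lab interface (diskaddr, ipc_send, ipc_recv, PTE_SHARE) and a hypothetical client-side address DATA_VA:

    /* File server side: reply to a request by mapping the buffer-cache page
       into the client instead of copying its bytes. */
    void serve_block(envid_t client, uint32_t blockno)
    {
        void *blk = diskaddr(blockno);   /* buffer-cache page for this block */
        ipc_send(client, 0, blk, PTE_P | PTE_U | PTE_SHARE);
    }

    /* Client side: receive the mapping at DATA_VA (a hypothetical address the
       client reserves for this). The page is now mapped in both address
       spaces, and the kernel's pp_ref for the physical page counts both. */
    void fetch_block(void)
    {
        ipc_recv(NULL, (void *)DATA_VA, NULL);
    }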
If we want IO-Lite-style functionality, we need to generalize the IPC interface so that we can send I/O vectors (i.e., scatter/gather structures) rather than a single page. This extension would allow intermediate environments to add headers, etc., without having to copy.
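A purely hypothetical sketch of such a generalization (ipc_send_vec and struct ipc_iov are invented names, not part of the JOS labs): the sender names a list of spans of its address space, the kernel maps the underlying pages into the receiver, and an intermediate environment can prepend an HTTP header by reference:

    /* Hypothetical scatter/gather IPC: each entry names a span of the
       sender's address space to be mapped (not copied) into the receiver. */
    struct ipc_iov {
        void   *va;     /* start of span in sender's address space */
        size_t  len;    /* length in bytes */
    };

    /* Hypothetical generalization of ipc_send; not part of the JOS labs. */
    int ipc_send_vec(envid_t to, struct ipc_iov *iov, int niov, int perm);

    /* Web server environment: forward a file to the network server with an
       HTTP header prepended, without copying the file data. */
    void send_response(envid_t net_server, char *hdr, size_t hdr_len,
                       void *file_pages, size_t file_len)
    {
        struct ipc_iov iov[2] = {
            { hdr,        hdr_len  },   /* header built by the web server */
            { file_pages, file_len },   /* file data mapped from the file server */
        };
        ipc_send_vec(net_server, iov, 2, PTE_P | PTE_U | PTE_SHARE);
    }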
With IO-Lite, how many copies remain? In how many places is the data buffered? What cross-subsystem optimizations become possible?
How do IO-Lite and paging interact?
What is the page replacement policy?
Is IO-Lite worth it? (See figures 3 and 4.) Is it worth it for small pages? What limits the performance of IO-Lite? How about Flash and Apache?
How are CGI-bin apps implemented? Is it worth it? (See figures 5 and 6).
Is double buffering a problem? (See figure 7).
Is there any app other than a Web server that benefits? (See figure 8.)