"Flash...", by Pai, Druschel, and Zwaenepoel USENIX 1999 Last paper involved serious O/S mods. Point: efficient concurrent programming, but non-blocking disk reads. Today, much the same benefits w/o any O/S mods! Why is cached vs disk a big issue? Where might the cache be? O/S disk block cache, or Maintained by web server[s] Flash seems to use both. But Flash-MP has *mapped* file cache, so shared? What's good or bad about just using the O/S disk block cache? easy to share. expensive to get at. What's good or bad about maintaining a cache in the web server? may be hard to arrange sharing. but may be faster. Why is portability a challenge? can't depend on sophisticated O/S interfaces is this a red herring? why not just figure out the right interfaces? What is the basic AMPED idea? main event-driven process, helper processes for blocking disk ops What operations does the helper perform? disk read() how about open()? stat()? How does the main process ask a helper to read data? pipe? How does the helper tell the main process the read is done? How does the main process actually get file data? After the helper causes it to be read in? it's implicit -- main process mmaps file, assumes pages will be resident What is mmap()? mincore()? What if more requests than helpers? don't know. presumably queued by main process. What's a reasonable number of helpers? one or two per disk? Is AMPED a good idea -- is this the best way to structure servers? yes -- O/S does not allow many good options no -- why not fix the O/S We're familiar with most of the techniques AMPED is compared to SPED, MP, MT Note MT means kernel threads Matrix of techniques and workloads -- where do we expect good/bad performance? From Cache From Disk SPED ++ - MP - + MT + + AMPED ++ + What do we *expect* the speedup to be? What limits AMPED's speed if all requests are in cache? What limits AMPED's speed if all requests go to disk? Why exactly would it be faster than SPED? Aren't both disk-limited? 
So maybe we expect AMPED to win with a mixed cache/disk workload.
  Can we guess what cache/disk mix real Web workloads would have?
  What would cause a workload to have any particular mix?
  Is it likely we could make up a realistic workload?
    i.e. one that would reasonably predict relative performance
  What do they use for workloads?

What's the test setup?
  Real server and lots of clients.
  How many clients? Is one enough?
  Clients run fake web browsers that issue concurrent requests.

Why does file size affect b/w in Figures 6 and 7?
What can we conclude about AMPED from Figures 6 and 7?
Why does FreeBSD get much higher performance than Solaris on the same hardware?

What can we conclude from Figure 8?
  Realistic traces. Flash is a bit faster, but not radically.
  Presumably this is a mix of cached/disk requests.
  But the actual mix is not known, so we don't really know what we're testing.

How do Figures 9 and 10 shed light on cached/disk performance?
  by varying the data set size, they control how well the data fits
  in the ~100 MB disk cache.
  How do they vary the data set size? How does that affect cache vs disk?

Figures 9/10, Flash vs MP:
  Why does Flash beat MP for small data sets? (MP has a partitioned cache)
  Why does Flash beat MP for large data sets? (event-driven is more efficient)

Flash vs SPED:
  Why are Flash and SPED close for small data sets?
  Why does Flash beat SPED for large data sets?

Flash vs MT:
  Flash and MT have about the same behavior for all data set sizes. Why?
  What does this mean w.r.t. whether Flash is worthwhile?

What was the disk system? Multiple disks?
How many client machines? Processes? Simultaneous requests?
Why was Solaris slower?