Reimplementing the Cedar File System Using Logging and Group Commit
Robert Hagmann
SOSP 1987

What are the main properties the authors want?
  Fast crash recovery.

What happens when you create a file in FSD?
  1. get two free pages for leader and data from the VAM
  2. update the name table b-tree in memory
  3. write leader+data to disk synchronously
  4. append the modified name table pages to the log in memory

What does the in-memory log look like?
  Probably just a list of modified pages.

When is the in-memory log forced to the disk?
  Group commit... every few seconds.

What does the on-disk log look like?
  Fixed area of disk, not a file, no free list for log blocks.
  Proper entry: hdr, blank, hdr copy, data pages..., end, copies of data..., end copy.

When are the modified disk cache pages written to the disk?
  Is it OK to write them before the corresponding log records?
  Is it OK to write them after?

What happens during recovery after crash+reboot?

How does recovery s/w find the start of the log?
  Disk has a pointer to the first log record of the oldest third, updated on entry to each third.

How does recovery find the last log record?
  hdr/end blocks must have a time-stamp or something.

What if the system crashed while writing to the log on disk?
  Do we assume the disk wrote the pages of a log entry in order?
  If the end block exists and matches the hdr block, we're OK.

What does recovery actually do with the log records?
  Can it just start re-play at the beginning?
  Where does it stop re-playing?
  What about recovering the free list (VAM)?

Why does the log allow atomic update to the complex file name table b-tree?
  What if we crash during the log write?

What can we say about the state of the file system after crash+recovery?
  Some prefix of the operations have been completed, i.e. every operation up to a certain point.
  May have lost the last few operations.

Why don't they log data writes?
  When do they write the data?
  Does that break the prefix rule?
  May have writes from "the future", after the recovery point.
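The four create steps above can be sketched as code. This is a hypothetical toy model, not the paper's implementation; names like `vam_free`, `name_table`, and `mem_log` are assumptions, and a plain dict stands in for the b-tree.

```python
class FSD:
    """Toy model of FSD file creation (illustrative only)."""

    def __init__(self):
        self.vam_free = list(range(100))   # free-page list (the VAM)
        self.name_table = {}               # stands in for the name table b-tree
        self.disk = {}                     # page number -> contents
        self.mem_log = []                  # in-memory log of modified pages

    def create(self, name, data):
        # 1. get two free pages for leader and data from the VAM
        leader, datap = self.vam_free.pop(), self.vam_free.pop()
        # 2. update the name table b-tree in memory
        self.name_table[name] = leader
        # 3. write leader+data to disk synchronously
        self.disk[leader] = ("leader", name, datap)
        self.disk[datap] = data
        # 4. append the modified name table pages to the in-memory log
        #    (here the whole table stands in for the dirty b-tree pages)
        self.mem_log.append(("name_table_page", dict(self.name_table)))
```

Note that only the metadata (step 4) goes through the log; the file data itself is written directly, which is why data writes can land "from the future" after a crash.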
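The on-disk entry format (hdr, blank, hdr copy, data, end, copies, end copy) and the "end must match hdr" recovery rule can be sketched as follows. The sequence-number field is an assumption; the notes only require that hdr and end blocks can be matched up (e.g. by a time-stamp).

```python
def make_entry(seq, pages):
    """Build one on-disk log entry as a list of blocks."""
    hdr = ("hdr", seq, len(pages))
    end = ("end", seq)
    #      hdr    blank        hdr copy   data     end    copies   end copy
    return [hdr, ("blank",), hdr] + pages + [end] + pages + [end]

def last_complete_seq(log_blocks):
    """Recovery scan: an entry counts only if its end block matches its hdr."""
    i, last = 0, None
    while i < len(log_blocks) and log_blocks[i][0] == "hdr":
        _, seq, n = log_blocks[i]
        entry_len = 3 + n + 1 + n + 1
        entry = log_blocks[i:i + entry_len]
        if len(entry) < entry_len or entry[-1] != ("end", seq):
            break                          # torn write: crash mid-entry
        last = seq
        i += entry_len
    return last
```

A crash while writing an entry leaves the final end copy missing or mismatched, so the scan stops at the last fully written entry, and recovery replays only a prefix of the operations.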
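Group commit itself is simple to sketch: many operations append to the in-memory log, and a single periodic flush forces the whole batch with one disk write. This is an assumed minimal structure, not the paper's code.

```python
class GroupCommitLog:
    """Minimal group-commit sketch (illustrative, names assumed)."""

    def __init__(self):
        self.pending = []     # modified pages waiting to be forced
        self.disk_log = []    # the on-disk log (a list of batches here)
        self.committed = 0    # operations made durable so far

    def append(self, page):
        self.pending.append(page)

    def flush(self):
        # Called every few seconds: one disk write commits the whole batch.
        if self.pending:
            self.disk_log.append(list(self.pending))
            self.committed += len(self.pending)
            self.pending.clear()
```

The payoff is amortization: N metadata updates cost one log write instead of N synchronous writes, at the price of possibly losing the last few seconds of operations.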
What about delete, create, write, crash? W/ the recovery point before the delete?

What if we run out of log space during normal operation?
  How can we be sure it's safe to start re-using the log space on disk?
  What if we crash while writing blocks during the switch to a new 1/3?

Why is FSD faster than CFS?

******

Let's talk about the old system, CFS, first.
In some ways it's the more interesting of the two.

What were some strengths of the old design?
  Labels provided robustness.

What were the weaknesses of the old design?
  Non-standard hardware.
  Slow recovery.
  Slow meta-data update.
  CFS file name table was a b-tree, *not* consistent after a crash;
  thus it really required duplicating names &c in the headers.

How did the labels work?
  What was in a label? When written? When checked?
  What errors can the labels detect?
  What invariants could a label help maintain?

Disk hardware failures: for each, can we detect bad data? Recover?
  If the disk scribbles onto the file name table?
  If the disk scribbles onto a header?
  If the disk scribbles onto file content?

Software errors:
  If a used page is on the free list, will we re-use it incorrectly?
  If someone writes junk into a header, will we detect it? Can we recover?
  If someone writes junk into file content, will we detect it?

Why was the CFS VAM just hints?
  VAM is the disk page free list.
  How could there be no invariants?
  What if an allocated page is on the free list?
  Does the FSD VAM have invariants?

Why is guarding against single-sector errors reasonable?
  What's the logic behind this design focus?
  What are the alternatives?
  What if most disk errors occurred in the disk electronics?

Do they have hierarchical directories?
  Only one "directory", the file name table...
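The label idea above can be sketched as a check on every read. Field names here are assumptions: the core point is that each sector carries a label saying which file and page it belongs to, written alongside the data (by CFS's non-standard hardware) and verified on access, so misdirected reads and writes are caught.

```python
class Sector:
    """A disk sector with a CFS-style label (illustrative sketch)."""

    def __init__(self, file_id, page_no, data):
        self.label = (file_id, page_no)   # written with the data by the hardware
        self.data = data

def read_page(disk, addr, expect_file, expect_page):
    """Check the label on every read; a mismatch means corruption or a
    misdirected write, and is reported rather than silently returned."""
    sec = disk[addr]
    if sec.label != (expect_file, expect_page):
        raise IOError("label mismatch: expected %r, found %r"
                      % ((expect_file, expect_page), sec.label))
    return sec.data
```

This is why a page wrongly left on the free list is not fatal in CFS: re-allocating it rewrites the label, and any stale reader of the old file sees a label mismatch instead of junk data.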