Frequently Asked Questions for Tweedie's Linux Journaling paper

Q: How are logging and journaling different?

A: For our purposes they are synonyms, referring to the basic ideas behind both ext3 and xv6's logging.

Q: In what ways is ext3 better than the xv6 logging design?

A: Perhaps the biggest difference is that ext3 can write and commit older transactions to disk while it concurrently accepts updates into the current transaction. So new system calls can execute and return without waiting for older transactions to finish committing. In contrast, xv6 has only one transaction, and new system calls cannot execute (they must block) while that transaction is being written to disk and committed.

By default, ext3 commits only every five seconds, so each transaction typically contains the updates of many system calls. This lets most system calls return immediately after updating blocks in the cache. It also allows "write absorption" when successive system calls modify the same file-system blocks: such a block has to be written to the log only once per transaction, not once per system call. In contrast, most system calls in xv6 trigger a commit and have to wait for that commit to finish.

Q: What is metadata?

A: I-nodes, directory content blocks, indirect blocks, and free-block bitmaps: everything other than file content blocks.

Q: How does commit frequency affect filesystem performance?

A: Less frequent commits are likely to lead to higher performance. Each commit has some overhead: new system calls have to be briefly blocked, and the logging system has to write some extra blocks (the descriptor and commit blocks). The more system calls' updates you can fit into each transaction (that is, the lower the commit frequency), the lower the impact of those overheads.
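The amortization argument above, together with write absorption, can be put into numbers with a toy model. This is a minimal sketch under stated assumptions, not ext3's or xv6's actual code: it assumes each commit writes exactly one descriptor block, one commit block, and each distinct dirty metadata block once, and it ignores blocking, disk seeks, and checkpointing. The names `Transaction` and `per_call_log_writes` and the block numbers are hypothetical. Committing after every call loosely stands in for xv6; one large commit stands in for an ext3 compound transaction.

```python
# Toy model of log traffic per commit (an assumption, not ext3's or xv6's code):
# a commit writes one descriptor block, each distinct dirty block once,
# and one commit block.
DESCRIPTOR_BLOCKS = 1
COMMIT_BLOCKS = 1


class Transaction:
    """Hypothetical compound transaction collecting updates from many system calls."""

    def __init__(self):
        self.dirty = set()  # distinct block numbers modified in this transaction
        self.ncalls = 0     # number of system calls that joined this transaction

    def add_syscall(self, blocks):
        """Record the blocks one system call modified; repeated blocks are absorbed."""
        self.dirty.update(blocks)
        self.ncalls += 1

    def log_writes(self):
        """Number of blocks written to the log when this transaction commits."""
        return DESCRIPTOR_BLOCKS + len(self.dirty) + COMMIT_BLOCKS


def per_call_log_writes(ncalls, blocks_per_call, calls_per_commit):
    """Average log blocks written per system call when committing every
    calls_per_commit calls, if every call modifies the same blocks_per_call."""
    total = 0
    for start in range(0, ncalls, calls_per_commit):
        txn = Transaction()
        for _ in range(min(calls_per_commit, ncalls - start)):
            txn.add_syscall(blocks_per_call)
        total += txn.log_writes()
    return total / ncalls


if __name__ == "__main__":
    same_blocks = [3, 7, 12]  # hypothetical metadata blocks touched by every call
    print(per_call_log_writes(100, same_blocks, 1))    # 5.0  (commit after every call)
    print(per_call_log_writes(100, same_blocks, 100))  # 0.05 (one compound transaction)
```

With 100 calls that each touch the same three blocks, committing after every call writes 5 log blocks per call, while a single compound transaction writes 5 blocks in total (0.05 per call): the fixed descriptor and commit blocks are amortized, and the repeated updates to the same blocks are absorbed.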
Big transactions also offer more scope for write absorption (multiple system calls in the same transaction updating the same block, so that the block has to be logged only once despite many updates).

Q: What does the paper mean by dependent data?

A: The paper's design does not include file content blocks in the log; only metadata blocks are logged (i-nodes, directory content, indirect blocks, and free-block bitmaps). To prevent a crash from leaving a newly written file that refers to content blocks still holding some previously deleted file's content, the design writes content blocks to disk before it commits the corresponding metadata updates to the log. This ordering applies only to write() system calls. The file content blocks that are written before the commit are the "dependent" blocks.

Q: Why was it taking so long for the previous file system to recover?

A: The previous file system was ext2. ext2 has a checker program, called fsck, that inspects all the metadata on the disk (i-nodes, directory content, free bitmaps) to ensure it is consistent. If it is not, fsck attempts to guess a reasonable fix. The biggest problem with ext2's fsck is that it is slow for big file systems: it can take dozens of minutes or even hours to run on the biggest ones, and the file system cannot be used while it runs. People want crash recovery in seconds, not hours. Another problem is that fsck sometimes can't guess a good way to fix a problem, and then has to ask a human for input.

Q: The paper talks about merging filesystem operations together (top of page 5). What does it mean by merging?

A: It means including the updated blocks from multiple system calls in one transaction. Despite the word "merge", the file system doesn't have to merge updates from different system calls. The file-system code uses locks to ensure that only one system call at a time modifies any given piece of file-system data.
And there is only one copy of any given block in memory (in the disk block cache) at a time, so no merging is needed.

Q: What is the paper referring to with "the decision about when to commit the current compound transaction and start a new one is a policy decision which should be under user control"?

A: The final ext3 design turned out less ambitious than this: by default it simply commits every five seconds.

Q: Because ext3 by default commits only every five seconds, isn't it a problem that a crash might lose up to five seconds of system calls, even though those system calls returned successfully?

A: It's certainly something to think about. Applications that need to ensure their updates are safe on disk can call fsync(fd), which is specified not to return until all previous writes to file descriptor fd would survive a crash. fsync() also triggers a commit of the currently open transaction. Databases, for example, use fsync().

Q: What does the paper mean by a log-structured filesystem?

A: A log-structured file system has *only* a log. Instead of a conventional file system plus a log that describes recent updates, it keeps everything in the log, and reads must find what they need by looking in the log.

https://dl.acm.org/doi/10.1145/121132.121137
https://en.wikipedia.org/wiki/Log-structured_file_system
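To make "reads must find what they need by looking in the log" concrete, here is a minimal toy sketch, assuming block-level records of the form (i-node number, block index, data) kept in a Python list that stands in for the on-disk log. `ToyLFS` and its methods are hypothetical names. Writes only ever append, so an overwrite leaves the stale version in the log, and a read scans backward for the newest matching record. Real log-structured file systems keep an index (such as an inode map) pointing at the latest copies so reads need not scan, and they run a cleaner to reclaim space held by stale records; this sketch omits both.

```python
# Toy log-structured store (hypothetical): the log is the only storage.
class ToyLFS:
    """Every write appends a record; blocks have no fixed home location."""

    def __init__(self):
        self.log = []  # records of (inode_number, block_index, data), oldest first

    def write(self, inum, bno, data):
        """Append a new version of one file block to the end of the log."""
        self.log.append((inum, bno, data))

    def read(self, inum, bno):
        """Return the newest version of a block by scanning the log backward."""
        for rec_inum, rec_bno, data in reversed(self.log):
            if rec_inum == inum and rec_bno == bno:
                return data
        return None  # the block was never written


if __name__ == "__main__":
    fs = ToyLFS()
    fs.write(1, 0, b"hello")
    fs.write(1, 0, b"hello, world")  # an overwrite appends; the old record stays
    print(fs.read(1, 0))  # b'hello, world'
    print(len(fs.log))    # 2
```

The backward scan is what makes the newest record win: an older copy of the same block is never consulted once a later one exists.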