Let's say you have a program, like a text editor or a word processor, that writes to user-created files. What steps should be taken to minimize the risk of data loss or corruption in the face of crashes, out-of-space errors, sudden power loss, race conditions, etc.?
Use SQLite
Well, OK, it is weird to use a DB for a text editor, but a word processor has so much state that it might make some sense. It certainly makes sense as a storage format for many kinds of applications. There is a page on the SQLite wiki site about using it for undo/redo logs.
For a text editor you can use the techniques that databases use: a write-ahead log or a rollback log, plus careful commit synchronization with the disk. Or you can store two versions of every file.
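As a rough illustration of the idea, here is a minimal sketch using Python's standard `sqlite3` module. The table name `document`, the single-row layout, and the helper names are assumptions made up for this example, not anything SQLite prescribes. The point is only that each save happens inside one transaction, so a crash leaves either the old text or the new text.

```python
import sqlite3

def open_document(path):
    """Open (or create) a document database holding a single row of text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), body TEXT NOT NULL)"
    )
    conn.commit()
    return conn

def save_document(conn, text):
    """Replace the stored text inside one transaction."""
    # 'with conn' commits on success and rolls back if an exception is raised
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO document (id, body) VALUES (1, ?)",
            (text,),
        )

def load_document(conn):
    """Return the stored text, or an empty string for a new document."""
    row = conn.execute("SELECT body FROM document WHERE id = 1").fetchone()
    return row[0] if row else ""
```

A real word processor would likely split the state across several tables, but the transactional save would look much the same.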
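To make the log idea concrete, here is a minimal sketch of an edit journal in Python. The record format (`pos`, `del`, `ins`) and the function names are hypothetical choices for this example. Each edit is appended and synced to disk before it is considered done, and after a crash the journal is replayed on top of the last saved text.

```python
import json
import os

def log_edit(journal_path, position, deleted, inserted):
    """Append one edit record and force it to disk before returning.

    'deleted' is the number of characters removed at 'position';
    'inserted' is the text put in their place.
    """
    record = json.dumps({"pos": position, "del": deleted, "ins": inserted})
    with open(journal_path, "a", encoding="utf-8") as journal:
        journal.write(record + "\n")
        journal.flush()
        os.fsync(journal.fileno())

def replay(journal_path, base_text):
    """Rebuild the text by applying logged edits to the last saved text."""
    text = base_text
    if not os.path.exists(journal_path):
        return text
    with open(journal_path, encoding="utf-8") as journal:
        for line in journal:
            try:
                edit = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final record from a crash mid-write; stop here
            pos = edit["pos"]
            text = text[:pos] + edit["ins"] + text[pos + edit["del"]:]
    return text
```

After a successful full save, the journal would be truncated or deleted so that it only ever describes changes made since the last good copy.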
A good rule of thumb for safeguarding important data is:
NEVER MODIFY THE ONLY COPY
In the case of word processors and text editors, I believe it's standard to create a "shadow copy" (this might not be the technical term): a copy of the original file to which all changes are made. Periodically (or when the user requests it) you can force a save that writes the modifications over the original file. The advantage is that if there is a failure at any moment, there is always at least one valid copy of the data.
The real goal is to achieve atomicity: an operation can only succeed or fail, never leave an incomplete state. There are many other ways to attain atomicity aside from "shadow copies", but this is how I believe text editors do it.
I wrote an earlier answer to a similar problem that applies here as well. The steps are:
- Write a temporary file with the new data
- Move the temporary file to a backup file in the original file's directory.
- Perform an atomic swap of the backup and original file (File.Replace in Windows or swapping inodes in Unix).
- Delete the backup (now original) file.
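A simplified variant of these steps can be sketched in Python. Note that this is not a literal transcription: it assumes `os.replace` (a rename over the original) is an acceptable substitute for a true swap, so there is no separate backup file to delete afterwards. The function name `atomic_save` and the temp-file prefix are made up for the example.

```python
import os
import tempfile

def atomic_save(path, data):
    """Write bytes to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to disk before renaming
        # Atomic on POSIX; on Windows it also overwrites an existing file
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file; the original has not been modified
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # On POSIX, also sync the directory so the rename itself survives power loss
    if os.name == "posix":
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

One caveat with this sketch: the temp file gets fresh permissions and ownership, so a real editor would probably want to copy the original file's mode onto it before the rename.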
This is perhaps outdated with today's multi-gigabyte machines, but when developing on the Mac I remember we used to allocate, in advance, a memory block large enough to perform a save operation.
If we ran out of memory, we could then warn the user that they were out of memory, and free that block so the actual save operation could take place.
Another feature that is important for protecting user data is undo, ideally unlimited undo/redo.
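The same reserve-block trick can be sketched in Python, although it is less natural in a garbage-collected language than it was on the classic Mac. Everything here (`MemoryReserve`, `save_with_reserve`, the `save` and `warn` callbacks) is a hypothetical name for illustration only.

```python
class MemoryReserve:
    """Holds a spare block that can be released when memory runs out."""

    def __init__(self, size_bytes):
        self._size = size_bytes
        self._block = bytearray(size_bytes)  # allocated up front, at startup

    def release(self):
        """Free the block; returns True if there was something to free."""
        had_block = self._block is not None
        self._block = None
        return had_block

    def refill(self):
        """Re-allocate the reserve once memory pressure has eased."""
        if self._block is None:
            self._block = bytearray(self._size)

def save_with_reserve(reserve, save, warn):
    """Try save(); on MemoryError free the reserve, warn the user, retry once."""
    try:
        save()
    except MemoryError:
        if not reserve.release():
            raise  # the reserve was already spent; nothing more to give back
        warn("Memory is low; using the emergency reserve to save your work.")
        save()
```

Here `save` would be whatever routine writes the document, and `warn` whatever shows a message to the user.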
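For reference, unlimited undo/redo is commonly built from two stacks. Below is a minimal Python sketch that stores whole-text snapshots; the class name `UndoHistory` is made up for the example, and a real editor would more likely store individual edit operations to save memory.

```python
class UndoHistory:
    """Tracks text snapshots on two stacks: one for undo, one for redo."""

    def __init__(self, text=""):
        self.text = text
        self._undo = []
        self._redo = []

    def edit(self, new_text):
        """Record the current text, then switch to the new text."""
        self._undo.append(self.text)
        self._redo.clear()  # a fresh edit invalidates the redo history
        self.text = new_text

    def undo(self):
        """Step back one edit if possible; returns the current text."""
        if self._undo:
            self._redo.append(self.text)
            self.text = self._undo.pop()
        return self.text

    def redo(self):
        """Reapply the most recently undone edit if possible."""
        if self._redo:
            self._undo.append(self.text)
            self.text = self._redo.pop()
        return self.text
```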