I have been following the discussion of the "bug" in EXT4 that causes files to be zeroed after a crash if one uses the "create temp file, write temp file, rename temp to target file" process. POSIX says that unless fsync() is called, you cannot be sure the data has been flushed to the hard disk.
Obviously doing:
0) get the file contents (read it or make it somehow)
1) open original file and truncate it
2) write new contents
3) close file
is not good even with fsync(), as the computer can crash during 2) or during fsync(), and you end up with a partially written file.
Usually it has been thought that this is pretty safe:
0) get the file contents (read it or make it somehow)
1) open temp file
2) write contents to temp file
3) close temp file
4) rename temp file to original file
Unfortunately it isn't. To make it safe on EXT4 you would need to do:
0) get the file contents (read it or make it somehow)
1) open temp file
2) write contents to temp file
3) fsync()
4) close temp file
5) rename temp file to original file
This would be safe: after a crash you should have either the new file contents or the old ones, never zeroed or partial contents. But if the application uses lots of files, calling fsync() after every write would be slow.
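For concreteness, the five-step sequence above could be sketched in Python roughly as follows. This is only an illustration of the steps as listed, not a tuned implementation: the function name atomic_write and the ".tmp-" prefix are made up for the example, and it assumes the temp file can be created in the same directory as the target so that the rename stays on one filesystem.

```python
import contextlib
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) using
    temp file + fsync + rename. Hypothetical helper for illustration."""
    directory = os.path.dirname(os.path.abspath(path))

    # 1) open temp file (same directory as the target, so rename
    #    does not cross filesystems)
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            # 2) write contents to temp file
            tmp.write(data)
            # push Python's userspace buffer down to the OS
            tmp.flush()
            # 3) fsync(): ask the OS to flush the file data to disk
            os.fsync(tmp.fileno())
        # 4) the temp file is closed when the with-block exits

        # 5) rename temp file over the original file
        os.replace(tmp_path, path)
    except BaseException:
        # clean up the temp file if anything went wrong before the rename
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

Calling atomic_write("settings.conf", b"new contents") once per file is exactly the pattern that becomes slow with thousands of files, since every call pays for its own fsync().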
So my question is: how can I modify multiple files efficiently on a system where fsync() is required to be sure that changes have been saved to disk? And I really mean modifying many files, as in thousands of files. Modifying two files and doing fsync() after each wouldn't be too bad, but fsync() does slow things down when modifying many files.
EDIT: changed the fsync() and close temp file steps to the correct order, added emphasis on writing many many many files.