views:

861

answers:

3

I need to create a new file handle so that any write operations to that handle get written to disk immediately.

Extra info: The handle will be the inherited STDOUT of a child process, so I need any output from that process to immediately be written to disk.

Studying the CreateFile documentation, the FILE_FLAG_WRITE_THROUGH flag looked like exactly what I need:

Write operations will not go through any intermediate cache, they will go directly to disk.

I wrote a very basic test program and, well, it's not working. I used the flag on CreateFile then used WriteFile(myHandle,...) in a long loop, writing about 100MB of data in about 15 seconds. (I added some Sleep()'s). Then

I then set up a professional monitoring environment consisting of continuously hitting 'F5' in explorer. The results: the file stays at 0kB then jumps to 100MB about the time the test program ends.

Next thing I tried was to manually flush the file after each write, with FlushFileBuffers(myHandle). This makes the observed file size grow nice and steady, as expected.

My question is, then, shouldn't the FILE_FLAG_WRITE_THROUGH have done this without manually flushing the file? Am I missing something? In the 'real world' program, I can't flush the file, 'cause I don't have any control over the child process that's using it.

There's also the FILE_FLAG_NO_BUFFERING flag, that I can't be used for the same reason - no control over the process that's using the handle, so I can't manually align the writes as required by this flag.

EDIT: I have made a separate project specifically for watching how the size of the file changes. It uses the .NET FileSystemWatcher class. I also write less data - around 100kB in total.

Here's the output. Check out the seconds in the timestamps.

The 'builtin no-buffers' version:

25.11.2008 7:03:22 PM: 10230 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10240 bytes added.
25.11.2008 7:03:31 PM: 10200 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10240 bytes added.
25.11.2008 7:03:42 PM: 10190 bytes added.

... and the 'forced (manual) flush' version (FlushFileBuffers() is called every ~2.5 seconds):

25.11.2008 7:06:10 PM: 10230 bytes added.
25.11.2008 7:06:12 PM: 10230 bytes added.
25.11.2008 7:06:15 PM: 10230 bytes added.
25.11.2008 7:06:17 PM: 10230 bytes added.
25.11.2008 7:06:19 PM: 10230 bytes added.
25.11.2008 7:06:21 PM: 10230 bytes added.
25.11.2008 7:06:23 PM: 10230 bytes added.
25.11.2008 7:06:25 PM: 10230 bytes added.
25.11.2008 7:06:27 PM: 10230 bytes added.
25.11.2008 7:06:29 PM: 10230 bytes added.
+6  A: 

I've been bitten by this, too, in the context of crash logging.

FILE_FLAG_WRITE_THROUGH only guarantees that the data you're sending gets sent to the filesystem before WriteFile returns; it doesn't guarantee that it's actually sent to the physical device. So, for example, if you execute a ReadFile after a WriteFile on a handle with this flag, you're guaranteed that the read will return the bytes you wrote, whether it got the data from the filesystem cache or from the underlying device.

If you want to guarantee that the data has been written to the device, then you need FILE_FLAG_NO_BUFFERING, with all the attendant extra work. Those writes have to be aligned, for example, because the buffer is going all the way down to the device driver before returning.

The Knowledge Base has a terse but informative article on the difference.

In your case, if the parent process is going to outlive the child, then you can:

  1. Use the CreatePipe API to create an inheritable, anonymous pipe.
  2. Use CreateFile to create a file with FILE_FLAG_NO_BUFFERING set.
  3. Provide the writable handle of the pipe to the child as its STDOUT.
  4. In the parent process, read from the readable handle of the pipe into aligned buffers, and write them to the file.
Tim Lesher
Nice idea on how to sidestep the FILE_FLAG_NO_BUFFERING limitations! I'll just have to see how buffering works with CreatePipe(). Thanks!
Cristi Diaconescu
+1  A: 

Perhaps you could be satisfied enough with FlushFileBuffers:

Flushes the buffers of a specified file and causes all buffered data to be written to a file.

Typically the WriteFile and WriteFileEx functions write data to an internal buffer that the operating system writes to a disk or communication pipe on a regular basis. The FlushFileBuffers function writes all the buffered information for a specified file to the device or pipe.

They do warn that calling flush, to flush the buffers a lot, is inefficient - and it's better to just disable caching (i.e. Tim's answer):

Due to disk caching interactions within the system, the FlushFileBuffers function can be inefficient when used after every write to a disk drive device when many writes are being performed separately. If an application is performing multiple writes to disk and also needs to ensure critical data is written to persistent media, the application should use unbuffered I/O instead of frequently calling FlushFileBuffers. To open a file for unbuffered I/O, call the CreateFile function with the FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH flags. This prevents the file contents from being cached and flushes the metadata to disk with each write. For more information, see CreateFile.

If it's not a high-performance situation, and you won't be flushing too frequently, then FlushFileBuffers might be sufficient (and easier).

Ian Boyd
+1 for the reference to flag 'FILE_FLAG_WRITE_THROUGH'. I already mention using FlushFileBuffers in the question - as a workaround for the straight-forward (not working) solution, and that's what I was trying to avoid.
Cristi Diaconescu
Sorry, I meant +1 for the reference to flag 'FILE_FLAG_NO_BUFFERING' (although that's been answered before)
Cristi Diaconescu
A: 

The size you're looking at in Explorer may not be entirely in-sync with what the file system knows about the file, so this isn't the best way to measure it. It just so happens that FlushFileBuffers will cause the file system to update the information that Explorer is looking at; closing it and reopening may end up doing the same thing as well.

Aside from the disk caching issues mentioned by others, write through is doing what you were hoping it is doing. It's just that doing a 'dir' in the directory may not show up-to-date information.

Answers suggesting that write-through only writes it "to the file system" are not quite right. It does write it into the file system cache, but it also sends the data down to the disk. Write-through might mean that a subsequent read is satisfied from the cache, but it doesn't mean that we skipped a step and aren't writing it to the disk. Read the article's summary very carefully. This is a confusing bit for just about everyone.

jrtipton