Hi all,

Windows Win32 C++ question about flushing file activity to disk.

I have an external application (run using CreateProcess) that creates a file: when it exits, it will have written a file with some content.

How can I make sure that the file the process created has really been flushed to disk before I proceed?

By this I mean not the C++ runtime buffers, but a real flush to disk (e.g. FlushFileBuffers).

Note that I don't have access to any file HANDLE; that is all hidden inside the external process.

I guess I could open up a handle of my own to the file and then use FlushFileBuffers, but it's not clear this would work (since my handle doesn't actually contain anything which needs flushing).

Finally, I want this to run as a non-admin user, so I cannot call FlushFileBuffers on a whole volume (that requires administrative privileges).

Any ideas?

John

UPDATE: Why do I think this is a problem?

I'm working on a data backup application. Essentially it has to create some files as described. It then has to update its internal DB (an embedded SQLite database).

I recently had a data corruption issue which occurred during a blue screen (the cause of which was unrelated to my app).

What I'm concerned about is application integrity during a system crash. And yes, I do care about this because this app is a data backup app.

The use case I'm concerned about is this:

  1. A small data file is created by the external process. This write is waiting in the OS cache to be written to disk.
  2. I update the DB and commit. This is also a disk write, and it is also waiting in the OS cache.
  3. A system failure occurs.

As I see it, we're now in a potential race condition. If "1" gets flushed and "2" doesn't, we're fine (the DB transaction was never committed). If neither or both get flushed, we're also OK.

As I understand it, the order in which these writes reach the disk is not deterministic: I'm not aware of any OS guarantee that "1" will be written before "2". (Am I wrong?)

So, if "2" gets flushed, but "1" doesn't then we have a problem.

What I observed was that the DB had been correctly updated, but the file contained garbage: the last two-thirds of the data was binary zeroes. I don't know what a partially flushed file looks like after a blue screen, but I wouldn't be surprised if it looked like that.

Can I be sure this is the cause? No, I'm just speculating. The file could equally have been corrupted by a disk failure or by the blue screen itself.

As for performance, I believe I can deal with that.

For example, SQLite's default behaviour is to do a full flush (using FlushFileBuffers on Windows) every time you commit a transaction. The SQLite documentation is quite clear that without this, a system crash can leave you with a corrupted DB.

Also, I believe I can mitigate the performance hit by only flushing at "checkpoints". For example: write 50 files, flush them all, and only then write to the DB.

How likely is all this to be a problem? Beats me. But then my app might well be archiving at or around the time of a system failure, so it might be more likely than you think.

Hope that explains why I want to do this.

Thanks!

John

+3  A: 

Why would you want this? The OS will make sure that the data is flushed to the disk in due time. If you access it, it will either return the data from the cache or from disk, so this is transparent for you.

If you need some safety in case of disaster, then you must call FlushFileBuffers, for example by creating a process with admin rights after running the external process. But that can severely impact the performance of the whole machine.

Your only other option is to modify the source of the other process.

[EDIT] The simplest solution is probably to copy the file in your own process and then flush the copy (since you have its handle). Save the copy under a name that marks it as "not committed in the database".

Then update the database, writing an entry such as "updated from file ..." into it. If that entry already exists on the next run, skip this step and leave the database alone.

Flush the database to disk.

Rename the file to "file has been processed into database". Rename is an atomic operation (so it either happens or not).

If you can't think of a good filename for the different states, then use subfolders and move the file between them.

Aaron Digulla
Agreed: the data doesn't need to go to disk; it will be read from the cache. Once the external process has finished, you can assume that the data is available.
Dipstick
Thanks - please see update which explains why I want to do this.
John
See my updated answer.
Aaron Digulla
Re the EDIT: interesting idea, doing the copy myself. I guess that will work. I'm not quite sure, though, why you need the subsequent steps after I copy the file myself. If I've copied it and flushed the copy, doesn't that solve the problem right there? Anyway, good suggestion.
John
Since the process can crash at any time, you must be able to tell on the next run what you've already done and what still needs doing. Make sure it doesn't matter when the crash happens, including, for example, a BSOD while the DB files are being written to disk (which might leave you with a broken DB).
Aaron Digulla
OK, I see where you're coming from. Perhaps I should have said that if the action isn't completed, the app will automagically retry it later. Or to put it another way, if the action doesn't happen, I don't care, as long as it doesn't leave anything in an inconsistent state. Good comments, much obliged.
John
+1  A: 

Well, there are no attractive options here. There is no documented way to retrieve the file handle you need from the process. There are undocumented ones (via DuplicateHandle), but go there only after careful consideration.

Yes, calling FlushFileBuffers on a volume handle is the documented way. You can avoid the privilege problem by letting a service make the call. Talk to it from your app with one of the standard process interop mechanisms. A named pipe whose name is prefixed with Global\ is probably the easiest way to get that going.
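A minimal sketch of the call the service itself would make, assuming it runs with administrative rights. It opens the volume as `\\.\C:` (or whichever drive letter is requested) and calls FlushFileBuffers on that handle, which is the documented way to flush a whole volume. `flush_volume` is a hypothetical name, the pipe plumbing between app and service is left out, and on non-Windows builds the function simply returns false.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper for the privileged service: flush every cached write on
// one volume. Flushing a volume this way requires administrative rights.
bool flush_volume(wchar_t drive_letter) {
    // Accept only A-Z / a-z so a bad request from the client can't open an odd path.
    const bool valid = (drive_letter >= L'A' && drive_letter <= L'Z') ||
                       (drive_letter >= L'a' && drive_letter <= L'z');
    if (!valid) return false;
#ifdef _WIN32
    std::wstring volume = L"\\\\.\\";  // builds a path like \\.\C:
    volume += drive_letter;
    volume += L':';
    HANDLE h = CreateFileW(volume.c_str(), GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, nullptr,
                           OPEN_EXISTING, 0, nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;  // e.g. not elevated
    const BOOL ok = FlushFileBuffers(h);
    CloseHandle(h);
    return ok != FALSE;
#else
    return false;  // Volume handles are a Windows concept; nothing to do here.
#endif
}
```

Keeping the service's interface this narrow (a drive letter in, success or failure out) limits what an unprivileged client can make it do.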

Hans Passant
A: 

After your update I think http://sqlite.org/atomiccommit.html gives you the answers you need.

SQLite's technique for making sure everything is flushed to disc works, so it will work for you as well; take a look at the source.

Tobias Langner
Thanks, but I'm not sure how this helps. I understand how SQLite manages atomic commits. The issue isn't whether or how the SQLite commit occurs; the issue is making sure that the previous action (completely unrelated to SQLite) has been truly "committed" (i.e. flushed to disk) before I even think about the DB.
John