I wrote a download library for my colleague. It writes downloaded data to files.

My colleagues found that the file stays small for a long time, even after 100 MB of data has been downloaded.

So they suggested that I call flush() after every write(), so that the buffered data would not take up memory.

But I don't think 100 MB of virtual memory is a lot, and I suspect Windows has its reasons for buffering so much data.

What do you think about it?

+2  A: 

If it were me, I'd want to ensure that all data was persisted to a non-volatile location as soon as possible. I'd definitely flush the streams to make sure I didn't lose anything in the event of a power failure. You didn't specify whether the data needs to be accessed later on, but I assume it does; otherwise, why would you store it? To answer the original question, though: it isn't "harmful" to the OS, but you do risk losing data.
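This is a minimal sketch of what persisting each chunk as soon as possible involves. For illustration only, it assumes the download library is written in Python, because the question doesn't name a language. Note that `flush()` on its own only moves data from the program's buffer into the operating system. `os.fsync()` is the call that asks the OS to commit its cached data to the storage device. The helper name `write_durably` and the file name are hypothetical.

```python
import os
import tempfile


def write_durably(f, chunk: bytes) -> None:
    """Hypothetical helper: write a chunk and push it as far toward disk as the OS allows."""
    f.write(chunk)
    # flush(): move bytes from Python's user-space buffer into the OS.
    f.flush()
    # os.fsync(): ask the OS to commit its cached pages for this file
    # to the storage device (this is the expensive part).
    os.fsync(f.fileno())


path = os.path.join(tempfile.mkdtemp(), "download.bin")
with open(path, "wb") as f:
    for chunk in (b"a" * 1024, b"b" * 1024):
        write_durably(f, chunk)

print(os.path.getsize(path))  # 2048
```

Calling `os.fsync()` after every chunk is costly. In practice, this is a trade-off between durability and download throughput.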

ZombieSheep
+4  A: 

Well, first you should investigate/debug what is going on. The problem might be elsewhere; for example, Windows Explorer might not refresh the displayed file size promptly.

That said, you are right: generally, if the OS's virtual memory system decides to buffer data in RAM, it has a good reason to do so, and you should not normally interfere. After all, if there is a lot of free memory, it makes sense to use it.
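One way to investigate is to check the size the OS reports with `os.path.getsize()` instead of relying on Explorer. This sketch again assumes, for illustration, a Python library, and the file name is hypothetical. Bytes still sitting in the program's own write buffer are invisible to the OS. Bytes already handed to the OS count toward the reported size, even while the OS is still caching them in RAM.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "probe.bin")  # hypothetical file

with open(path, "wb") as f:   # default buffering gives a BufferedWriter
    f.write(b"x" * 100)       # fits in Python's buffer; nothing reaches the OS yet
    size_before = os.path.getsize(path)
    f.flush()                 # hand the buffered bytes to the OS
    size_after = os.path.getsize(path)

print(size_before, size_after)  # 0 100
```

Suppose the reported size lags far behind the number of bytes the program believes it has written. In that case, the data is held in the application's buffer, not in the OS cache.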

sleske
What happens in the event of a power failure? 100 MB of downloaded data should be stored as quickly as possible... Why not store it?
Atmocreations
A: 

If there is a way to reduce the memory requirements with negligible performance impact, I'd prefer a less greedy version. I might need that memory for something more important, and a 100 MB footprint is pretty huge for a downloader.

spender
+2  A: 

Flushing at specific intervals, sizes, or line counts might be better than flushing after every write. It helps reduce the memory footprint and also makes sure the actual file is updated periodically. For example, you could flush every 100 lines.
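This is a minimal sketch of size-based periodic flushing, again assuming Python. The class name `PeriodicFlushWriter` and the 1 MiB default threshold are hypothetical choices, not values from the answer. The answer's line-count variant works the same way, with a line counter in place of the byte counter.

```python
import os
import tempfile


class PeriodicFlushWriter:
    """Hypothetical wrapper: flushes once `flush_threshold` bytes have
    accumulated since the last flush, instead of after every write."""

    def __init__(self, f, flush_threshold=1024 * 1024):
        self.f = f
        self.flush_threshold = flush_threshold
        self.pending = 0   # bytes written since the last flush
        self.flushes = 0   # number of flushes performed so far

    def write(self, chunk: bytes) -> None:
        self.f.write(chunk)
        self.pending += len(chunk)
        if self.pending >= self.flush_threshold:
            self.f.flush()
            self.pending = 0
            self.flushes += 1


path = os.path.join(tempfile.mkdtemp(), "download.bin")
with open(path, "wb") as f:
    w = PeriodicFlushWriter(f, flush_threshold=4096)
    for _ in range(10):
        w.write(b"\0" * 1000)  # ten 1000-byte chunks
# Closing the file (end of `with`) flushes whatever is still pending.

print(w.flushes, os.path.getsize(path))  # 2 10000
```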

Aviator
+4  A: 

Personally, I would trust the operating system to tune itself appropriately.

As for "flush immediately so as not to lose data if power dies": if the power dies halfway through a file, would you trust that the data you'd written was okay and resume the download from there? If so, maybe it's worth flushing early. But I'd weigh the complexity of resuming against the relative rarity of power failures, and just close the file once I'd read everything. If you see a half-written file, delete it and download it again from scratch.
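This is a sketch of the "delete it and download it again from scratch" approach in Python. It uses a common temp-file-and-rename pattern, which is an assumption here and not something the answer prescribes. Data goes to a hypothetical `<dest>.part` file and is renamed into place with `os.replace()` only after the file has been closed. Any leftover `.part` file therefore marks an interrupted download and is simply deleted. The sketch never calls `os.fsync()`, in keeping with leaving caching to the OS, so it makes no durability guarantee across a power cut.

```python
import os
import tempfile


def save_download(chunks, dest_path):
    """Hypothetical helper: write every chunk to "<dest_path>.part", then
    rename it to dest_path only once the whole download has succeeded."""
    part_path = dest_path + ".part"

    # A leftover .part file means an earlier attempt was interrupted
    # (e.g. by a power cut): delete it and start again from scratch.
    if os.path.exists(part_path):
        os.remove(part_path)

    try:
        with open(part_path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # The file is closed here; move the complete download into place.
        os.replace(part_path, dest_path)
    except BaseException:
        # Download failed part-way: don't leave a half-written file behind.
        if os.path.exists(part_path):
            os.remove(part_path)
        raise


def flaky_chunks():
    """Simulated source that fails after the first chunk."""
    yield b"abc"
    raise ConnectionError("simulated network drop")


d = tempfile.mkdtemp()
dest = os.path.join(d, "file.bin")

try:
    save_download(flaky_chunks(), dest)
except ConnectionError:
    pass
print(os.listdir(d))  # [] (no partial file left behind)

save_download([b"abc", b"def"], dest)
print(os.listdir(d))  # ['file.bin']
```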

Jon Skeet
So true, especially if the application is liable to have a shelf life of a few years and may run on different versions of an OS.
middaparka
And different hardware! Does anyone know the optimal flush period for SSDs? I don't. Leave it to the OS. OK, the OS may not get it right in v1, but it should get better with updates... If your hard-coded choice is suboptimal, then you will have to handle the updates yourself.
Adrian