views: 68

answers: 2

I work mainly on Windows and Windows CE-based systems, where CreateFile, ReadFile and WriteFile are the workhorses, no matter if I'm in native Win32 land or in managed .NET land.

So far I have never had any obvious problem writing or reading big files in one chunk, as opposed to looping over several smaller chunks. I usually delegate the IO work to a background thread that notifies me when it's done.

But looking at file IO tutorials or "textbook examples", I often find the "loop with small chunks" approach used without any explanation of why it's preferred over the more obvious (I dare say!) "do it all at once".
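To make the comparison concrete, here is a minimal sketch of the two styles I have in mind (desktop .NET assumed; `processChunk` is just a placeholder for whatever per-buffer work needs doing):

```csharp
using System;
using System.IO;

static class ReadStrategies
{
    // "Do it all at once": let the framework hand back the whole file
    // as a single byte array.
    public static byte[] ReadWholeFile(string path)
    {
        return File.ReadAllBytes(path);
    }

    // "Loop with small chunks": a fixed-size buffer reused until the
    // stream reports end-of-file by returning 0 from Read.
    public static void ReadInChunks(string path, int chunkSize,
                                    Action<byte[], int> processChunk)
    {
        byte[] buffer = new byte[chunkSize];
        using (FileStream stream = File.OpenRead(path))
        {
            int bytesRead;
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                processChunk(buffer, bytesRead);
            }
        }
    }
}
```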

Are there any drawbacks to the way I do it that I haven't understood?

Clarification:

By "big file" I mean the files I read or write as a single chunk, as opposed to the multiple-chunk examples I mentioned, which often use chunk sizes on the order of 1024 bytes on Windows CE and about ten times that on the desktop. My big files are usually binary files like camera photos from mobile phones, and as such on the order of 2-10 MB. Nowhere near 1 GB, in other words.

+4  A: 

In general, you shouldn't assume that a single Read call will give you all the data in one go. While that may be true for local files, it may well not be for network files... and it definitely won't work for general network streams unless a higher level has already buffered them.

Then there's the matter of memory: suppose someone asks you to process a 3GB file. If you stream it, processing a chunk at a time, you've got no problems. If you try to read the whole thing into memory, you're unlikely to succeed...

In general: if you can stream it, do. Why would you want to use a less reliable and less efficient approach? For any sort of robustness you'd still have to check the return value of Read and compare it with how much you expected to read... so adding a loop doesn't add much complexity. And if you find yourself doing this a lot, you may well spot patterns that you can encapsulate into helper methods, quite possibly taking delegates to represent the custom per-chunk processing.
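A minimal sketch of what such a helper might look like (the class and method names here are illustrative, not from any standard library):

```csharp
using System;
using System.IO;

static class StreamHelpers
{
    // Illustrative helper: reads the stream in fixed-size chunks and hands
    // each chunk (plus its actual length) to the supplied delegate until
    // Read reports end-of-stream by returning 0.
    public static void ReadAllChunks(Stream input, int bufferSize,
                                     Action<byte[], int> processChunk)
    {
        byte[] buffer = new byte[bufferSize];
        int bytesRead;
        // Read is never guaranteed to fill the buffer; looping until it
        // returns 0 is what makes this correct for network streams as
        // well as local files.
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            processChunk(buffer, bytesRead);
        }
    }
}
```

Copying to another stream then becomes a one-liner: `StreamHelpers.ReadAllChunks(input, 8192, (buf, count) => output.Write(buf, 0, count));`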

Jon Skeet
@Jon: Great answer. Your last paragraph is spot on with what I already do, except that I've never actually seen the need to dive into the loop branch of my helper methods. Also, please see the clarification - my **big files** are not big in the common sense... ;-)
Johann Gerell
@Johann: If you're on a mobile device, a 10MB file is similar to a 10GB file on the desktop :) Gobbling up 10MB of memory unnecessarily on a mobile device could cause problems - although less so now than a few years ago, certainly.
Jon Skeet
+3  A: 

It depends on your definition of "big". Good luck reading a 10 GB file into memory if you have only 2 GB of RAM (not counting virtual memory).

So, speaking very generally, you'll always need to do chunking. This is probably why textbooks are so fond of it. It's just the size of the chunks that is a point of discussion.

An additional advantage of chunking, when you're processing a stream, is that memory usage is kept low, and independent of the size of the input.

However, if (and only if) you know that there's some upper bound on your file size, and a lower bound on your RAM, you can do it all at once.
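For instance, a rough sketch of the "all at once" path with an explicit guard (the 10 MB limit is just an assumed bound, matching the kind of files described in the question):

```csharp
using System;
using System.IO;

static class WholeFileReader
{
    // Assumed upper bound -- pick whatever limit matches your known data.
    private const long MaxAllAtOnceBytes = 10 * 1024 * 1024;

    public static byte[] ReadAllOrThrow(string path)
    {
        long length = new FileInfo(path).Length;
        if (length > MaxAllAtOnceBytes)
        {
            throw new InvalidOperationException(
                "File is larger than the agreed bound; process it in chunks instead.");
        }

        byte[] buffer = new byte[length];
        using (FileStream stream = File.OpenRead(path))
        {
            int offset = 0;
            // Even for a one-shot read, loop on Read's return value so a
            // short read cannot silently truncate the data.
            while (offset < buffer.Length)
            {
                int read = stream.Read(buffer, offset, buffer.Length - offset);
                if (read == 0)
                {
                    throw new EndOfStreamException("File ended before the expected length.");
                }
                offset += read;
            }
        }
        return buffer;
    }
}
```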

Thomas
@Thomas: Thanks! I usually **do** know an upper bound on the data size, since (please see the clarification) my **big files** are not big in the common sense... ;-)
Johann Gerell
Ohhh, in that case, by all means, read it all in one go! :D
Thomas