
I have a custom file type that is organized into sections, with a header at the start that gives the offset and length of each section within the file.

Currently, whenever I want to interact with the file, I must either load and parse the entire thing up front, or else pick only the sections that I need and load just them.

What I would like to do is to achieve a hybrid approach where each of the sections is loaded on-demand.

However, this seems to have a lot of potential downsides: filesystem handles would stay open longer than I would like, and the code would become more complex.

Are there any standard patterns for this sort of thing? It seems that my options are to:

  1. Just load the entire file and stop grousing about the cycles/memory wasted
  2. Load the entire file into memory as raw bytes and then satisfy any requests for unloaded sections from the memory buffer rather than disk. This saves me the cost of parsing the unneeded sections and requires less memory (since the disk representation is much more compact than the object model around it), but still means that I waste memory for sections that I never end up loading.
  3. Load whatever sections I need right away and close the file but hold onto the source location of the file. Then if another section is requested, re-open the file and load the data. In this case I could get strange results if the underlying file is changed.
  4. Same as the above but leave a file handle open (perhaps allowing read sharing).
  5. Load the file using Memory-Mapped IO and leave a view on the file open.
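To make option 3 concrete, here is a minimal Python sketch. The header layout (a uint32 section count followed by uint64 offset/length pairs) is hypothetical, as are the names `read_table` and `LazySectionFile`. It caches each section after the first load and compares the file's size and modification time on every reopen. That is one cheap way to notice the "underlying file changed" problem, though not a guarantee.

```python
import os
import struct

# Hypothetical header layout, assumed for illustration: a little-endian
# uint32 section count, followed by one (uint64 offset, uint64 length)
# entry per section.
_COUNT = struct.Struct("<I")
_ENTRY = struct.Struct("<QQ")


def read_table(f):
    """Return [(offset, length), ...] from the header of an open binary file."""
    f.seek(0)
    (count,) = _COUNT.unpack(f.read(_COUNT.size))
    return [_ENTRY.unpack(f.read(_ENTRY.size)) for _ in range(count)]


class LazySectionFile:
    """Option 3: keep the path and section table, reopen the file per section."""

    def __init__(self, path):
        self.path = path
        with open(path, "rb") as f:
            self.table = read_table(f)
            self._stamp = self._fingerprint(f)
        self._cache = {}  # index -> loaded section

    @staticmethod
    def _fingerprint(f):
        # Size + modification time: a cheap heuristic that catches most
        # rewrites, but not an edit that leaves both values unchanged.
        st = os.fstat(f.fileno())
        return (st.st_size, st.st_mtime_ns)

    def section(self, index):
        if index not in self._cache:
            offset, length = self.table[index]
            with open(self.path, "rb") as f:  # handle is held only briefly
                if self._fingerprint(f) != self._stamp:
                    raise RuntimeError(f"{self.path} changed since it was opened")
                f.seek(offset)
                data = f.read(length)
            # A real implementation would parse `data` into objects here;
            # this sketch just caches the raw bytes.
            self._cache[index] = data
        return self._cache[index]
```

Keeping a handle open instead (option 4) would mean opening once in `__init__` and skipping the reopen, at the cost of holding the handle for the object's lifetime.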

Any thoughts?

A:

If possible, mmap-ing the whole file is usually the easiest approach when your access pattern is random. That way you delegate the loading/unloading problem to the OS, and you effectively get options 1 and 2 for free.
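A minimal Python sketch of mmap-ing the whole file, assuming the same kind of hypothetical header (uint32 count, then uint64 offset/length pairs) and a placeholder `_parse` step standing in for the real object model. Nothing beyond the header is read or parsed until a section is requested, and the OS decides which pages actually sit in memory.

```python
import mmap
import struct

# Hypothetical header layout: little-endian uint32 count, then one
# (uint64 offset, uint64 length) entry per section.
_COUNT = struct.Struct("<I")
_ENTRY = struct.Struct("<QQ")


class MappedSectionFile:
    """Map the whole file once; parse each section on first access."""

    def __init__(self, path):
        with open(path, "rb") as f:
            # Length 0 maps the whole file (which must not be empty). The
            # mapping stays usable after the file object is closed.
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (count,) = _COUNT.unpack_from(self._mm, 0)
        self.table = [
            _ENTRY.unpack_from(self._mm, _COUNT.size + i * _ENTRY.size)
            for i in range(count)
        ]
        self._parsed = {}  # index -> parsed section

    def section(self, index):
        if index not in self._parsed:
            offset, length = self.table[index]
            # Slicing copies only this section; the OS pages it in on demand.
            raw = self._mm[offset:offset + length]
            self._parsed[index] = self._parse(raw)
        return self._parsed[index]

    @staticmethod
    def _parse(raw):
        # Placeholder for the real object-model parser.
        return raw.decode("utf-8")

    def close(self):
        self._mm.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```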

If you have very specific access patterns, you can even use something like posix_fadvise() (or madvise() for a mapped region) to tell the OS your access intent (I don't know the exact Win32 equivalent).
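In Python, `mmap.madvise()` exposes the mapped-region hint (Python 3.8+, only on systems that have `madvise()`), and `os.posix_fadvise()` covers plain file descriptors. The `advise` helper below is a hypothetical best-effort wrapper for the mmap case; it does nothing where the method or the requested constant is unavailable, for example on Windows.

```python
import mmap


def advise(mm, hint_name, offset=0, length=None):
    """Best-effort madvise() on part of a mapping; returns True if issued.

    hint_name is an mmap constant name such as "MADV_RANDOM" or
    "MADV_WILLNEED". mmap.madvise() exists only where the OS provides
    madvise(), and the set of MADV_* constants varies by platform, so
    missing pieces are skipped rather than raised.
    """
    hint = getattr(mmap, hint_name, None)
    if hint is None or not hasattr(mm, "madvise"):
        return False
    if length is None:
        length = len(mm) - offset
    if length <= 0:
        return False
    # madvise() wants a page-aligned start; the mapping itself begins on a
    # page boundary, so round the offset down and widen the length to match.
    start = offset - offset % mmap.PAGESIZE
    mm.madvise(hint, start, length + (offset - start))
    return True
```

For example, `advise(mm, "MADV_RANDOM")` right after mapping, then `advise(mm, "MADV_WILLNEED", offset, length)` just before parsing a section whose offset and length come from the header.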

If your file is larger than 2 GB (more than a 32-bit process can comfortably map in one piece), you can either go 64-bit or mmap() the file on demand, one section at a time.
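Mapping on demand might look like the following Python sketch; the function name `read_section_mapped` is hypothetical. The offset and length are assumed to come from the file's section table. Because `mmap` requires the mapping offset to be a multiple of `mmap.ALLOCATIONGRANULARITY`, the view starts slightly early, and it is released as soon as the section's bytes are copied out.

```python
import mmap


def read_section_mapped(path, offset, length):
    """Map only the bytes of one section, copy them out, and unmap again.

    Only `length` bytes plus a little alignment slack are mapped at a time,
    which is the point when the whole file won't fit in the address space.
    """
    if length == 0:
        # Nothing to map; passing length 0 to mmap would mean "to end of file".
        return b""
    # Mapping offsets must be multiples of ALLOCATIONGRANULARITY (the page
    # size on Unix, typically 64 KiB on Windows), so start a little early.
    start = offset - offset % mmap.ALLOCATIONGRANULARITY
    skip = offset - start
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), skip + length, access=mmap.ACCESS_READ, offset=start
    ) as view:
        return view[skip:skip + length]
```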

Steve Schnepp
A:

If the file is relatively small, mmap-ing the entire file is good enough. If the file is large, you could leave an mmap view open and just move it around the file, resizing it to view each section when needed.

Remy Lebeau - TeamB
*Move it around*? I didn't know that trick. Do you have more info to satisfy my curiosity? :-)
Steve Schnepp
Well, technically you can't move an existing view around. But you can unmap the old view and map a new view onto a different section of the same file mapping. You can even have multiple views of the same mapping active at once. I use this kind of technique to scroll through multi-megabyte (sometimes multi-gigabyte) log files, and it works very well and very quickly.
Remy Lebeau - TeamB
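A rough Python approximation of the unmap/re-map technique described above. Python's `mmap` module does not expose Win32 file-mapping objects and their views separately (each `mmap` object is its own mapping), so this only mimics the idea: keep one read-only window open, and when a read falls outside it, close it and map a new window elsewhere in the same file. The class name `SlidingView` and the `window_size` default are hypothetical tuning choices.

```python
import mmap
import os


class SlidingView:
    """Keep one read-only window of a file mapped and re-map it on demand."""

    def __init__(self, path, window_size=16 * 1024 * 1024):
        self._f = open(path, "rb")
        self._size = os.fstat(self._f.fileno()).st_size
        g = mmap.ALLOCATIONGRANULARITY
        self._window = max(g, -(-window_size // g) * g)  # round up to granularity
        self._view = None
        self._start = 0   # file offset where the current view begins
        self.remaps = 0   # how many times the view was "moved" (for illustration)

    def _remap(self, offset, length):
        if self._view is not None:
            self._view.close()  # unmap the old view...
        g = mmap.ALLOCATIONGRANULARITY
        self._start = offset - offset % g
        span = max(self._window, offset + length - self._start)
        span = min(span, self._size - self._start)  # never map past EOF
        # ...and map a new one at a different offset of the same file.
        self._view = mmap.mmap(self._f.fileno(), span,
                               access=mmap.ACCESS_READ, offset=self._start)
        self.remaps += 1

    def read(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self._size:
            raise ValueError("requested range lies outside the file")
        if length == 0:
            return b""
        view = self._view
        if (view is None or offset < self._start
                or offset + length > self._start + len(view)):
            self._remap(offset, length)
        rel = offset - self._start
        return self._view[rel:rel + length]

    def close(self):
        if self._view is not None:
            self._view.close()
            self._view = None
        self._f.close()
```

Reads that stay inside the current window cost no system calls; only a jump elsewhere triggers a re-map. Since each `SlidingView` owns its own mapping, several can be open over the same file at once.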