I am opening files using a memory map. The files are apparently too big (6 GB on a 32-bit PC) to be mapped in one go, so I am thinking of mapping part of the file each time and adjusting the offsets in the next mapping.

Is there an optimal number of bytes for each mapping, or is there a way to determine such a figure?

Thanks.

+1  A: 

There is no optimal size. With a 32-bit process, there is only 4 GB of address space in total, and usually only 2 GB of that is available to user-mode processes. That 2 GB is then fragmented by code and data from the EXE and DLLs, heap allocations, thread stacks, and so on. Given this, you will probably not find more than 1 GB of contiguous space to map a file into.

The optimal number depends on your app, but I would be wary of mapping more than 512 MB into a 32-bit process. Even limiting yourself to 512 MB, you might run into issues depending on your application. Alternatively, if you can go 64-bit, there should be no problem mapping multiple gigabytes of a file into memory; your address space is so large that this shouldn't cause any issues.
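Here is a minimal sketch of that sliding-window approach, assuming Win32 and C++; ProcessChunk is a hypothetical consumer, and the 512 MB window is just the ceiling suggested above, not a magic number:

    #include <windows.h>
    #include <cstdint>

    void ProcessChunk(const unsigned char* data, size_t len); // hypothetical consumer

    bool ScanFileInWindows(const wchar_t* path)
    {
        HANDLE file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file == INVALID_HANDLE_VALUE) return false;

        LARGE_INTEGER size;
        if (!GetFileSizeEx(file, &size)) { CloseHandle(file); return false; }

        // A zero maximum size means the mapping object covers the whole file.
        HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!mapping) { CloseHandle(file); return false; }

        // View offsets must be multiples of the allocation granularity
        // (usually 64 KB); 512 MB is safely a multiple of that.
        const uint64_t kWindow = 512ull * 1024 * 1024;

        for (uint64_t off = 0; off < (uint64_t)size.QuadPart; off += kWindow)
        {
            uint64_t left = (uint64_t)size.QuadPart - off;
            size_t   len  = (size_t)(left < kWindow ? left : kWindow);

            void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                                       (DWORD)(off >> 32), (DWORD)(off & 0xFFFFFFFFu),
                                       len);
            if (!view) break; // likely fragmentation: retry with a smaller window

            ProcessChunk((const unsigned char*)view, len);
            UnmapViewOfFile(view);
        }

        CloseHandle(mapping);
        CloseHandle(file);
        return true;
    }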

You could use an API like VirtualQuery to find the largest contiguous free block, but then you're effectively forcing out-of-memory errors to occur, because mapping that block removes a large amount of address space from everything else in the process.
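A sketch of that diagnostic, again assuming Win32; note it only reports the largest MEM_FREE region at the instant you ask, and the answer can shrink as soon as anything else allocates:

    #include <windows.h>

    // Walk the user-mode address range and return the size of the largest
    // free region. Purely diagnostic; the result is stale immediately.
    SIZE_T LargestFreeRegion()
    {
        SYSTEM_INFO si;
        GetSystemInfo(&si);

        SIZE_T best = 0;
        char* p = (char*)si.lpMinimumApplicationAddress;
        while (p < (char*)si.lpMaximumApplicationAddress)
        {
            MEMORY_BASIC_INFORMATION mbi;
            if (VirtualQuery(p, &mbi, sizeof(mbi)) != sizeof(mbi)) break;
            if (mbi.State == MEM_FREE && mbi.RegionSize > best)
                best = mbi.RegionSize;
            p = (char*)mbi.BaseAddress + mbi.RegionSize;
        }
        return best;
    }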

EDIT: I just realized my answer is Windows-specific, but you didn't mention which platform you are discussing. I presume other platforms have similar limiting factors for memory-mapped files.

Michael
Thanks, Michael. I was thinking likewise, but I was also hoping there's a way to determine the largest run of consecutive addresses available at run time, so that I could adjust the mapped bytes accordingly. Probably there's no such way at all. My problem was on Windows, but I deliberately left that info out because I wanted to know the answer for Linux as well, if any.
t.g.
You could use VirtualQuery to find contiguous space (updated the answer to cover this).
Michael
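For the Linux half of t.g.'s question, the same sliding-window idea works with POSIX mmap; view offsets just have to be page-aligned. A minimal sketch, assuming a hypothetical process_chunk consumer and a 32-bit build where off_t is widened to 64 bits:

    #define _FILE_OFFSET_BITS 64  // assumption: glibc, so off_t can reach past 4 GB
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdint.h>

    void process_chunk(const unsigned char* data, size_t len); // hypothetical consumer

    int scan_file(const char* path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return -1; }

        // mmap offsets must be page-aligned; 512 MB is a multiple of any
        // common page size, and is an illustrative choice, not a magic number.
        const uint64_t window = 512ull * 1024 * 1024;

        for (uint64_t off = 0; off < (uint64_t)st.st_size; off += window) {
            uint64_t left = (uint64_t)st.st_size - off;
            size_t   len  = (size_t)(left < window ? left : window);

            void* view = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, (off_t)off);
            if (view == MAP_FAILED) { close(fd); return -1; }

            process_chunk((const unsigned char*)view, len);
            munmap(view, len);
        }

        close(fd);
        return 0;
    }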
A: 

Does the file need to be memory mapped?

I've edited 8 GB video files on a 733 MHz PIII (not pleasant, but doable).

graham.reeds
It doesn't have to be. I am opening the file to calculate hashes and compare with another file bit by bit in case of hash collisions. I could open it with std::fstream, but I find memory mapping much faster, at least when the file can be mapped. How did you do it?
t.g.
We used the Borland Builder 5 file functions to get beyond the 4 GB limit. This was back in 2002. Most of the time access was sequential, but there were times when we needed to pull clips from a larger file into another file. We preprocessed the file to generate an index so we could jump around quickly.
graham.reeds