Hi,

In order to make a binary comparer I'm trying to read in the binary contents of two files using the CreateFileW function. However, that causes the whole file to be buffered into memory, and that becomes a problem for large (500 MB) files.

I've looked around for other functions that'll let me just buffer part of the file instead, but I haven't found any documentation specifically stating how the buffer works for those functions (I'm a bit new at this so maybe I'm missing the obvious).

So far the best match I seem to have found is ReadFile. It seems to have a definable buffer but I'm not completely sure that there won't be another buffer implemented behind the scenes, like there is with CreateFileW.

Do you guys have any input on what would be a good function to use?

Thanks a bunch

+7  A: 

You could use memory-mapped files to do this. Open the file with CreateFile, call CreateFileMapping on the resulting handle, then call MapViewOfFile to get a pointer to the data.

best regards, don

Don Dickinson
This is exactly what I needed, thanks!
Zain
+1  A: 

I believe you want MapViewOfFile.

Drew Hoskins
+4  A: 

Not sure what you mean by CreateFile buffering - CreateFile won't read in the entire contents of the file, and besides, you need to call CreateFile before you can call ReadFile.

ReadFile will do what you want - the OS may read ahead to opportunistically cache data, but it will not read the entire 500 MB of the file in.

If you really want to have no buffering, pass FILE_FLAG_NO_BUFFERING to CreateFile, and ensure that your file access offsets and sizes are multiples of the volume sector size. I strongly suggest you do not do this - the system file cache exists for a reason and helps with performance. Caching files in memory should have no effect on the overall system's memory usage - under memory pressure the system file cache will shrink.
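To make the sector-size requirement concrete, here is a minimal sketch of the arithmetic only. The `AlignedRange` struct and `alignToSectors` function are made-up names for illustration, not part of the Windows API, and the sketch assumes the sector size is nonzero and was obtained elsewhere (for example from GetDiskFreeSpace). It widens a requested byte range so that it starts and ends on sector boundaries:

```cpp
#include <cstdint>

// Hypothetical result type: a byte range that starts and ends on sector boundaries.
struct AlignedRange {
    std::uint64_t offset;  // requested offset rounded down to a sector boundary
    std::uint64_t length;  // length rounded up to a whole number of sectors
};

// Hypothetical helper: widen [offset, offset + length) to sector boundaries.
// Assumes sectorSize is nonzero.
AlignedRange alignToSectors(std::uint64_t offset, std::uint64_t length,
                            std::uint64_t sectorSize)
{
    const std::uint64_t start = (offset / sectorSize) * sectorSize;
    const std::uint64_t end = offset + length;
    const std::uint64_t alignedEnd =
        ((end + sectorSize - 1) / sectorSize) * sectorSize;
    return {start, alignedEnd - start};
}
```

With 512-byte sectors, asking for 100 bytes at offset 1000 becomes a 1024-byte read at offset 512, and the caller then picks the wanted bytes out of the larger buffer. This extra bookkeeping is part of why leaving the cache enabled is usually the simpler choice.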

As others have mentioned, you can use memory-mapped files as well. The difference between memory-mapped files and ReadFile is mainly just the interface - ultimately the file manager will satisfy the requests in a similar manner, including some buffering. The interface appears to be a bit more intuitive, but be aware that an I/O error while touching a mapped view is raised as a structured exception (for example EXCEPTION_IN_PAGE_ERROR) rather than returned as an error code. If you don't catch it, it will crash your program.
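If you go the memory-mapped route with large files, you don't have to map the whole file at once; you can map one window at a time. The offset passed to MapViewOfFile must be a multiple of the system allocation granularity (typically 64 KB, reported by GetSystemInfo). Here is a rough sketch of just the window arithmetic, kept free of Windows calls. `ViewWindow` and `planViews` are hypothetical names, and the 64 KB default is an assumption you should replace with the value the system reports:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one mapped view.
struct ViewWindow {
    std::uint64_t offset;  // where the view starts in the file
    std::uint64_t length;  // how many bytes the view covers
};

// Hypothetical helper: split a file into views of at most maxViewBytes,
// each starting on a multiple of `granularity`.
std::vector<ViewWindow> planViews(std::uint64_t fileSize,
                                  std::uint64_t maxViewBytes,
                                  std::uint64_t granularity = 64 * 1024)
{
    std::vector<ViewWindow> windows;
    if (granularity == 0 || maxViewBytes < granularity)
        return windows;  // cannot build aligned windows

    // Round the window size down to a multiple of the granularity so that
    // every subsequent offset stays aligned.
    const std::uint64_t step = (maxViewBytes / granularity) * granularity;

    for (std::uint64_t offset = 0; offset < fileSize; offset += step) {
        windows.push_back({offset, std::min(step, fileSize - offset)});
    }
    return windows;
}
```

Each window's offset would then be split into its high and low 32-bit halves and passed to MapViewOfFile together with the length, and the view released with UnmapViewOfFile before mapping the next one. For a 500 MB file and 64 MB windows this gives eight views, so only 64 MB of address space is in use at any time.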

Michael
He may be worried about virtual memory - in a 32-bit address space there may not be enough room for his 500 MB files. The question of whether it's actually copied into RAM wouldn't be relevant.
Drew Hoskins
Right, but you don't have to read 500 MB at a time.
Michael
+4  A: 

Calling CreateFile() does not itself buffer or otherwise read the contents of the target file. After calling CreateFile(), you must call ReadFile() to obtain whatever parts of the file you want. For example, to read the first kilobyte of a file:

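DWORD cbRead = 0;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
    // ReadFile takes the destination buffer as its second argument;
    // cbRead receives the number of bytes actually read.
    ::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL);
    ::CloseHandle(hFile);
}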

In addition, if you want to read an arbitrary portion of the file, you can use SetFilePointer() before calling ReadFile() (for offsets beyond 2 GB, SetFilePointerEx() takes a 64-bit offset directly). For example, to read one kilobyte starting one megabyte into the file:

DWORD cbRead = 0;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
if (hFile != INVALID_HANDLE_VALUE)
{
    // Move the file pointer 1 MB from the start, then read 1 KB from there.
    ::SetFilePointer(hFile, 1024 * 1024, NULL, FILE_BEGIN);
    ::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL);
    ::CloseHandle(hFile);
}

You may, of course, call SetFilePointer() and ReadFile() as many times as you wish while the file is open. A call to ReadFile() implicitly sets the file pointer to the byte immediately following the last byte read by ReadFile().
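Because each read continues where the previous one stopped, a comparer only needs two fixed-size buffers and a loop. Here is a minimal sketch of that loop shape. Note that it uses portable C++ streams (std::ifstream) rather than ReadFile() so that it stands alone, and `filesEqual` and the 64 KB default chunk size are illustrative choices, not anything from the Windows API. The same structure should carry over to ReadFile() with cbRead taking the place of gcount():

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true when both files contain identical bytes.
// Reads chunkSize bytes at a time, so memory use does not depend on file size.
bool filesEqual(const std::string& pathA, const std::string& pathB,
                std::size_t chunkSize = 64 * 1024)
{
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b || chunkSize == 0)
        return false;  // treat an unopenable file as "not equal"

    std::vector<char> bufA(chunkSize), bufB(chunkSize);
    for (;;) {
        a.read(bufA.data(), static_cast<std::streamsize>(chunkSize));
        b.read(bufB.data(), static_cast<std::streamsize>(chunkSize));
        const std::streamsize gotA = a.gcount();
        const std::streamsize gotB = b.gcount();

        if (gotA != gotB)
            return false;  // one file ended before the other
        if (gotA == 0)
            return true;   // both reached the end together
        if (std::memcmp(bufA.data(), bufB.data(),
                        static_cast<std::size_t>(gotA)) != 0)
            return false;  // found differing bytes in this chunk
    }
}
```

A call such as `filesEqual("a.bin", "b.bin")` would then compare two 500 MB files while holding only 128 KB of buffer. The sketch does not distinguish a read error from end of file, which a real tool would want to do.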

Additionally, you should read the documentation for the File Management Functions you use, and check the return values appropriately to trap any errors that might occur.

Windows may, at its discretion, use available system memory to cache the contents of open files, but data cached by this process will be discarded if the memory is needed by a running program (after all, the cached data can just be re-read from the disk if it is needed).

Matthew Xavier