Hello! I'm writing an editor for large (4 GB+) archive files (see below), in native and managed C++.

To access the files, I'm using file mapping (see below), like any sane person. This is absolutely great for reading data, but a problem arises when actually editing the archive. File mapping does not allow resizing a file while it's mapped, so I don't know how to proceed when the user wants to insert new data into the file, which would make it exceed the size it had when it was mapped.

Should I remap the whole thing every time? That's bound to be slow. However, I'd like to keep the editor real-time, with exclusive file access, since that simplifies the programming a lot and won't let other applications screw up the file while it's being modified. I don't want to spend an eternity working on the editor; it's just a simple dev tool for the actual project I'm working on.

So I'd like to hear how you've handled similar cases, and what other archiving software, and especially games, do to solve this.

To clarify:

  • This is not a text file; I'm writing a specific binary archive file format, by which I mean a big file that contains many others, organized in directories. Custom archive files are very common in games for a number of reasons. With my format, I'm aiming for a structure similar to (but somewhat simpler than) Valve Software's GCF format. I would have used the GCF format as is, but unfortunately no editor exists for it, although there are many great implementations for reading it, such as HLLib.

  • Accessing the file must be fast, as it is intended for storing game resources, so it's not a database. Database files would be stored inside it, along with GFX, SFX, etc. files.

  • "File mapping" as discussed here is a specific technique on the Windows platform that allows direct access to a large file by creating "views" of parts of it; see here: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx. This technique allows minimal latency and memory usage and is a no-brainer for accessing any large file. So this does not mean reading the whole 4 GB file into memory; it's exactly the opposite.

+2  A: 

What do you mean by 'editor software'? If this is a text file, have you tried existing production-quality editors, before writing your own? If it's a file storing binary data, have you considered using an RDBMS and manipulating its contents using SQL statements?

If you absolutely have to write this from scratch, I'm not sure that mmapping is the way to go. Mmapping a huge file will put a lot of pressure on your machine's VM system, and unless there are many editing operations all over the file its efficiency may lag behind a simple read/write scheme. Worse, as you say, you have problems when you want to extend the file.

Instead, maintain buffer windows onto the file's data, which the user can modify. When the user decides to save the file, traverse the file and the edited buffers sequentially to create the new file image. If you have enough disk space, it's easier to write a new file (especially if a buffer's size has changed); otherwise you need to be clever about how you read ahead existing data before you overwrite it with the new contents.
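A minimal sketch of that save step, assuming each edited buffer replaces a known byte range of the original file and the new image goes to a separate output stream. EditedWindow and WriteNewImage are hypothetical names, not part of any library. The original is read strictly front to back, so only one copy chunk plus the edited buffers need to be in memory:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <map>
#include <ostream>
#include <sstream>  // std::istringstream/std::ostringstream for trying it in memory
#include <string>
#include <vector>

// One edited window: replaces `originalLength` bytes of the ORIGINAL file,
// starting at the map key, with `data` (which may be longer or shorter).
struct EditedWindow {
    std::uint64_t originalLength;
    std::string data;
};

// Streams `original` into `out` front to back, substituting each edited window
// when its offset is reached. Windows must not overlap. Returns bytes written.
std::uint64_t WriteNewImage(std::istream& original,
                            const std::map<std::uint64_t, EditedWindow>& windows,
                            std::ostream& out) {
    std::vector<char> chunk(64 * 1024);
    std::uint64_t pos = 0;      // current read position in the original file
    std::uint64_t written = 0;  // bytes emitted to the new image

    // Copy up to `count` unchanged bytes from the original (stops early at EOF).
    auto copyUnchanged = [&](std::uint64_t count) {
        while (count > 0) {
            std::size_t want = static_cast<std::size_t>(
                std::min<std::uint64_t>(count, chunk.size()));
            original.read(chunk.data(), static_cast<std::streamsize>(want));
            std::size_t got = static_cast<std::size_t>(original.gcount());
            out.write(chunk.data(), static_cast<std::streamsize>(got));
            pos += got;
            written += got;
            count -= got;
            if (got < want) break;  // end of the original file
        }
    };

    for (const auto& [offset, window] : windows) {
        assert(offset >= pos && "edited windows must not overlap");
        copyUnchanged(offset - pos);  // unchanged bytes before this edit
        out.write(window.data.data(), static_cast<std::streamsize>(window.data.size()));
        written += window.data.size();
        // Skip the original bytes that this window replaces.
        original.seekg(static_cast<std::streamoff>(window.originalLength), std::ios::cur);
        pos += window.originalLength;
    }
    copyUnchanged(std::numeric_limits<std::uint64_t>::max());  // tail after the last edit
    return written;
}
```

In a real editor, original and out would be std::ifstream/std::ofstream opened with std::ios::binary, the output would go to a temporary file next to the archive, and that file would then replace the original (on Windows, for example, with MoveFileExW and MOVEFILE_REPLACE_EXISTING).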

Alternatively, you can keep a journal of editing operations. When the user decides to save the file, perform a topological sort on the journal and replay it against the existing file to create the new one.

For exclusive file access, use your operating system's file locking, or implement application-level locking (if only your editor will touch these files). Relying on mmap for exclusive access constrains your implementation choices.

Diomidis Spinellis
+2  A: 

Mapping the file is great for actually accessing the data, but I think you need another abstraction that represents the structure of the file. There are various ways of doing this, but consider representing the file as a sequence of 'extents'.

To start with, the file is a single extent that is equivalent to the whole mapping. If the user then starts to edit the file, you would split the single extent into two at the edit point, and insert a new extent that contains the data the user has inserted. Modifications and deletions would also change your view of the file by creating or modifying these extents.
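This extent idea is essentially the piece table used by some text editors. A minimal sketch, assuming the original file is available as one contiguous read-only buffer (for example a mapped view) and that inserted bytes go into a separate append-only buffer; Extent and ExtentList are illustrative names:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A run of bytes taken either from the original (mapped) file or from the
// append-only buffer holding everything the user has inserted.
struct Extent {
    bool fromOriginal;     // true: bytes live in the original file/mapping
    std::uint64_t start;   // offset into the original file or the add buffer
    std::uint64_t length;
};

class ExtentList {
public:
    // Initially the whole file is a single extent covering the mapping.
    ExtentList(const char* original, std::uint64_t size) : original_(original) {
        if (size > 0) extents_.push_back({true, 0, size});
    }

    // Insert `data` at logical position `pos`, splitting an extent if needed.
    void Insert(std::uint64_t pos, const std::string& data) {
        Extent added{false, added_.size(), data.size()};
        added_ += data;
        std::size_t i = SplitAt(pos);
        extents_.insert(extents_.begin() + i, added);
    }

    // Delete `count` bytes starting at logical position `pos`.
    void Erase(std::uint64_t pos, std::uint64_t count) {
        std::size_t first = SplitAt(pos);
        std::size_t last = SplitAt(pos + count);
        extents_.erase(extents_.begin() + first, extents_.begin() + last);
    }

    // Materialize the logical contents (for small files or testing only).
    std::string Contents() const {
        std::string out;
        for (const Extent& e : extents_) {
            const char* base = e.fromOriginal ? original_ : added_.data();
            out.append(base + e.start, e.length);
        }
        return out;
    }

private:
    // Ensure an extent boundary exists at logical position `pos`; returns the
    // index of the first extent that starts at `pos`.
    std::size_t SplitAt(std::uint64_t pos) {
        std::uint64_t offset = 0;
        for (std::size_t i = 0; i < extents_.size(); ++i) {
            Extent& e = extents_[i];
            if (pos == offset) return i;
            if (pos < offset + e.length) {
                std::uint64_t head = pos - offset;
                Extent tail{e.fromOriginal, e.start + head, e.length - head};
                e.length = head;  // e is not used after the insert below
                extents_.insert(extents_.begin() + i + 1, tail);
                return i + 1;
            }
            offset += e.length;
        }
        assert(pos == offset && "position past end of document");
        return extents_.size();
    }

    const char* original_;         // e.g. the pointer returned by MapViewOfFile
    std::string added_;            // append-only buffer of inserted bytes
    std::vector<Extent> extents_;  // the document, in logical order
};
```

Nothing touches the archive until save, when the extents are written out in order to a new file; edits stay cheap because they only split or drop extents. The linear scan in SplitAt is fine for a dev tool but could be swapped for a balanced tree if the extent list grows large.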

Maybe you could examine the source code for one of the open source editors -- there are lots to choose from, but finding one that is simple enough would be the challenge.

Rob Walker
Hi - yes, exactly - my problem is keeping things simple. I'm still hoping that there'd be a way to do this without having to track changes "virtually" and flush them separately later, but unfortunately that's starting to look like the only viable option.
psoul
+1  A: 

There's no easy answer for this problem -- I've looked for one for a long time, in vain. You'll have to modify the file's size, then re-map it.

Head Geek
+1  A: 

What I do is close the view handle(s) and the file mapping handle, set the new file size, and then reopen the mapping and view handles.

```cpp
// Open the file and map all of it
HANDLE FileHandle = ::CreateFileW(file_name, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, 0, NULL);
size_t Size = ::GetFileSize(FileHandle, 0);
HANDLE MappingHandle = ::CreateFileMapping(FileHandle, NULL, PAGE_READWRITE, 0, Size, NULL);
void* ViewHandle = ::MapViewOfFile(MappingHandle, FILE_MAP_ALL_ACCESS, 0, 0, Size);

// ... read and write through ViewHandle ...

// Increase the size of the file: all views must be unmapped and the mapping
// handle closed first, otherwise SetEndOfFile fails on the mapped file
UnmapViewOfFile(ViewHandle);
CloseHandle(MappingHandle);

Size += 1024;

// Move the file pointer to the new size and make that the end of the file
LARGE_INTEGER offset;
offset.QuadPart = Size;

LARGE_INTEGER newpos;
SetFilePointerEx(FileHandle, offset, &newpos, FILE_BEGIN);
SetEndOfFile(FileHandle);

// Recreate the mapping and the view with the new size
MappingHandle = ::CreateFileMapping(FileHandle, NULL, PAGE_READWRITE, 0, Size, NULL);
ViewHandle = ::MapViewOfFile(MappingHandle, FILE_MAP_ALL_ACCESS, 0, 0, Size);
```

The above code has no error checking and does not handle 64-bit sizes, but that's not hard to fix.
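The 64-bit part mostly comes down to two things: reading the size with GetFileSizeEx (which fills a LARGE_INTEGER) instead of GetFileSize, and passing 64-bit sizes and offsets to CreateFileMapping and MapViewOfFile as separate high and low DWORDs. If the whole file cannot be mapped at once (for example in a 32-bit process), each view must also start at a multiple of the system allocation granularity (dwAllocationGranularity from GetSystemInfo). The sketch below shows only that arithmetic, in portable C++ so it compiles anywhere; SplitQword and AlignView are hypothetical helpers, and the Windows calls appear only in comments:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a 64-bit value into the high/low DWORD pair that
// CreateFileMapping (dwMaximumSizeHigh/Low) and MapViewOfFile
// (dwFileOffsetHigh/Low) expect.
struct DwordPair {
    std::uint32_t high;
    std::uint32_t low;
};

DwordPair SplitQword(std::uint64_t value) {
    return {static_cast<std::uint32_t>(value >> 32),
            static_cast<std::uint32_t>(value & 0xFFFFFFFFu)};
}

// Hypothetical helper: describe a view covering [offset, offset + length).
// The view's file offset must be a multiple of the allocation granularity,
// so it starts a little earlier and the caller skips `delta` bytes into it.
struct ViewWindow {
    std::uint64_t viewOffset;  // aligned offset to pass to MapViewOfFile
    std::uint64_t delta;       // where `offset` lands inside the view
    std::uint64_t viewLength;  // dwNumberOfBytesToMap
};

ViewWindow AlignView(std::uint64_t offset, std::uint64_t length,
                     std::uint64_t granularity) {
    assert(granularity > 0);
    std::uint64_t aligned = offset - (offset % granularity);
    std::uint64_t delta = offset - aligned;
    return {aligned, delta, delta + length};
}

// Windows usage (not compiled here):
//   LARGE_INTEGER size; GetFileSizeEx(file, &size);
//   DwordPair s = SplitQword(static_cast<std::uint64_t>(size.QuadPart));
//   HANDLE mapping = CreateFileMappingW(file, NULL, PAGE_READWRITE, s.high, s.low, NULL);
//   ViewWindow w = AlignView(offset, length, sysInfo.dwAllocationGranularity);
//   DwordPair o = SplitQword(w.viewOffset);
//   char* base = static_cast<char*>(MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS,
//                                                 o.high, o.low, (SIZE_T)w.viewLength));
//   char* p = base + w.delta;  // first byte the caller asked for
```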

Shane Powell
Yeah, this is exactly what I thought I'd have to do. Another option is to trade space for speed: reserve a good chunk of extra space in the file the first time, and when closing the file, resize it to the real EOF.
psoul
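The reserve-then-trim idea from the comment above comes down to simple bookkeeping: track the logical end of the archive separately from the physical file size, and only remap when an append crosses the physical size. A minimal sketch, assuming growth in fixed-size chunks; the ReservedFile class and its chunk policy are illustrative, not taken from the thread:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for an over-allocated archive file: the file on disk
// (capacity) is kept larger than the archive's real end (logical size), so most
// appends only move the logical end and need no unmap/resize/remap cycle.
class ReservedFile {
public:
    ReservedFile(std::uint64_t logicalSize, std::uint64_t chunk)
        : logical_(logicalSize), chunk_(chunk) {
        assert(chunk_ > 0);
        capacity_ = RoundUp(logical_ + chunk_);  // reserve at least one spare chunk
    }

    // Records `bytes` appended at the logical end. Returns true when the
    // physical file must first be grown to capacity() and remapped.
    bool Append(std::uint64_t bytes) {
        logical_ += bytes;
        if (logical_ <= capacity_) return false;
        capacity_ = RoundUp(logical_ + chunk_);
        return true;
    }

    std::uint64_t logical() const { return logical_; }    // trim to this on close
    std::uint64_t capacity() const { return capacity_; }  // physical file size

private:
    // Round `n` up to a whole number of chunks.
    std::uint64_t RoundUp(std::uint64_t n) const {
        return (n + chunk_ - 1) / chunk_ * chunk_;
    }

    std::uint64_t logical_;
    std::uint64_t chunk_;
    std::uint64_t capacity_ = 0;
};
```

When Append returns true, the caller would unmap, extend the file to capacity() with SetFilePointerEx and SetEndOfFile, and remap, as in the code above; on close, the same two calls with logical() trim the file back to its real end. Since a crash could skip that trim, storing the logical size in the archive header is probably wise.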
+1  A: 

Mapping has a basic issue with files on remote systems.

In the good old DOS days, there was a fine editor called Norton Editor (ne.com; that's the file name, not a website). It could load files of any size (we're talking about 640 KB of RAM and 20 MB hard disks, if any).

It loaded only part of the file at a time, cleverly handling searches across the whole file with on-demand loading.

IMHO, such an approach should be used.

If properly hidden under a file read/write layer, it can be surprisingly transparent.
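A minimal sketch of such a layer, assuming fixed-size pages kept in a small least-recently-used cache. The page size, the eviction policy and the PagedReader name are illustrative choices, not a description of how Norton Editor actually worked:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <list>
#include <sstream>  // std::istringstream is enough to try it in memory
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical on-demand read layer: the file is split into fixed-size pages,
// and at most `maxPages` of them are held in memory, evicted least recently used.
class PagedReader {
public:
    PagedReader(std::istream& file, std::size_t pageSize, std::size_t maxPages)
        : file_(file), pageSize_(pageSize), maxPages_(maxPages) {
        assert(pageSize_ > 0 && maxPages_ > 0);
    }

    // Reads the byte at `offset`; returns false if it lies past the end of file.
    bool ReadByte(std::uint64_t offset, char& out) {
        const std::vector<char>& page = Page(offset / pageSize_);
        std::size_t inPage = static_cast<std::size_t>(offset % pageSize_);
        if (inPage >= page.size()) return false;
        out = page[inPage];
        return true;
    }

    std::size_t loads() const { return loads_; }  // pages read from disk so far

private:
    struct Entry {
        std::vector<char> data;
        std::list<std::uint64_t>::iterator lruPos;
    };

    const std::vector<char>& Page(std::uint64_t index) {
        auto it = cache_.find(index);
        if (it != cache_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.lruPos);  // mark most recent
            return it->second.data;
        }
        if (cache_.size() >= maxPages_) {  // evict the least recently used page
            cache_.erase(lru_.back());
            lru_.pop_back();
        }
        std::vector<char> data(pageSize_);
        file_.clear();  // forget EOF from an earlier short read
        file_.seekg(static_cast<std::streamoff>(index * pageSize_));
        file_.read(data.data(), static_cast<std::streamsize>(pageSize_));
        data.resize(static_cast<std::size_t>(file_.gcount()));  // last page may be short
        ++loads_;
        lru_.push_front(index);
        Entry& entry = cache_[index];
        entry.data = std::move(data);
        entry.lruPos = lru_.begin();
        return entry.data;
    }

    std::istream& file_;
    std::size_t pageSize_;
    std::size_t maxPages_;
    std::size_t loads_ = 0;
    std::list<std::uint64_t> lru_;  // front = most recently used page index
    std::unordered_map<std::uint64_t, Entry> cache_;
};
```

Callers only ever see offsets into the whole file; how many pages stay resident is a tuning knob (maxPages). Writes could be layered on the same pages by marking them dirty and flushing them on save.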

Vardhan Varma
A: 

I'd build the large file from pieces at build time. Have your editor deal with normal, flat files in the usual file system (with subdirectories, etc., as appropriate), then have a compile step that gathers all of these pieces together into your archive file format.
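A minimal sketch of that compile step in C++17, assuming a deliberately simple, hypothetical layout: an entry count, then a table of (name length, name, data offset, data size) records, then all file data back to back, with every integer stored as 64-bit little-endian. CollectTree, PackArchive and FindInArchive are illustrative names; a real format would likely add an index for fast lookups, alignment and checksums:

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

struct PackEntry {
    std::string name;  // path relative to the source root, '/'-separated
    std::string data;  // file contents
};

// Gather every regular file under `root` (the editor's normal working tree).
std::vector<PackEntry> CollectTree(const fs::path& root) {
    std::vector<PackEntry> entries;
    for (const auto& item : fs::recursive_directory_iterator(root)) {
        if (!item.is_regular_file()) continue;
        std::ifstream in(item.path(), std::ios::binary);
        std::string data((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
        entries.push_back({item.path().lexically_relative(root).generic_string(), std::move(data)});
    }
    return entries;
}

static void PutU64(std::string& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

static std::uint64_t GetU64(const std::string& in, std::size_t pos) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(in[pos + i])) << (8 * i);
    return v;
}

// Layout: [count] then per entry [nameLen][name][offset][size], then all data.
std::string PackArchive(const std::vector<PackEntry>& entries) {
    std::uint64_t tableSize = 8;
    for (const auto& e : entries) tableSize += 8 + e.name.size() + 16;
    std::string out;
    PutU64(out, entries.size());
    std::uint64_t offset = tableSize;  // first data byte follows the table
    for (const auto& e : entries) {
        PutU64(out, e.name.size());
        out += e.name;
        PutU64(out, offset);
        PutU64(out, e.data.size());
        offset += e.data.size();
    }
    for (const auto& e : entries) out += e.data;
    return out;
}

// Look up one file by name in a packed archive (linear scan of the table).
bool FindInArchive(const std::string& archive, const std::string& name, std::string& out) {
    std::size_t pos = 8;
    std::uint64_t count = GetU64(archive, 0);
    for (std::uint64_t i = 0; i < count; ++i) {
        std::uint64_t len = GetU64(archive, pos);
        std::string entryName = archive.substr(pos + 8, len);
        std::uint64_t offset = GetU64(archive, pos + 8 + len);
        std::uint64_t size = GetU64(archive, pos + 16 + len);
        pos += 8 + len + 16;
        if (entryName == name) { out = archive.substr(offset, size); return true; }
    }
    return false;
}
```

The packer writes the whole archive in one pass, so the game can map it read-only and never needs to resize it; FindInArchive shows how a reader would resolve a name to an offset and size.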

Roger Lipscombe