Hi. There are standard tools for this, but I need a simple GUI to assist some users (on Windows). They will get an open-file dialog and pick the file to process.

The file will be an XML file. The file will contain (within the first few lines) a text string that needs to be deleted or replaced with whitespace (doesn't matter which).

The problem is that the XML file is several gigabytes in size, but the fixed search-and-replace string will occur within the first 4 KB or so.

What's the best way to overwrite the search string and save in place, without reading the whole file into memory or writing excessively to disk?

A: 

You can easily write your own tool. Since the string is at the very beginning, any brute-force approach will work. Just keep scanning until you find it.

However, avoiding a lot of disk writes is only possible if you do not change the file size. If you delete or insert bytes somewhere in the middle, you have to rewrite everything that follows them, which in your case would be practically the whole file. So you'll have to replace the string with whitespace. As long as you just replace one byte with another, there will be no overhead.
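
A minimal Python sketch of that idea, assuming the search string is known as bytes in the file's own encoding and really does sit within the first few kilobytes. The function name `blank_out_in_place` and the `scan_bytes` parameter are made up for illustration:

```python
def blank_out_in_place(path, needle, scan_bytes=4096, filler=b" "):
    """Overwrite the first occurrence of `needle` with spaces, in place.

    Only the start of the file is read and only len(needle) bytes are
    written, so the file size never changes. Returns True if the string
    was found and overwritten, False otherwise.
    """
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        # Read a little extra so a match straddling the 4 KB mark is not missed.
        head = f.read(scan_bytes + len(needle) - 1)
        pos = head.find(needle)
        if pos < 0:
            return False
        f.seek(pos)
        f.write(filler * len(needle))
    return True
```

Called as `blank_out_in_place(r"C:\data\big.xml", b"text to remove")`, with a hypothetical path and search string, it touches only the first few kilobytes no matter how large the file is.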

Vilx-
+1  A: 

Obviously, replacing with whitespace so that the size of the file as a whole doesn't change is the best choice here; otherwise you must stream through the entire file to update it on disk.

If this was for a Unix environment, I would look into using mmap() to map a suitable part of the start of the file into RAM, then edit it in-place and be done.

This snippet shows how to use the Win32 equivalent, the CreateFileMapping() function.
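
Separately from the linked snippet, here is a rough sketch of the same mapping idea using Python's standard `mmap` module, which works on both Windows and Unix. It assumes the search string is given as bytes and lies within the mapped prefix. The name `blank_out_with_mmap` and the `map_bytes` parameter are invented for this example:

```python
import mmap
import os


def blank_out_with_mmap(path, needle, map_bytes=4096):
    """Map only the start of the file and blank out `needle` there."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        # Never map more than the file holds; on Windows a longer
        # mapping would extend the file.
        length = min(map_bytes, size)
        if length == 0:
            return False
        with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_WRITE) as m:
            pos = m.find(needle)
            if pos < 0:
                return False
            # Same-length slice assignment edits the mapped bytes in place.
            m[pos:pos + len(needle)] = b" " * len(needle)
            m.flush()
    return True
```

Only the mapped prefix is handled here, so the rest of the multi-gigabyte file is never read or rewritten.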

unwind
Thanks. I guess the CFM() function allows mapping just part of the file into RAM, and the OS will handle the rest. I'll look into it.