tags:

views:

28

answers:

1

I need to edit some nasty binary files in a proprietary format, so I wrote a converter between this binary format and XML. Now I can edit the interesting bits, but unfortunately the format embeds a lot of raw binary data. I need to keep it where it is (or otherwise reinsert it on conversion back), but it's not meaningfully editable anyway, so I would like to see as little of it as possible.

What's the simplest way to make such blobs take up as little space as possible while minimizing the chance of a blob getting accidentally damaged? I'm thinking gzip+base64, with the checksum and size stored in the blob tag's attributes - or is there a more sensible way?

A: 

If the blobs can be reproduced easily from the original file, you can simply refer to them. Something like

<blob start="1000" end="2000"/>

or

<blob seq='1'/>

# in another file:
1 1000 2000

Update:

As the original files will be deleted (see comment), the above can't be used as is.

This would work:

<blob start='0' end='1000'/>

# Another file. Depending on space/time requirements, you may either
# not compress anything, compress the whole file, or compress each blob.
[blob 1][blob 2][blob 3]
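
A minimal Python sketch of this layout, assuming the blobs are simply concatenated uncompressed and that `end` is an exclusive offset. The names `pack_blobs` and `read_blob` are made up for illustration:

```python
import xml.etree.ElementTree as ET


def pack_blobs(blobs):
    """Concatenate blobs into one sidecar buffer and build <blob> references."""
    sidecar = bytearray()
    elements = []
    for data in blobs:
        start = len(sidecar)
        sidecar += data
        # 'end' is exclusive here (an assumption), so the blob is sidecar[start:end]
        elements.append(ET.Element("blob", start=str(start), end=str(len(sidecar))))
    return bytes(sidecar), elements


def read_blob(sidecar, element):
    """Recover the raw bytes that a <blob> element points at."""
    return sidecar[int(element.get("start")):int(element.get("end"))]
```

When converting back, the converter would call `read_blob` for every `<blob>` element it meets. Since the XML only holds offsets, editing it cannot corrupt the binary data, though the sidecar file has to travel with the XML.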

If you absolutely require single-file output, you can also embed the second file in the XML (with encoding + checksum), but that's not much of an improvement over your original idea.
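
For the embedded variant, a rough Python sketch could look like the following. It assumes zlib compression (rather than the gzip container mentioned in the question), a CRC-32 checksum, and attribute names `size` and `crc32`; all of these, along with the function names, are illustrative choices:

```python
import base64
import xml.etree.ElementTree as ET
import zlib


def blob_to_element(data):
    """Compress and base64-encode a blob, recording size and CRC-32 as attributes."""
    elem = ET.Element(
        "blob",
        size=str(len(data)),
        crc32=format(zlib.crc32(data), "08x"),
    )
    elem.text = base64.b64encode(zlib.compress(data, 9)).decode("ascii")
    return elem


def element_to_blob(elem):
    """Decode a <blob> element and verify it against its recorded size and CRC-32."""
    data = zlib.decompress(base64.b64decode(elem.text))
    if len(data) != int(elem.get("size")):
        raise ValueError("blob size mismatch")
    if zlib.crc32(data) != int(elem.get("crc32"), 16):
        raise ValueError("blob checksum mismatch")
    return data
```

An accidental edit to the base64 text will usually fail during decoding or decompression, and anything that slips through is caught by the size and checksum comparison.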

Johannes Sasongko
Unfortunately there's no good place - these blobs move around every time the binary file gets modified, so I'd need to attach the binary file to the XML, which gets us back to the original question.
taw
@taw: Updated. <!-- padding to make SO happy -->
Johannes Sasongko