Hi guys

I have a folder full of binary files and I want to make a change to these files so that their hashes will change. I want to do this in a fashion that doesn't permanently corrupt the files, meaning that the change should still allow the file to operate normally, or that I should be able to undo the change at any point in time.

Does anyone know of a script that I could use to do this, or maybe a program that will automate this?

Cheers

UPDATE

It's an edge case that I am trying to deal with. I have a system that only allows me to store a file with a given hash once. Hence I want to change the content hash of the file to allow the file to be stored. Note the system in question is not one I control or can change.

Couldn't I just add a random 1 to the end of the file and then remove it afterward without breaking anything? I'm just not sure how to script this - as in, how to modify the binary data in this way. Note I'm in a Windows environment.

+3  A: 

Without knowing the format of the files, we can't tell. It may in fact be impossible - for instance if these binary files are signed with some private key. In that case, changing any single bit within the file is likely to render it invalid.

Is your hash calculated purely from the contents, and not any other metadata that you can change (such as filename or modified date)? If so, you're probably out of luck. If the hash is meant to detect when the content changes, but you're trying to change the hash without actually changing the content, you've clearly got a problem...
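To make the contents-versus-metadata distinction concrete, here is a minimal Python sketch. It assumes the store hashes only the file's bytes and uses SHA-256 as a stand-in, since the actual algorithm isn't known; `content_hash` and `demo` are hypothetical names used only for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 over the file's bytes only (an assumed stand-in for the store's hash)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo() -> tuple:
    """Return the content hash before and after changing the name and modified time."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "sample.bin"
        original.write_bytes(b"\x00\x01\x02 example payload")
        before = content_hash(original)

        # Change metadata only: access/modified time, then the filename.
        os.utime(original, (1_000_000_000, 1_000_000_000))
        renamed = original.rename(Path(tmp) / "renamed.bin")
        after = content_hash(renamed)
    return before, after
```

If the two values returned by `demo()` are equal, as they should be for any hash computed purely from content, then renaming the file or changing its timestamp won't get it past such a store.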

What is the hash used for? Why do you want to change it? There may be an alternative solution if you could give us more information about the bigger picture.

EDIT: One alternative is to effectively create your own container format - so while a file is stored in your container format, it's not usable in its original form, but it can be extracted easily. Your container could be as simple as "add four bytes at the end as a seed to disturb the hash" - "extracting" the file would just involve copying it and removing the last four bytes. But the important point is that what you end up with isn't an MP3 file or whatever you started with - it's your custom format, simple as it is. You need to package/extract the file any time you interact with the store.
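As a rough sketch of that "four bytes at the end" container, assuming Python is available on the Windows machine: the function names `package`, `extract` and `sha256_of` are made up for illustration, SHA-256 is only a stand-in for whatever hash the store really uses, and the whole file is read into memory, which is fine for modest sizes but not for very large files.

```python
import hashlib
import os
from pathlib import Path

SEED_LEN = 4  # number of trailing seed bytes, as in the "four bytes" idea above


def package(src: Path, dst: Path) -> None:
    """Copy src to dst and append SEED_LEN random bytes to disturb the hash."""
    data = src.read_bytes()
    dst.write_bytes(data + os.urandom(SEED_LEN))


def extract(src: Path, dst: Path) -> None:
    """Copy src to dst, dropping the trailing SEED_LEN seed bytes."""
    data = src.read_bytes()
    if len(data) < SEED_LEN:
        raise ValueError("file is too short to be a packaged file")
    dst.write_bytes(data[:-SEED_LEN])


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of the file's contents (assumed hash, for checking only)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

You would call `package(Path("song.mp3"), Path("song.mp3.pkg"))` before storing and `extract(...)` after retrieving. Because the seed is random, packaging the same file again will almost always give a different hash, so a second copy can be stored too. The packaged file is not guaranteed to be a valid file of the original type until it has been extracted.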

Jon Skeet
What if they are not signed... let's say like most media files, such as MP3 or AVI files? Would this change much if they were unsigned DLLs or Word docs?
webwalkerant
@webwalkerant: You'll need quite a lot of information about each and every binary file format you need to support - basically you'll need to find a bit of the file which you can modify without changing the contents *significantly*. Some may be easy, some hard, some impossible. I would personally look for an alternative approach rather than going down this road.
Jon Skeet
If they are MP3 files, then just change the ID3 tag.
expedient
@jon Couldn't I just add a random 1 to the end of the file and then remove it afterward without breaking anything?
webwalkerant
@expedient is that because the ID3 tag is a part of the file content?
webwalkerant
@webwalkerant: Yes, you could add a 1 to the end of the file and then remove it afterwards... but there's no guarantee at all that it'll be a valid file of the relevant type while that extra data is there.
Jon Skeet
If you add and then remove a '1', you won't have actually changed any data, so the file's hash will be the same. That's just a painful way to run the `touch` command. ... If you mean to store the file with the '1' appended, how will you know which files to "fix" before you read them, and what happens when you want to store another copy of the file?
Karmastan
@jon how would I go about creating my own container format?
webwalkerant
@Karmastan I would record in the system which files were edited and which ones weren't... So if I want to do this, is the "touch" command what I want to use? Is there a Windows equivalent? Is there a version that undoes the "touch"?
webwalkerant
@webwalkerant: `touch` just updates the last modified time of a file. It doesn't change the contents.
Jon Skeet
@jon OK, so touch sounds like it's not what I want... How would I go about creating the wrapper you mentioned?
webwalkerant
@webwalkerant: As I say, you just *always* assume that what's stored has an extra 4 bytes... so whenever you retrieve a file, remove the last 4 bytes, and whenever you store, add 4 on.
Jon Skeet
A: 

I'm really curious why you want to work around the deduplication part of a storage system. Some engineer went to the trouble of building that into the system for a reason.

Karmastan