I know it might seem ridiculous that you would purposely want to corrupt a file, but I assure you it's for a good reason.

In my app, I have a lot of XML serialization going on, which in turn means I also have a lot of deserialization.

Today I tried some disaster scenarios. I reset the server during a serialization operation and, as expected, it corrupted the XML file.

The problem is that trying to "shut down" the server at exactly the right time to corrupt the file is not really optimal. Firstly, it takes luck to catch the operation during its .0001 ms write time, and secondly, the server then needs to reboot. Also, pulling the plug on the server is just a bad idea, period, for other reasons.

Is there an app that can effectively corrupt a file, so that this file can be used for testing in my app?

+17  A: 

Open it up in a hex editor and have fun twiddling bits?

Amber
Didn't think of this. This could be one way to "break the file".
JL
Even easier, just create an empty file with the name that you need for testing.
EBGreen
Depends on what you mean by "Corruption" of course.
EBGreen
Empty and corrupt (might) be 2 different things. So I need to test with corrupt. I emphasize - Might...
JL
It's pretty clear that a hex editor will have the power to corrupt the file... as for having fun twiddling bits, well, it's gotta be...
JL
Is that the same thing as corrupting the file though? If all you do is mess with the hex you're just changing the contents of the file. I thought a "corrupted" file was when the OS's file system data was corrupted for a particular file.
Spencer Ruport
Corrupted file can be anything where the file can not be used in the way it was originally intended. For example, if your download of an .exe is cut-off halfway through, the .exe will not run even though the first half of the data was correct, and it is considered corrupt.
Will Eddins
Since this is xml, if you don't have a hex editor handy a text editor should work just as well. Just randomly cut and paste things around.
Crappy Coding Guy
I agree with Dave Carlile; a hex editor is serious overkill for corrupting XML. For example, in ASCII encoding, twiddling bits will just turn, say, an "A" into an "@" or a box-drawing character, "┴". Just use a text editor.
WCWedin
+2  A: 

Are you attempting to test for a partially degraded file?

If you want to test how your program reacts to bad data, why not just use any random text file as input?

Dana
Want to try and emulate the same corruption that would occur during a real write operation that did not complete. So as authentic as possible.
JL
If you're using *nix, look into dd and "/dev/urandom" - can't get more random than that - e.g. dd if=/dev/urandom of=/my/random/file bs=1024 count=1024 -> 1 meg totally random file
Matt
Good solution for Linux, but this is Windows-based :)
JL
A: 

Agree with the Hex editor option, as this will allow you to introduce non-text values into the file, such as nulls (0x00), etc.

Andy
Just to help you out, the StackOverflow way to provide this input would be to add it as a comment to the other answer since this isn't really a separate answer at all.
EBGreen
Kinda hard to get rep with just comments, though.
Robert
Kinda hard to get rep when you're getting down-voted for rep farming, though. ;)
WCWedin
@EBGreen - Yes, fair point. Apologies.
Andy
+5  A: 

This is kind of the approach behind Fuzz Testing, i.e. introduce random variations and see how your application copes. You might look at some of the fuzz testing frameworks mentioned in the cited link. But in your case, it would be just as easy to use a random generator to pick positions in the file and flip or insert bits there to corrupt it. If you have a known case, then you can just use an existing corrupt file, of course.
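As a rough illustration of the "random generator" idea, here is a minimal Python sketch. The function name `flip_random_bits` and the sample document are made up for the example, and the fixed seed is my own assumption so that a given corruption can be reproduced later. It expects non-empty input.

```python
import random


def flip_random_bits(data: bytes, n_flips: int, seed: int) -> bytes:
    """Return a copy of `data` with `n_flips` randomly chosen bits inverted."""
    rng = random.Random(seed)  # fixed seed: the same corruption every run
    corrupted = bytearray(data)
    for _ in range(n_flips):
        pos = rng.randrange(len(corrupted))  # which byte to damage
        bit = rng.randrange(8)               # which bit inside that byte
        corrupted[pos] ^= 1 << bit
    return bytes(corrupted)


# Hypothetical sample document, not taken from the question.
original = b"<config><name>demo</name></config>"
damaged = flip_random_bits(original, n_flips=3, seed=42)
```

Writing `damaged` to disk gives a file of the same size as the original with a few bits changed. Logging the seed alongside any failure makes it possible to regenerate the exact same file.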

ars
A: 

If you're trying to simulate an interrupted write, you might want to just truncate the string representing the serialized data. This would be especially easy if you're using unit tests, but still quite feasible with Notepad.

Of course, that's just one kind of bad data, but it's worth noting that XML that's malformed in any way is essentially no longer XML, and most parsers will reject it out of hand at the first sign of a syntax error.
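A minimal sketch of the truncation idea in a unit-test style, using Python's standard `xml.etree.ElementTree` parser. The helper names `truncate` and `is_well_formed` and the sample document are hypothetical, and cutting at a fixed fraction is only one guess at where an interrupted write might stop.

```python
import xml.etree.ElementTree as ET


def truncate(data: bytes, keep_fraction: float) -> bytes:
    """Keep only the leading part of the serialized data."""
    return data[: int(len(data) * keep_fraction)]


def is_well_formed(data: bytes) -> bool:
    """True if the parser accepts the data, False if it rejects it."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Hypothetical serialized document.
complete = b"<orders><order id='1'>widget</order></orders>"
partial = truncate(complete, 0.5)  # stops before any closing tag is written
```

Here `is_well_formed(complete)` is true and `is_well_formed(partial)` is false, since the truncated data never closes its elements.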

WCWedin
+2  A: 

There are several ways of corrupting an XML file. Thinking of some:
- Incomplete XML tags (truncated XML).
- Unexpected content in the data (binary / more text).
For the first, I would copy a "correct/complete" XML file and modify it by hand. For the second, I would concatenate a partial XML file with any binary file on the filesystem.
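The second case could also be produced in code rather than by hand. This is only a sketch: `partial_plus_binary` is a hypothetical helper, and the junk bytes are a fixed stand-in for "any binary file", chosen so the result is repeatable.

```python
import xml.etree.ElementTree as ET


def partial_plus_binary(xml_bytes: bytes, cut_at: int, junk: bytes) -> bytes:
    """Cut the XML at byte offset `cut_at` and append unrelated binary data."""
    return xml_bytes[:cut_at] + junk


# Hypothetical inputs: a small document and a few non-text bytes.
good = b"<orders><order id='1'>widget</order></orders>"
junk = b"\x00\xff\xfe\x00\x89PNG"
bad = partial_plus_binary(good, cut_at=25, junk=junk)

try:
    ET.fromstring(bad)
    rejected = False
except ET.ParseError:
    rejected = True  # the parser refuses the mixed XML/binary content
```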

A hex editor seems a little too much for me ;)

Edmundo
+1  A: 

I would highly recommend you don't do 'random byte' corruption for testing. Not only do you not know exactly what state you're testing, but if you do find a bug, you'll be hard pressed to guarantee that the next test will verify the fix.

My recommendation is to corrupt the file in a predictable way, either manually or programmatically, so that you know what you're testing and how to reproduce the test if you must. (Of course, you'll probably want multiple predictable ways to ensure protection against corruption anywhere in the file.)
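One way to keep the corruption predictable is a small table of named cases, each built deterministically from a known-good document. The sketch below is Python; `load_settings` is a hypothetical stand-in for the application's own deserializer, and the case names and sample XML are invented for the example.

```python
import xml.etree.ElementTree as ET

VALID = b"<settings><timeout>30</timeout></settings>"

# Each case is built the same way every time, so a failure can be replayed.
CASES = {
    "empty_file": b"",
    "truncated_mid_tag": VALID[:15],
    "missing_closing_root": VALID[: VALID.rfind(b"</settings>")],
    "null_byte_in_text": VALID.replace(b"30", b"3\x000"),
    "mismatched_tag": VALID.replace(b"</timeout>", b"</timeut>"),
}


def load_settings(data: bytes):
    """Hypothetical loader under test: returns None instead of raising."""
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        return None


# Every corrupted input should be rejected cleanly; the valid one should load.
results = {name: load_settings(data) for name, data in CASES.items()}
```

When a bug is found, the name of the failing case says exactly which kind of damage triggered it, and rerunning the same case verifies the fix.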

cyberconte
There are certain classes of problems that are best tested with random data – for example, when the nature of the operation results in a combinatorial explosion of valid input. This isn't one of those cases; +1. Not sure why this was down-voted.
WCWedin