I need to serialize some data in a binary format for efficiency (a data log where 10-100MB files are typical), and I'm working out the formatting details. I'm wondering whether, realistically, I need to worry about file corruption, error correction, and so on.

What are circumstances where file corruption can happen? Should I be building robustness to corruption into my binary format? Or should I wrap my nonrobust-to-corruption stream of bytes with some kind of error correcting code? (any suggestions? I'm using Java) Or should I just not worry about this?

Edit: my preliminary binary format, as I have it right now, contains a bunch of variable-length segments, so I am slightly worried that if the data ever gets corrupted, I could get out of sync while reading it back, be unable to recover, and lose the rest of the file.

+2  A: 

You should at least add a checksum. The bit error rate (BER) is good on modern hard drives, but that is not so for other media. Power loss during a write usually corrupts the end of the file. If the data is important, you will need error correction codes, triple and unbuffered writes, etc. to commit transactions.

EXE files do not have error correction, even though a single bit change can have drastic consequences.

If a file is to be transferred over TCP, you may assume zero errors.

Pavel Radzivilovsky
checksum where?
Jason S
Add a checksum to every significant block. The smaller the block, the less data will be declared lost when something goes wrong.
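A minimal sketch of per-block checksums for a format with variable-length segments, assuming each block is written as a sync marker, a payload length, a CRC-32 of the payload, and then the payload. The class name `FramedLog`, the marker value, and the size limit are hypothetical choices for illustration, not part of any existing format. When a block fails its check, the reader scans forward byte by byte for the next marker, so one damaged block does not cost the rest of the file.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

final class FramedLog {
    // Hypothetical sync marker; any value unlikely to occur in the data would do.
    static final int MARKER = 0xA55A3CC3;
    // Header layout: 4-byte marker, 4-byte payload length, 4-byte CRC-32 of the payload.
    static final int HEADER = 12;
    // Assumed upper bound, used to reject implausible (corrupted) length fields.
    static final int MAX_PAYLOAD = 1 << 20;

    /** Wraps one payload in a header so it can be validated and re-found later. */
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(HEADER + payload.length);
        buf.putInt(MARKER).putInt(payload.length).putInt((int) crc.getValue()).put(payload);
        return buf.array();
    }

    /** Returns every block whose CRC checks out, skipping over damaged regions. */
    static List<byte[]> readAll(byte[] data) {
        List<byte[]> blocks = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(data);
        int pos = 0;
        while (pos + HEADER <= data.length) {
            if (buf.getInt(pos) == MARKER) {
                int len = buf.getInt(pos + 4);
                boolean plausible = len >= 0 && len <= MAX_PAYLOAD
                        && len <= data.length - pos - HEADER;
                if (plausible) {
                    CRC32 crc = new CRC32();
                    crc.update(data, pos + HEADER, len);
                    if ((int) crc.getValue() == buf.getInt(pos + 8)) {
                        byte[] payload = new byte[len];
                        System.arraycopy(data, pos + HEADER, payload, 0, len);
                        blocks.add(payload);
                        pos += HEADER + len;
                        continue;
                    }
                }
            }
            // No valid block starts here: advance one byte and look for the next marker.
            pos++;
        }
        return blocks;
    }
}
```

This only detects and skips damage; it does not repair it. For 10-100MB files you would read from a stream rather than a single byte array, but the resynchronization logic stays the same.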
Pavel Radzivilovsky
+1  A: 

I have seen it happen once or twice that a file transferred over the Internet became corrupted. You can detect errors using a checksum, such as SHA-256.
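As a rough sketch of that kind of check, the following computes a SHA-256 digest with the standard `java.security.MessageDigest` class and compares it against a previously stored value. The class and method names (`Sha256Check`, `digest`, `toHex`, `matches`) are made up for illustration, and where the expected digest is stored (a sidecar file, a trailer at the end of the data) is left open.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Check {
    /** Computes the SHA-256 digest of the given bytes. */
    static byte[] digest(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Formats a digest as lowercase hex, e.g. for storing next to the file. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Returns true if the data still hashes to the expected digest. */
    static boolean matches(byte[] data, byte[] expectedDigest) {
        return MessageDigest.isEqual(digest(data), expectedDigest);
    }
}
```

A digest over the whole file only tells you that something changed, not where, so it complements rather than replaces per-block checks. For large files, calling `MessageDigest.update` on successive chunks avoids loading everything into memory.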

Sjoerd
checksum where?
Jason S
+1  A: 

You might be interested in the notes on error-detecting codes in HDF5. Where to put the checksum, and what kind to use, depends on how you access and update the data, as well as on what size of chunk it is useful to detect an error in.

Pete Kirkham
A: 

I went with a Reed-Solomon encoding system. There's a fairly easy-to-use Java implementation of it in the Google zxing library.

Jason S