I need to be able to read a file format that mixes binary and non-binary data. Assuming I know the input is good, what's the best way to do this? As an example, let's take a file that has a double as the first line, a newline (0x0D 0x0A) and then ten bytes of binary data afterward. I could, of course, calculate the position of the newline, then make a BinaryReader and seek to that position, but I keep thinking that there has to be a better way.

A: 

You would read the whole file as binary, and use System.Text.Encoding to read the strings from byte arrays. You need to know the format of the file, specifically how it encodes strings and specifies their length.
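As a rough sketch of this approach, assuming the double is stored as ASCII text terminated by CR LF and followed by a fixed-size binary payload (the class name `MixedFileParser` and the `payloadLength` parameter are made up for illustration):

```csharp
using System;
using System.Globalization;
using System.Text;

public static class MixedFileParser
{
    // Assumed layout: ASCII text of a double, CR LF, then binary payload.
    public static (double Value, byte[] Payload) Parse(byte[] data, int payloadLength)
    {
        // Find the first CR LF pair; everything before it is text.
        int newline = -1;
        for (int i = 0; i + 1 < data.Length; i++)
        {
            if (data[i] == 0x0D && data[i + 1] == 0x0A)
            {
                newline = i;
                break;
            }
        }
        if (newline < 0)
            throw new FormatException("No CR LF terminator found.");

        // Decode only the text portion, never the binary bytes after it.
        string text = Encoding.ASCII.GetString(data, 0, newline);
        double value = double.Parse(text, CultureInfo.InvariantCulture);

        // Copy the binary payload that follows the two newline bytes.
        byte[] payload = new byte[payloadLength];
        Array.Copy(data, newline + 2, payload, 0, payloadLength);
        return (value, payload);
    }
}
```

The buffer could come from `File.ReadAllBytes(path)`, for example `MixedFileParser.Parse(File.ReadAllBytes(path), 10)` for the ten-byte case in the question.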

Jon B
+1  A: 

Is this file format already fixed? If it's not, it's a really good idea to change to use a length-prefixed format for the strings. Then you can read just the right amount and convert it to a string.
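If the format can be changed, a length-prefixed string might look something like this. The 4-byte prefix, the UTF-8 encoding, and the `LengthPrefixed` helper are assumptions chosen for the sketch, not part of any existing format:

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixed
{
    // Writes a 4-byte length, then the UTF-8 bytes of the string.
    public static void WriteString(BinaryWriter writer, string value)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        writer.Write(bytes.Length);
        writer.Write(bytes);
    }

    // Reads the length, then exactly that many bytes, and decodes only those.
    public static string ReadString(BinaryReader reader)
    {
        int length = reader.ReadInt32();
        byte[] bytes = reader.ReadBytes(length);
        if (bytes.Length != length)
            throw new EndOfStreamException("String was truncated.");
        return Encoding.UTF8.GetString(bytes);
    }
}
```

BinaryWriter.Write(string) and BinaryReader.ReadString() already use a length prefix of their own (a 7-bit encoded integer), so they are an option when both ends of the format are .NET code.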

Otherwise, you'll need to read chunks from the file, scan for the newline, and decode the right amount of data or (if you don't find the newline) either buffer it somewhere else (e.g. a MemoryStream) or just remember the starting point and rewind the stream appropriately. It will be ugly, but that's just because of the deficiency of the file format.
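A minimal sketch of the chunk-and-rewind idea, assuming a seekable stream, ASCII text, and a CR LF terminator (`ChunkedLineReader` and `chunkSize` are illustrative names):

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedLineReader
{
    // Reads ASCII text up to the first CR LF, then leaves the stream
    // positioned on the first byte after the newline. Requires CanSeek.
    public static string ReadLineAndRewind(Stream stream, int chunkSize = 256)
    {
        long start = stream.Position;
        var pending = new MemoryStream();   // bytes read so far
        byte[] chunk = new byte[chunkSize];
        int scanFrom = 0;                   // first index not yet ruled out

        while (true)
        {
            int read = stream.Read(chunk, 0, chunk.Length);
            if (read == 0)
                throw new EndOfStreamException("No CR LF found.");
            pending.Write(chunk, 0, read);

            byte[] seen = pending.GetBuffer();
            int length = (int)pending.Length;
            for (int i = scanFrom; i + 1 < length; i++)
            {
                if (seen[i] == 0x0D && seen[i + 1] == 0x0A)
                {
                    // Rewind past whatever binary data the chunk over-read.
                    stream.Position = start + i + 2;
                    return Encoding.ASCII.GetString(seen, 0, i);
                }
            }
            // Keep the last byte in play in case CR LF straddles two chunks.
            scanFrom = Math.Max(0, length - 1);
        }
    }
}
```

Only the bytes before the newline are decoded, so the binary data is never passed through the encoding.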

I would suggest you don't "over-decode" (i.e. decode the arbitrary binary data after the string) - while it may well not do any harm, in some encodings you could be reading an impossible sequence of binary data, which then starts getting into the realms of DecoderFallbacks and the like.

Jon Skeet
I sure hope Jeff didn't use an Int16 for the rep calc...
Jon B
+1  A: 

You can use System.IO.BinaryReader. The problem with this though is you must know what type of data you are going to be reading before you call any of the Read methods.

Read(byte[], int, int)
Read(char[], int, int)
Read()
Read7BitEncodedInt()
ReadBoolean()
ReadByte()
ReadBytes(int)
ReadChar()
ReadChars(int)
ReadDecimal()
ReadDouble()
ReadInt16()
ReadInt32()
ReadInt64()
ReadSByte()
ReadSingle()
ReadString()
ReadUInt16()
ReadUInt32()
ReadUInt64()

And of course the same methods exist for writing in System.IO.BinaryWriter.
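For the layout in the question, this could look roughly like the sketch below. It assumes the double is stored in binary form (8 bytes, as BinaryWriter writes it) rather than as text, followed by CR LF and ten raw bytes; the `BinaryRecord` class is hypothetical:

```csharp
using System;
using System.IO;

public static class BinaryRecord
{
    // The writer is not disposed so the caller's stream stays open.
    public static void Write(Stream output, double value, byte[] payload)
    {
        var writer = new BinaryWriter(output);
        writer.Write(value);          // 8 bytes, little-endian
        writer.Write((byte)0x0D);
        writer.Write((byte)0x0A);
        writer.Write(payload);        // raw bytes, no length prefix
        writer.Flush();
    }

    public static (double Value, byte[] Payload) Read(Stream input)
    {
        var reader = new BinaryReader(input);
        double value = reader.ReadDouble();
        if (reader.ReadByte() != 0x0D || reader.ReadByte() != 0x0A)
            throw new InvalidDataException("Expected CR LF after the double.");
        byte[] payload = reader.ReadBytes(10);
        return (value, payload);
    }
}
```

Note that ReadDouble() consumes the 8-byte binary representation, not digits, and ReadString() expects the length prefix that BinaryWriter emits, not a newline-terminated line. If the double is stored as text, the bytes up to the newline have to be read and parsed separately.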

David Anderson
I can't believe I missed that. The field types are fixed, so that's perfect. I feel like such a moron now.
Lee Crabtree
A: 

I've had to deal with that when reading HTTP requests coming in over the wire on Compact Framework. My solution was to roll my own non-buffering ASCII-only StreamReader, so that it was safe to interleave calls to both the StreamReader and the underlying Stream.
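This is not the original code, just a sketch of what such a reader might look like, reduced to a single hypothetical `ReadLine` helper that assumes CR LF line endings:

```csharp
using System;
using System.IO;
using System.Text;

public static class UnbufferedAscii
{
    // Reads one CR LF terminated ASCII line a byte at a time, so the
    // stream is never advanced past the newline.
    public static string ReadLine(Stream stream)
    {
        var line = new StringBuilder();
        int previous = -1;
        while (true)
        {
            int current = stream.ReadByte();
            if (current < 0)
                throw new EndOfStreamException("Stream ended before CR LF.");
            if (previous == 0x0D && current == 0x0A)
            {
                line.Length--;           // drop the CR appended on the previous pass
                return line.ToString();
            }
            line.Append((char)current);  // ASCII maps one byte to one char
            previous = current;
        }
    }
}
```

Because nothing is read ahead, the caller can go straight back to the underlying Stream for the binary part. The cost is one ReadByte() call per character, which can be slow on sources that are not buffered at a lower level.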

Jeffrey Hantin