I’m writing text to a binary file in C# and see a difference in the number of bytes written when I write a string versus a character array. I’m using System.IO.BinaryWriter and watching BinaryWriter.BaseStream.Length as the writes occur. These are my results:

using (BinaryWriter bw = new BinaryWriter(File.Open("data.dat", FileMode.Create), Encoding.ASCII))
{
  string value = "Foo";

  // Writes 4 bytes
  bw.Write(value);

  // Writes 3 bytes
  bw.Write(value.ToCharArray());
}

I don’t understand why the string overload writes 4 bytes when I’m writing only 3 ASCII characters. Can anyone explain this?

A: 

Did you look at what was actually written? I'd guess a null terminator.

Frank Schwieterman
+9  A: 

The documentation for BinaryWriter.Write(string) states that it writes a length-prefixed string to this stream. The overload for Write(char[]) has no such prefixing.

It would seem to me that the extra data is the length.
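One way to check this is to write to a MemoryStream and dump the bytes. The following is a minimal sketch, not code from the question: the class name LengthPrefixDemo and the helper Capture are made up for illustration, and it assumes the same Encoding.ASCII writer as in the question.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixDemo
{
    // Hypothetical helper: returns the bytes produced by writing "Foo"
    // either as a string or as a char[].
    public static byte[] Capture(bool asString)
    {
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter bw = new BinaryWriter(ms, Encoding.ASCII))
        {
            string value = "Foo";
            if (asString)
                bw.Write(value);               // length prefix + encoded characters
            else
                bw.Write(value.ToCharArray()); // encoded characters only
            bw.Flush();
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Capture(true)));  // 03-46-6F-6F
        Console.WriteLine(BitConverter.ToString(Capture(false))); // 46-6F-6F
    }
}
```

The leading 03 in the first line is the byte count of "Foo" in ASCII, not a null terminator.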

EDIT:

Just to be a bit more explicit, use Reflector. You will see that it has this piece of code in there as part of the Write(string) method:

this.Write7BitEncodedInt(byteCount);

It is a variable-length encoding that stores an integer in as few bytes as possible: each output byte carries 7 bits of the value, and the high bit signals that another byte follows. For short strings (the ones we use day to day, whose encoded length is under 128 bytes), the length fits in one byte. Longer strings need more bytes for the prefix.

Here is the code for that function just in case you are interested:

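protected void Write7BitEncodedInt(int value)
{
    uint num = (uint) value;
    // While more than 7 bits remain, emit the low 7 bits with the high bit set
    while (num >= 0x80)
    {
        this.Write((byte) (num | 0x80));
        num = num >> 7;
    }
    // The final byte has the high bit clear, which marks the end
    this.Write((byte) num);
}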

After prefixing the length using this encoding, it writes the bytes for the characters in the desired encoding.
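To see how the prefix grows, here is a small sketch that mirrors the loop above but collects the bytes into an array instead of writing them to a stream. The class name SevenBitDemo and the method Encode are hypothetical, and the explicit 0x7F mask is my addition so the truncation does not depend on unchecked casts.

```csharp
using System;
using System.Collections.Generic;

public static class SevenBitDemo
{
    // Hypothetical stand-in for the 7-bit loop: returns the prefix bytes
    // that would be written for a given byte count.
    public static byte[] Encode(int value)
    {
        List<byte> bytes = new List<byte>();
        uint num = (uint)value;
        while (num >= 0x80)
        {
            // Low 7 bits of the value, with the high bit set as a "more follows" flag
            bytes.Add((byte)((num & 0x7F) | 0x80));
            num >>= 7;
        }
        bytes.Add((byte)num); // last byte, high bit clear
        return bytes.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode(3)));     // 03
        Console.WriteLine(BitConverter.ToString(Encode(127)));   // 7F
        Console.WriteLine(BitConverter.ToString(Encode(128)));   // 80-01
        Console.WriteLine(BitConverter.ToString(Encode(300)));   // AC-02
        Console.WriteLine(BitConverter.ToString(Encode(16384))); // 80-80-01
    }
}
```

So a string of up to 127 encoded bytes costs one extra byte, up to 16383 costs two, and so on.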

Erich Mirabal
+1; the difference becomes even clearer when you look at how to **read** the data; with BinaryReader.ReadChars you need to tell it how many to read; with BinaryReader.ReadString it does this for you using the length prefix.
Marc Gravell
@Marc: excellent point. Those two are definitely written to complement each other and the Write(string) methods make a lot more sense in the context of "but now how do I read that data?"
Erich Mirabal
+3  A: 

From the BinaryWriter.Write(string) docs:

Writes a length-prefixed string to this stream in the current encoding of the BinaryWriter, and advances the current position of the stream in accordance with the encoding used and the specific characters being written to the stream.

This behavior is probably so that the string boundaries can be identified when reading the file back in with a BinaryReader. (For example, 3Foo3Bar6Foobar can be parsed into the strings "Foo", "Bar" and "Foobar", but FooBarFoobar could not be.) In fact, BinaryReader.ReadString uses exactly this information to read a string from a binary file.
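A rough sketch of that round trip, using an in-memory stream instead of a file; the class name ReadBackDemo and the method RoundTrip are invented for this example. Note that on disk the prefix is a byte with the value 3 or 6, not the digit character shown in the notation above.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadBackDemo
{
    // Hypothetical helper: writes three strings back to back,
    // then reads them again using only the length prefixes.
    public static string[] RoundTrip()
    {
        using (MemoryStream ms = new MemoryStream())
        {
            BinaryWriter bw = new BinaryWriter(ms, Encoding.ASCII);
            bw.Write("Foo");
            bw.Write("Bar");
            bw.Write("Foobar");
            bw.Flush();

            ms.Position = 0; // rewind before reading
            BinaryReader br = new BinaryReader(ms, Encoding.ASCII);
            // Each ReadString consumes one prefix plus that many bytes
            return new[] { br.ReadString(), br.ReadString(), br.ReadString() };
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", RoundTrip())); // Foo, Bar, Foobar
    }
}
```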

From the BinaryWriter.Write(char[]) docs:

Writes a character array to the current stream and advances the current position of the stream in accordance with the Encoding used and the specific characters being written to the stream.

It is hard to overstate how comprehensive and useful the docs on MSDN are. Always check them first.

Jason
+1  A: 

As already stated, BinaryWriter.Write(String) writes the length of the string to the stream, before writing the string itself.

This lets BinaryReader.ReadString() know how many bytes to read for the string.

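using (BinaryReader br = new BinaryReader(File.OpenRead("data.dat")))
{
    // Reads the length prefix, then that many bytes
    string foo1 = br.ReadString();
    // No prefix here, so the caller must supply the count
    char[] foo2 = br.ReadChars(3);
}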
Rob Elliott