I'm debugging some issues with writing pieces of an object to a file and I've gotten down to the base case of just opening the file and writing "TEST" in it. I'm doing this by something like:

static FileStream fs;
static BinaryWriter w;
fs = new FileStream(filename, FileMode.Create);
w = new BinaryWriter(fs);

w.Write("test");

w.Close();
fs.Close();

Unfortunately, this ends up prepending a box to the front of the file and it looks like so:

TEST, with a fun box on the front. Why is this, and how can I avoid it?

Edit: It does not seem to be displaying the box here, but it's the unicode character that looks like gibberish.

A: 

Sounds like byte order marks.

http://en.wikipedia.org/wiki/Byte-order%5Fmark

Perhaps you want to write the string as UTF-8.

Steven Sudit
See Henk's response - it's a length indicator, not byte order.
Jon B
So I gather. Based on the initial information, the BOM diagnosis was quite reasonable, though.
Steven Sudit
A: 

That's a byte order mark, most likely. It's because the stream's encoding is set to Unicode.

Kawa
+1  A: 

Why are you using a BinaryWriter to write text? There is a separate TextWriter for this.

Toad
It won't be only text. I will be writing the entire data contents of files into this stream eventually.
Chris
+12  A: 

They are not byte-order marks but a length-prefix, according to MSDN:

public virtual void Write(string value);

Writes a length-prefixed string to [the] stream

And you will need that length-prefix if you ever want to read the string back from that point. See BinaryReader.ReadString().
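To make the length prefix visible, here is a minimal sketch that writes "TEST" into a MemoryStream instead of a file and dumps the raw bytes. The class and method names (LengthPrefixDemo, WriteString, ReadString) are made up for illustration, and it assumes BinaryWriter's default encoding (UTF-8).

```csharp
using System;
using System.IO;

static class LengthPrefixDemo
{
    // Writes one string with BinaryWriter and returns the raw bytes it produced.
    public static byte[] WriteString(string value)
    {
        using (var ms = new MemoryStream())
        {
            using (var w = new BinaryWriter(ms))
            {
                w.Write(value); // length prefix first, then the encoded characters
            }
            return ms.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Reads the string back; ReadString consumes the prefix to know how far to read.
    public static string ReadString(byte[] data)
    {
        using (var r = new BinaryReader(new MemoryStream(data)))
        {
            return r.ReadString();
        }
    }

    static void Main()
    {
        byte[] bytes = WriteString("TEST");
        Console.WriteLine(BitConverter.ToString(bytes)); // 04-54-45-53-54
        Console.WriteLine(ReadString(bytes));            // TEST
    }
}
```

The leading 04 is the "box": a text editor tries to render that control byte as a character.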

Additional

Since it seems you actually want a file-header check:

  1. Is it a problem? The length prefix is read back together with the string, so as a type check on the file it works OK.

  2. You can convert the string to a byte[] array, probably using Encoding.ASCII. But then you have to either use a fixed (implied) length or prefix it yourself. After reading the byte[] you can convert it to a string again.

  3. If you had a lot of text to write you could even attach a TextWriter to the same stream. But be careful: the writers want to close their streams. I wouldn't advise this in general, but it is good to know. Here too you will have to mark a point where the other reader can take over (a fixed header works OK).
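Option 2 with a fixed, implied length could look roughly like the sketch below. The names (FileHeaderDemo, Magic, WriteHeader, HasHeader) and the 4-byte "TEST" signature are illustrative assumptions, and the leaveOpen constructor overloads used here require .NET Framework 4.5 or later.

```csharp
using System;
using System.IO;
using System.Text;

static class FileHeaderDemo
{
    // Hypothetical 4-byte signature; both the name and the value are illustrative.
    static readonly byte[] Magic = Encoding.ASCII.GetBytes("TEST");

    public static void WriteHeader(Stream stream)
    {
        // leaveOpen: true, so disposing the writer does not close the caller's stream.
        using (var w = new BinaryWriter(stream, Encoding.UTF8, true))
        {
            w.Write(Magic); // Write(byte[]) emits the bytes as-is, with no length prefix
        }
    }

    public static bool HasHeader(Stream stream)
    {
        using (var r = new BinaryReader(stream, Encoding.UTF8, true))
        {
            // The length is implied: always read exactly Magic.Length bytes.
            byte[] head = r.ReadBytes(Magic.Length);
            return Encoding.ASCII.GetString(head) == "TEST";
        }
    }

    static void Main()
    {
        using (var ms = new MemoryStream())
        {
            WriteHeader(ms);
            Console.WriteLine(ms.Length);     // 4
            ms.Position = 0;
            Console.WriteLine(HasHeader(ms)); // True
        }
    }
}
```

After HasHeader returns, the stream is positioned just past the header, so the rest of the file's data can be read from there.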

Henk Holterman
+2  A: 

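As Henk pointed out in his answer, this is the length of the string (BinaryWriter writes it as a variable-length, 7-bit-encoded integer, so for a short string it is a single byte rather than a full 32-bit int).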

If you don't want this, you can either write "TEST" manually by writing the ASCII characters for each letter as bytes, or you could use:

System.Text.Encoding.UTF8.GetBytes("TEST")

And write the resulting array (which will NOT contain a length int)

Jon B
+2  A: 

That's because a BinaryWriter writes the binary representation of the string, including the length of the string. If you write raw data (e.g. a byte[]), it won't include that length.

byte[] text = System.Text.Encoding.Unicode.GetBytes("test");
FileStream fs = new FileStream("C:\\test.txt", FileMode.Create);
BinaryWriter writer = new BinaryWriter(fs);
writer.Write(text);
writer.Close();

You'll notice that it doesn't include the length. If you're going to be writing textual data using the binary writer, you'll need to convert it first.

Joshua
A: 

Remember that Java strings are internally encoded in UTF-16.

So, "test" is actually made of the bytes 0xff, 0xfe (together the byte order mark), 0x74, 0x00, 0x65, 0x00, 0x73, 0x00, 0x74, 0x00.

You probably want to work with bytes instead of streams of characters.

Locoluis
Remember that a C# tag means: It's not Java.
Henk Holterman
+1  A: 

The byte at the start is the length of the string, it's written out as a variable-length integer.

If the encoded string is 127 bytes or fewer, the length will be stored as one byte. When it reaches 128 bytes, the length is written out as 2 bytes, and it will grow to 3, 4 and finally 5 bytes for longer strings.
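A small sketch to check where the prefix grows. PrefixSizeDemo and PrefixBytes are hypothetical names, and the strings are plain ASCII so that each character encodes to exactly one byte under BinaryWriter's default UTF-8 encoding.

```csharp
using System;
using System.IO;

static class PrefixSizeDemo
{
    // Returns how many bytes BinaryWriter spends on the length prefix
    // for an ASCII string of the given length.
    public static int PrefixBytes(int asciiLength)
    {
        string s = new string('a', asciiLength);
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(s);
            w.Flush();
            // Total bytes written minus the payload leaves the prefix size.
            return (int)ms.Length - asciiLength;
        }
    }

    static void Main()
    {
        Console.WriteLine(PrefixBytes(127));   // 1
        Console.WriteLine(PrefixBytes(128));   // 2
        Console.WriteLine(PrefixBytes(16383)); // 2
        Console.WriteLine(PrefixBytes(16384)); // 3
    }
}
```

Each prefix byte carries 7 bits of the length, which is why the boundaries fall at 128 and 16384.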

The problem here is that you're using BinaryWriter, which writes out data that BinaryReader can read back in later. If you wish to write out in a custom format of your own, you must either drop writing strings like that, or drop using BinaryWriter altogether.

Lasse V. Karlsen
A: 

You can save it as a UTF8 encoded byte array like this:

...

BinaryWriter w = new BinaryWriter(fs);

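// Use Encoding.UTF8 here: UTF8Encoding.Default resolves to Encoding.Default,
// which is the system default encoding and not necessarily UTF-8.
w.Write(System.Text.Encoding.UTF8.GetBytes("test"));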

...
rocka