I had problems with umlauts in ASCII, so I now encode my stream as UTF-8. That works, but it brings up a new problem. I normally read the 4 bytes before ARTIST to determine the length of ARTIST=WHOEVER, using:

UTF8Encoding enc = new UTF8Encoding();
string response = enc.GetString(message, 0, bytesRead);
int posArtist = response.IndexOf("ARTIST");
BitConverter.ToInt32(message, posArtist - 4);

This works for ASCII perfectly.

The hex-editor examples are just to illustrate that reading the length no longer works the way it did with ASCII.

Here is an example screenshot from a hex editor: [screenshot]

"ARTIST=M.A.N.D.Y. vs. Booka Shade" Length = 21

However, that doesn't work for the UTF-8 encoded stream. Here is a screenshot: [screenshot]

"ARTIST=Paulseq" Length = E, but in the picture it's 2E.

What am I doing wrong here?

+3  A: 

It is an utter mystery how you got 21 out of the ASCII data. The shaded byte is in hex; its real value is 33. There's no way you can get 21 out of BitConverter.ToInt32; that requires byte values (in hex) 15 00 00 00.

This must have worked by accident, but I have no idea what that accident might look like. Post more code, including the code that writes this.
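To make the byte arithmetic above concrete, here is a minimal sketch. It assumes a little-endian host (where `BitConverter.IsLittleEndian` is true) and assumes the shaded byte is followed by three zero bytes; the class name `LengthPrefixDemo` and the helper `Decode` are made up for illustration.

```csharp
using System;

static class LengthPrefixDemo
{
    // Reads a 32-bit length prefix from four raw bytes.
    // Assumes a little-endian host, which is what BitConverter uses on x86/x64.
    public static int Decode(byte b0, byte b1, byte b2, byte b3)
    {
        return BitConverter.ToInt32(new byte[] { b0, b1, b2, b3 }, 0);
    }

    static void Main()
    {
        // The shaded byte from the dump is hex 21 (assumed to be followed by three zero bytes).
        Console.WriteLine(Decode(0x21, 0x00, 0x00, 0x00));   // 33 on a little-endian host

        // Decimal 21 would need the bytes 15 00 00 00 instead.
        Console.WriteLine(Decode(0x15, 0x00, 0x00, 0x00));   // 21 on a little-endian host

        // The tag from the question is 33 characters long, which matches hex 21.
        Console.WriteLine("ARTIST=M.A.N.D.Y. vs. Booka Shade".Length);   // 33
    }
}
```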

Hans Passant
I attached some more code. The hex-editor shots are just to show you that reading the length the way I did with ASCII encoding doesn't work anymore. I know that these are hex values.
Hedge
That didn't help. The ultimate flaw is that you expect binary data to be convertible to a string and back. That just doesn't work anymore in Unicode. It will fall victim to normalization and surrogates. You need to dramatically change this, BinaryWriter or BinaryFormatter. A low pain point that avoids hacking hex is XML serialization. Or a database, SQL Compact is nice.
Hans Passant
The length actually is 21h, 33 decimal.
Ben Voigt
+2  A: 

Only the strings should be UTF-8 encoded/decoded. If you're passing other (non-string) values in binary, the encoders will destroy them.
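A small sketch of what "destroy" can look like, assuming the binary value is a 4-byte length pushed through a UTF-8 decode and re-encode. The class name `RoundTripDemo` and the helper `ThroughString` are hypothetical. Bytes below 0x80 happen to survive, while a byte such as 0xFF is not valid UTF-8 and gets replaced by U+FFFD.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Decodes raw bytes as UTF-8 text, then encodes the resulting string again.
    public static byte[] ThroughString(byte[] raw)
    {
        string text = Encoding.UTF8.GetString(raw);
        return Encoding.UTF8.GetBytes(text);
    }

    static void Main()
    {
        byte[] small = { 0x0E, 0x00, 0x00, 0x00 };   // length 14: every byte is below 0x80
        byte[] large = { 0xFF, 0x00, 0x00, 0x00 };   // length 255: 0xFF is never valid in UTF-8

        Console.WriteLine(BitConverter.ToString(ThroughString(small)));   // 0E-00-00-00
        Console.WriteLine(BitConverter.ToString(ThroughString(large)));   // EF-BF-BD-00-00-00
    }
}
```

The second value comes back as six bytes instead of four, so anything positioned after it in the stream no longer lines up.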

Will
Oh, that sounds bad. Right now I'm just seeking for ARTIST in the buffer and parsing the rest of the string by reading the length and then the rest. That would mean I have to parse the whole stream correctly.
Hedge
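If seeking for ARTIST in the buffer has to stay, one option is to search the raw bytes rather than the decoded string: `string.IndexOf` returns a character index, which stops matching the byte offset as soon as a multi-byte character appears earlier in the data. The following is only a sketch under assumptions: the 4-byte prefix is taken to be the byte length of the whole `ARTIST=...` text, and `TagReader`, `IndexOfBytes`, and `ReadArtistTag` are hypothetical names.

```csharp
using System;
using System.Text;

static class TagReader
{
    // Returns the byte index of the first occurrence of pattern in buffer, or -1.
    public static int IndexOfBytes(byte[] buffer, byte[] pattern)
    {
        for (int i = 0; i <= buffer.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Finds "ARTIST=" in the raw bytes, reads the 4-byte length stored right before it
    // (assumed to be a byte count), and decodes only the text bytes as UTF-8.
    public static string ReadArtistTag(byte[] message)
    {
        byte[] key = Encoding.ASCII.GetBytes("ARTIST=");
        int pos = IndexOfBytes(message, key);
        if (pos < 4) return null;                                   // not found, or no room for a prefix

        int length = BitConverter.ToInt32(message, pos - 4);
        if (length < 0 || length > message.Length - pos) return null;   // implausible prefix

        return Encoding.UTF8.GetString(message, pos, length);
    }

    static void Main()
    {
        // Build a sample buffer: earlier text with a 2-byte character, then prefix + tag.
        byte[] before = Encoding.UTF8.GetBytes("TITLE=F\u00FCr Elise");
        byte[] text = Encoding.UTF8.GetBytes("ARTIST=Paulseq");
        byte[] message = new byte[before.Length + 4 + text.Length];
        before.CopyTo(message, 0);
        BitConverter.GetBytes(text.Length).CopyTo(message, before.Length);
        text.CopyTo(message, before.Length + 4);

        Console.WriteLine(ReadArtistTag(message));
    }
}
```

This still guesses at the layout, so parsing the stream field by field from the start remains the more robust route.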
+3  A: 

My guess is that you are mixing tools. That is a binary stream. It should be read with a BinaryReader and written with a BinaryWriter. When writing text, use Encoding.GetBytes to get the raw bytes to write, and when reading, use Encoding.GetString on the raw bytes read. BinaryWriter and BinaryReader have methods for reading and writing values (like lengths) directly.
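A minimal sketch of that approach, assuming each tag is stored as a 4-byte length followed by its UTF-8 bytes. The actual file layout is not shown in the question, so the format here and the names `BinaryTagIo`, `WriteTag`, `ReadTag`, and `RoundTrip` are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

static class BinaryTagIo
{
    // Writes the tag as an Int32 byte count followed by the UTF-8 bytes.
    public static void WriteTag(BinaryWriter writer, string tag)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(tag);
        writer.Write(bytes.Length);   // BinaryWriter writes Int32 as 4 little-endian bytes
        writer.Write(bytes);
    }

    // Reads the Int32 byte count, then exactly that many bytes, and decodes only those.
    public static string ReadTag(BinaryReader reader)
    {
        int length = reader.ReadInt32();
        byte[] bytes = reader.ReadBytes(length);
        if (bytes.Length != length) throw new EndOfStreamException();
        return Encoding.UTF8.GetString(bytes);
    }

    // Writes a tag to memory and reads it back, to show the two halves agree.
    public static string RoundTrip(string tag)
    {
        using (var stream = new MemoryStream())
        {
            var writer = new BinaryWriter(stream);
            WriteTag(writer, tag);
            writer.Flush();

            stream.Position = 0;
            var reader = new BinaryReader(stream);
            return ReadTag(reader);
        }
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip("ARTIST=M\u00F6tley Cr\u00FCe"));
    }
}
```

The length is taken from the encoded byte array, not from `tag.Length`, because a string with umlauts has more UTF-8 bytes than characters.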

Tergiver
+4  A: 

Your data is wrong: you actually have the character '\0' in the data where there should be binary zeroes.

The problem lies in how you created this data, not in the reading of it.
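One way to picture the difference, as a sketch only: it assumes the writer emitted the two-character text `\0` (a backslash and the digit zero) instead of a real zero byte. The class name `NulDemo` and its two helpers are hypothetical.

```csharp
using System;
using System.Text;

static class NulDemo
{
    // A real NUL character (U+0000) encodes to a single zero byte.
    public static byte[] RealNul()
    {
        return Encoding.UTF8.GetBytes("\0");
    }

    // The verbatim string @"\0" is two characters: a backslash and the digit '0'.
    public static byte[] EscapedText()
    {
        return Encoding.UTF8.GetBytes(@"\0");
    }

    static void Main()
    {
        Console.WriteLine(BitConverter.ToString(RealNul()));       // 00
        Console.WriteLine(BitConverter.ToString(EscapedText()));   // 5C-30
    }
}
```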

pm100
That's already encoded. When reading the lengths, I read the original byte array (message).
Hedge
I have no idea what that means. For sure, the dumped data you show cannot be fed into BitConverter. If you are trying to read other data that you haven't shown, then we can't help you.
pm100
'\0' is a C# char literal that the compiler converts to *one* byte value, 00. The dump shows *two bytes*. Maybe the code that writes this has a "convert to C# source code" option, seems bizarre. But yes, it comes down to how it was actually written.
Hans Passant