views: 143
answers: 3

I'm trying to write strictly binary data to files (no encoding). The problem is that when I hex dump the files, I see rather weird behavior. Either of the methods below for constructing the file results in the same behavior. I even tested with System::Text::Encoding::Default for the streams.

StreamWriter^ binWriter = gcnew StreamWriter(gcnew FileStream("test.bin",FileMode::Create));

(Also used this method)
FileStream^ tempBin = gcnew FileStream("test.bin",FileMode::Create);
BinaryWriter^ binWriter = gcnew BinaryWriter(tempBin);


binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);
.
.
binWriter->Write(0x9F);

Writing that sequence of bytes, I noticed the only bytes that weren't converted to 0x3F in the hex dump were 0x81,0x8D,0x90,0x9D ... and I have no idea why.

I also tried making character arrays, and a similar thing happens, i.e.:

array<wchar_t,1>^ OT_Random_Delta_Limits = {0x00,0x00,0x03,0x79,0x00,0x00,0x04,0x88};
binWriter->Write(OT_Random_Delta_Limits);

0x88 would be written as 0x3F.

Any ideas?

+3  A: 

If you want to stick to binary files then don't use StreamWriter. Just use a FileStream with Write/WriteByte. StreamWriters (and TextWriters in general) are expressly designed for text. Whether you want an encoding or not, one will be applied, because when you're calling StreamWriter.Write, that's writing a char, not a byte.

Don't create arrays of wchar_t values either - again, those are for characters, i.e. text.

BinaryWriter.Write should have worked for you unless it was promoting the values to char, in which case you'd have exactly the same problem.

By the way, without specifying any encoding, I'd expect you to get not 0x3F values but the bytes representing the UTF-8 encoded forms of those characters.

When you specified Encoding.Default, you'd have seen 0x3F for any Unicode values not in that encoding.

Anyway, the basic lesson is to stick to Stream when you want to deal with binary data rather than text.

EDIT: Okay, it would be something like:

public static void ConvertHex(TextReader input, Stream output)
{
    while (true)
    {
        int firstNybble = input.Read();
        if (firstNybble == -1)
        {
            return;
        }
        int secondNybble = input.Read();
        if (secondNybble == -1)
        {
            throw new IOException("Reader finished half way through a byte");
        }
        int value = (ParseNybble(firstNybble) << 4) + ParseNybble(secondNybble);
        output.WriteByte((byte) value);
    }
}

// value would actually be a char, but as we've got an int in the above code,
// it just makes things a bit easier
private static int ParseNybble(int value)
{
    if (value >= '0' && value <= '9') return value - '0';
    if (value >= 'A' && value <= 'F') return value - 'A' + 10;
    if (value >= 'a' && value <= 'f') return value - 'a' + 10;
    throw new ArgumentException("Invalid nybble: " + (char) value);
}

This is very inefficient in terms of buffering etc, but should get you started.

Jon Skeet
Part of the issue, though, is I'm reading a large file of text and extracting the bytes as I need. The use of StreamReader::ReadToEnd() is really, really handy.
Brett
If you're reading a large file of text then you're *not* dealing with bytes, you're dealing with *text*. You need to keep them very clearly separate in your head.
Jon Skeet
Yeah.. The problem with doing that is, I'm having issues with the compiler understanding what I'm trying to do. What I *really* want to do is parse the ASCII text and concatenate 2 consecutive characters to form a "byte", then write its binary form, not its ASCII equivalent. For example, I'll concatenate the strings "1" and "2", but when I convert and write that as a byte, it writes 0x0C instead of 0x12. The Convert::ToByte and WriteByte() methods don't like that, but I see no other way to do it. I can't seem to force the compiler to play by my rules.
Brett
Sorry, are you *trying* to write 0x12 or 0x0C? Are you basically trying to convert hex into binary? If so, I can write C# code to do that and let you figure out how to port it to C++ :)
Jon Skeet
I'm trying to write 0x12 *not* 0x0C. I suppose you could say I'm trying to convert hex to binary. In a nutshell, I'm parsing hex strings from a file, concatenating consecutive characters, and transforming that into a byte. So when I read in "12", I don't want to write back "0x31 0x32" or "0x0C", I want the concatenated byte representation of "12"... 0x12. Sorry if my explanation is crappy.
Brett
That does indeed sound like just decoding a text file of hex into a binary file. I suggest you have a method which takes a TextReader (input) and a Stream (output). Let me know if you want me to write the method for you in C#. Oh, and will there be line breaks?
Jon Skeet
Yeah, if you'd like to write up some pseudocode that'd be awesome... Though the logic isn't slipping me, I've tried a lot of different methods, but I suppose they're technically incorrect. No line breaks, straight up raw and dirty binary.
Brett
Thanks for all the help, you've been helpful beyond all my expectations. I might have to join this community!
Brett
Update: I adjusted two segments of Jon's code posted above, which may or may not have been oversights on his part: 1) ParseNybble returns the variable "value" but was defined with no return type; I changed it to int and it appears to work as intended. 2) In the line "if (value >= 'f'" I changed the first 'f' to 'a'; otherwise the code won't handle lower-case ASCII characters correctly and throws an invalid nybble exception.
Brett
@Brett: Yup, thanks, fixed in my post :)
Jon Skeet
A: 

0x3F is the ASCII code for the character '?'; the characters that map to it are control characters with no printable representation. As Jon points out, use a binary stream rather than a text-oriented output mechanism for raw binary data.

EDIT -- actually your results look like the inverse of what I would expect. In the default code page 1252, the non-printable characters (i.e. the ones likely to map to '?') in that range are 0x81, 0x8D, 0x8F, 0x90 and 0x9D.

Steve Gilham
A: 

A BinaryWriter initialized with a stream will use a default encoding of UTF-8 for any chars or strings that are written. I'm guessing that the

binWriter->Write(0x80);
binWriter->Write(0x81);
.
.
binWriter->Write(0x8F);
binWriter->Write(0x90);
binWriter->Write(0x91);

calls are binding to the Write(char) overload, so they're going through the character encoder. I'm not very familiar with C++/CLI, but it seems to me that these calls should bind to Write(Int32), which wouldn't have this problem (maybe your code is really calling Write() with a char variable set to the values in your example; that would account for this behavior).

Michael Burr