ansaurus

Question

Can we simplify this string encoding code (C#)

Answer 1

+5 A:

Almost anything would be cleaner than this - you're really abusing text here, IMO. You're trying to represent effectively opaque binary data (the encoded text) as text data... so you'll potentially get things like bell characters, escapes etc.

The normal way of encoding opaque binary data in text is base64, so you could use:

return Convert.ToBase64String(Encoding.GetEncoding(936).GetBytes(text));

The resulting text will be entirely ASCII, which is much less likely to cause you hassle.
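
As a rough sketch of the round trip (the class and method names here are made up for illustration, not part of any library), the base64 form can be decoded back to the original text. On .NET Core and later, code page 936 is assumed to need the code pages provider registered first; on .NET Framework that registration should not be necessary.

```csharp
using System;
using System.Text;

static class Base64Sketch
{
    // Assumption: running on .NET Core / .NET 5+, where code page 936 is only
    // available after registering the code pages provider.
    static Base64Sketch() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Text -> code page 936 bytes -> ASCII-only base64 string.
    public static string Encode(string text) =>
        Convert.ToBase64String(Encoding.GetEncoding(936).GetBytes(text));

    // Base64 string -> code page 936 bytes -> original text.
    public static string Decode(string base64) =>
        Encoding.GetEncoding(936).GetString(Convert.FromBase64String(base64));
}
```

Calling Base64Sketch.Decode(Base64Sketch.Encode(text)) should return the original text for any text that code page 936 can represent.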

EDIT: If you need that output, I would strongly recommend that you represent it as a byte array instead of as a string... pass it around as a byte array from that point onwards, so you're not tempted to perform string operations on it.

Jon Skeet 2010-01-15 14:42:52

+1. I suspect that the OP's approach will not always be reversible. Meaning you'll be able to encode some data but not decode it correctly.

LBushkin 2010-01-15 14:52:36

The end encoding is required by a receipt printer I am sending data to.

Jason Kealey 2010-01-15 15:32:18

Answer 2

+5 A:

Well, for one, you don't need to convert the "built-in" string representation to a byte array before calling Encoding.Convert.

You could just do:

byte[] converted = Encoding.GetEncoding(936).GetBytes(text);

To then reconstruct a string from that byte array whereby the char values directly map to the bytes, you could do...

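// Requires: using System.Linq; using System.Text;
static string MangleTextForReceiptPrinter(string text) {
    // Encode to code page 936, then widen each byte to a char with the same value.
    return new string(
        Encoding.GetEncoding(936)
            .GetBytes(text)
            .Select(b => (char) b)
            .ToArray());
}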

I wouldn't worry too much about efficiency; how many MB/sec are you going to print on a receipt printer anyhow?

Joe pointed out that there's an encoding that directly maps byte values 0-255 to code points, and it's age-old Latin1, which allows us to shorten the function to...

return Encoding.GetEncoding("Latin1").GetString(
           Encoding.GetEncoding(936).GetBytes(text)
       );

By the way, if this is a buggy Windows-only API (which it is, by the looks of it), you might be dealing with code page 1252 instead, which matches Latin1 everywhere except bytes 0x80-0x9F. You might try Reflector to see what the API does with your System.String before it sends it over the wire.
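
To see where the two code pages diverge, a small helper like the one below (a hypothetical name, assuming .NET Core or later with the code pages provider available) decodes a single byte under a given code page. Byte 0x80 is a convenient probe: Latin1 maps it to U+0080, while code page 1252 maps it to the euro sign, U+20AC.

```csharp
using System;
using System.Text;

static class CodePageComparison
{
    // Assumption: on .NET Core / .NET 5+ code page 1252 needs this registration;
    // code page 28591 (Latin1) is available without it.
    static CodePageComparison() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Decodes one byte with the given code page and returns the resulting char.
    public static char DecodeSingleByte(int codePage, byte value) =>
        Encoding.GetEncoding(codePage).GetString(new[] { value })[0];
}
```

If the printer API really uses 1252, any byte in the 0x80-0x9F range produced by the Latin1 trick would come out wrong on the wire, so that range is the first thing to check.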

Eamon Nerbonne 2010-01-15 14:54:10

See my update as to why I need that final format!

Jason Kealey 2010-01-15 15:34:27

Your code is good enough for me! I was wondering if there was a built-in mangling function I was not aware of that would be more efficient than my loop. :)

Jason Kealey 2010-01-15 16:07:09

Answer 3

+2 A:

Does your receipt printer have an API that accepts a byte array rather than a string? If so you may be able to simplify the code to a single conversion, from a Unicode string to a byte array using the encoding used by the receipt printer.

Also, if you want to convert an array of bytes to a string whose character values correspond 1-1 to the values of the bytes, you can use the code page 28591 aka Latin1 aka ISO-8859-1.

I.e., the following

foreach (byte b in converted) 
    builder.Append((char)b); 

string result = builder.ToString();

can be replaced by:

// All three of the following are equivalent
// string result = Encoding.GetEncoding(28591).GetString(converted);
// string result = Encoding.GetEncoding("ISO-8859-1").GetString(converted);
string result = Encoding.GetEncoding("Latin1").GetString(converted);

Latin1 is a useful encoding when you want to encode binary data in a string, e.g. to send through a serial port.
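
A minimal sketch of why this works, using illustrative names that are not from the original code: decoding all 256 byte values with code page 28591 gives the same string as casting each byte to a char, and encoding that string again returns the original bytes.

```csharp
using System;
using System.Linq;
using System.Text;

static class Latin1Sketch
{
    // Code page 28591 is ISO-8859-1 (Latin1).
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    public static string BytesToString(byte[] data) => Latin1.GetString(data);

    public static byte[] StringToBytes(string s) => Latin1.GetBytes(s);

    // Checks that Latin1 decoding matches the per-byte cast and is reversible.
    public static bool RoundTripsAllByteValues()
    {
        byte[] all = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        string viaEncoding = BytesToString(all);
        string viaCast = new string(all.Select(b => (char)b).ToArray());
        return viaEncoding == viaCast && StringToBytes(viaEncoding).SequenceEqual(all);
    }
}
```

The reverse direction only holds for strings whose characters are all in the range U+0000 to U+00FF; anything outside that range cannot be represented in Latin1.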

Joe 2010-01-15 16:11:25

It doesn't, unfortunately. If it had, I wouldn't have spent as much time trying to understand its cryptic encoding scheme!

Jason Kealey 2010-01-15 16:20:22

Internally, it is probably converting the Unicode string back into a byte array for transmission to the printer, perhaps by using an encoding such as Latin1.

Joe 2010-01-15 16:25:29

Nice! I didn't know that converting to Latin-1 would replace my loop.

Jason Kealey 2010-01-15 16:29:57
