ansaurus

Question

Is there a better way to convert to ASCII from an arbitrary input?

Answer 1

A:

If you've already got i_fileBytes in memory, you can just check whether it starts with a UTF-16 little-endian byte order mark (BOM, the bytes 0xFF 0xFE), and then convert either the whole array or just the part after the BOM using Encoding.Unicode.GetString. (Use the overload that lets you specify an index and a count.)

So as code:

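// Skip a UTF-16 LE BOM (0xFF 0xFE) if present; the length check avoids an IndexOutOfRangeException on tiny inputs.
int start = (i_fileBytes.Length >= 2 && i_fileBytes[0] == 0xff && i_fileBytes[1] == 0xfe) ? 2 : 0;
string text = Encoding.Unicode.GetString(i_fileBytes, start, i_fileBytes.Length - start);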
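To see why skipping the BOM matters, here is a minimal self-contained sketch, assuming .NET 5 or later (C# 9 top-level statements); DecodeUtf16Le is a hypothetical helper name, not part of the framework. Encoding.Unicode.GetString does not strip a BOM by itself, so decoding the whole array leaves a U+FEFF character at the start of the string.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes bytes assumed to be UTF-16 little-endian,
// skipping a leading byte order mark (0xFF 0xFE) if one is present.
static string DecodeUtf16Le(byte[] bytes)
{
    // Length check keeps empty or 1-byte input from throwing.
    int start = (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) ? 2 : 0;
    return Encoding.Unicode.GetString(bytes, start, bytes.Length - start);
}

// "Hi" as UTF-16 LE with a BOM: FF FE 48 00 69 00
byte[] withBom = { 0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00 };

// Decoding everything keeps the BOM as a U+FEFF character at index 0...
string raw = Encoding.Unicode.GetString(withBom);
Console.WriteLine(raw.Length);             // 3
// ...while skipping it yields just the text.
Console.WriteLine(DecodeUtf16Le(withBom)); // Hi
```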

Note, however, that this assumes the data really is little-endian UTF-16. If you need to detect the encoding first, you could either reimplement what StreamReader does, or build a StreamReader over just the first (say) 10 bytes and use its CurrentEncoding property to work out which encoding to use.

EDIT: As for the conversion to ASCII: if you only need the result as a .NET string, then presumably all you want to do is replace any non-ASCII characters with "?" or something similar. (Alternatively, it might be better to throw an exception; that's up to you, of course.)
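A minimal sketch of both options, assuming .NET 5 or later with top-level statements; ToAsciiReplacing and ToAsciiStrict are hypothetical names. Encoding.ASCII already substitutes "?" for any character outside U+0000 to U+007F, while an ASCII encoding created with EncoderFallback.ExceptionFallback throws EncoderFallbackException instead.

```csharp
using System;
using System.Text;

// Replace: Encoding.ASCII's default encoder fallback turns any char above U+007F into '?'.
static string ToAsciiReplacing(string text) =>
    Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(text));

// Throw: an ASCII encoding with exception fallbacks rejects non-ASCII input.
static string ToAsciiStrict(string text)
{
    Encoding strictAscii = Encoding.GetEncoding("us-ascii",
        EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
    // GetBytes throws EncoderFallbackException on the first non-ASCII char.
    return strictAscii.GetString(strictAscii.GetBytes(text));
}

Console.WriteLine(ToAsciiReplacing("caf\u00e9")); // caf?
```

Which one fits depends on whether silently losing characters is acceptable for your data.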

EDIT: When detecting the encoding, it's a good idea to call Read() just once, to read a single character; that first read is what triggers StreamReader's BOM detection, so CurrentEncoding is only reliable after it. Don't call ReadToEnd(): because 10 bytes is an arbitrary amount of data, the buffer might end mid-character. I don't know offhand whether that would throw an exception, but reading further has no benefit anyway.
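Putting those two suggestions together (a StreamReader over a short prefix, plus a single Read()), a minimal sketch might look like this, assuming .NET 5 or later with top-level statements. DetectEncoding is a hypothetical helper name, the 10-byte prefix is the "say, 10 bytes" from above, and for data with no BOM it simply reports StreamReader's default of UTF-8, which may not match your actual input.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: lets StreamReader's BOM detection pick the encoding
// by reading only a short prefix of the data.
static Encoding DetectEncoding(byte[] bytes)
{
    int prefixLength = Math.Min(10, bytes.Length);
    using var reader = new StreamReader(new MemoryStream(bytes, 0, prefixLength),
                                        detectEncodingFromByteOrderMarks: true);
    // One Read() triggers detection; ReadToEnd() is avoided because the
    // prefix may end mid-character.
    reader.Read();
    return reader.CurrentEncoding;
}

byte[] utf16Le = { 0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00 };
byte[] utf16Be = { 0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69 };
byte[] noBom = { 0x48, 0x69 };
Console.WriteLine(DetectEncoding(utf16Le).CodePage); // 1200 (UTF-16 LE)
Console.WriteLine(DetectEncoding(utf16Be).CodePage); // 1201 (UTF-16 BE)
Console.WriteLine(DetectEncoding(noBom).CodePage);   // 65001 (UTF-8 default)
```

The returned Encoding can then be used to decode the full array, remembering to skip its preamble as in the first answer.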

Jon Skeet 2008-11-21 18:57:57

Yeah, this is what I was considering and wanting to avoid. I could use Reflector to extract the BOM detection code from StreamReader, but that's not very clean or future-proof. Using a StreamReader on just the first 10 bytes is interesting, though. Good idea!

Scott Bilas 2008-11-21 19:05:43

Answer 2

A:

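// StreamReader detects the encoding from a BOM (UTF-8 if none); Encoding.ASCII turns non-ASCII chars into '?'.
byte[] asciiBytes = System.Text.Encoding.ASCII.GetBytes(new StreamReader(new MemoryStream(i_fileBytes)).ReadToEnd());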

That lets StreamReader handle the BOM detection for you, which should save a few round-trips.
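For reference, here is that one-liner wrapped into a runnable sketch, assuming .NET 5 or later with top-level statements; ToAsciiBytes is a hypothetical name. With a UTF-16 LE BOM, StreamReader decodes the text and drops the BOM, and Encoding.ASCII then replaces the non-ASCII "é" with "?".

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical wrapper around the one-liner: StreamReader detects the
// encoding from any BOM (UTF-8 otherwise), then ASCII encoding maps
// non-ASCII characters to '?'.
static byte[] ToAsciiBytes(byte[] fileBytes)
{
    using var reader = new StreamReader(new MemoryStream(fileBytes));
    return Encoding.ASCII.GetBytes(reader.ReadToEnd());
}

// "Hé" as UTF-16 LE with a BOM: FF FE 48 00 E9 00
byte[] input = { 0xFF, 0xFE, 0x48, 0x00, 0xE9, 0x00 };
byte[] ascii = ToAsciiBytes(input);
Console.WriteLine(Encoding.ASCII.GetString(ascii)); // H?
```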

Joshua 2010-07-27 20:08:20
