I've got lots of text that I need to output, which includes all sorts of characters from many languages. Sometimes I need to output the text in character encodings other than Unicode (e.g., Shift-JIS or ISO-8859-2) to match the page it's going into.

If the text has characters that the encoding can't handle (e.g., Japanese characters in ISO-8859-2 output), I end up with garbage characters in the output. I can escape them, but I'd rather do that only when it's really necessary.

So, my question is this: Is there a way I can tell ahead of time if an encoding can handle all the characters in my string?

EDIT: I think the EncoderFallback is probably the right answer to the question I asked. Unfortunately, it doesn't seem to work in my particular situation. My idea was to convert the characters to their HTML entity equivalents (e.g., &#12514; instead of モ). However, the encoder only converts the first such character it finds, and if I set Response.ContentEncoding it never calls my EncoderFallback at all.

A: 

Convert it to the target encoding, convert it back and compare it with the original?

Try Encoding.GetBytes() and Encoding.GetString() to convert back and forth.

As an optimization, you could collect the distinct Unicode characters used in your original string and try the encoding on just those.
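A minimal C# sketch of the round-trip idea, assuming the text is a .NET string and the target is a System.Text.Encoding with its default fallback; the class and method names are illustrative, not from the answer. On .NET Core / .NET 5+, code pages such as 932 (Shift-JIS) or 28592 (ISO-8859-2) are only returned by Encoding.GetEncoding after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Text;

public static class RoundTripCheck
{
    // Encodes with the encoding's default fallback, decodes the bytes again and
    // compares ordinally: a character the encoding could not represent comes
    // back changed (typically as '?'), so the strings no longer match.
    public static bool SurvivesRoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        string decoded = encoding.GetString(bytes);
        return string.Equals(text, decoded, StringComparison.Ordinal);
    }
}
```

For example, SurvivesRoundTrip("hello", Encoding.ASCII) should be true, while a Japanese string checked against Encoding.ASCII comes back as question marks and yields false.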

froh42
A:

You can write your own EncoderFallback class and assign it to the encoding before encoding.

With this approach you don't need to do anything in advance (which would most likely mean scanning the output string for problems).

Instead, your fallback class only has to handle the replacements where the encoding has no value for a character.
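A sketch of such a fallback in C#, written for the HTML-entity case from the question. HtmlEntityEncoderFallback and its buffer are hypothetical names, not built-in .NET classes; the only framework types used are EncoderFallback and EncoderFallbackBuffer.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical fallback: every character the target encoding cannot represent
// is written as an HTML numeric character reference, e.g. U+30E2 -> "&#12514;".
public sealed class HtmlEntityEncoderFallback : EncoderFallback
{
    // Longest possible replacement is "&#1114111;" (U+10FFFF): 10 chars.
    public override int MaxCharCount => 10;

    public override EncoderFallbackBuffer CreateFallbackBuffer() =>
        new HtmlEntityEncoderFallbackBuffer();
}

public sealed class HtmlEntityEncoderFallbackBuffer : EncoderFallbackBuffer
{
    private string _replacement = string.Empty;
    private int _position;

    // Called for a single unencodable UTF-16 code unit.
    public override bool Fallback(char charUnknown, int index) =>
        Start(charUnknown);

    // Called for an unencodable surrogate pair (a supplementary character).
    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index) =>
        Start(char.ConvertToUtf32(charUnknownHigh, charUnknownLow));

    // Builds the replacement and rewinds to its start, so every unencodable
    // character gets its own reference, not just the first one.
    private bool Start(int codePoint)
    {
        _replacement = "&#" + codePoint.ToString(CultureInfo.InvariantCulture) + ";";
        _position = 0;
        return true;
    }

    // Hands the replacement to the encoder one char at a time; '\0' means done.
    public override char GetNextChar() =>
        _position < _replacement.Length ? _replacement[_position++] : '\0';

    public override bool MovePrevious()
    {
        if (_position == 0) return false;
        _position--;
        return true;
    }

    public override int Remaining => _replacement.Length - _position;

    public override void Reset()
    {
        _replacement = string.Empty;
        _position = 0;
    }
}
```

Usage would be along the lines of Encoding.GetEncoding("iso-8859-2", new HtmlEntityEncoderFallback(), DecoderFallback.ReplacementFallback).GetBytes(text), assuming the ISO-8859-2 code page is available. The design choice worth noting is that the buffer resets its position on every Fallback call; a buffer that keeps its old position after the first replacement is one plausible cause of the "only the first character is converted" symptom, though that is a guess about the asker's code.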

AnthonyWJones
A:

Try to encode the string with an Encoding whose EncoderFallback is set to EncoderExceptionFallback, e.g.:

Encoding e = Encoding.GetEncoding(932, new EncoderExceptionFallback(), new DecoderExceptionFallback());

Then catch EncoderFallbackException when you call GetBytes().
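Wrapped into a reusable check, this suggestion might look like the C# sketch below; the class and method names are hypothetical. It builds a strict copy of the encoding for the given code page and treats EncoderFallbackException as "cannot encode".

```csharp
using System;
using System.Text;

public static class StrictEncodingCheck
{
    // Returns true if every character of text can be represented in the
    // given code page; false as soon as the exception fallback fires.
    public static bool CanEncode(string text, int codePage)
    {
        Encoding strict = Encoding.GetEncoding(
            codePage,
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, CanEncode("café", 28591) should be true, while CanEncode("café", 20127) should be false because US-ASCII has no é. US-ASCII (20127) and ISO-8859-1 (28591) are available on .NET Core without registering CodePagesEncodingProvider; 932 and 28592 are not.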

bobince
A: 

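I think the methods already suggested should work. (The EncoderFallback solution seems quite nice.) Here's an alternative, however, in case you prefer it.

Create an encoder for the encoding you want to test by calling encoding.GetEncoder(). You can then call the Convert method of the Encoder object, passing in your text, and check the completed out parameter. Note that completed only reports whether all of the input was converted and flushed: with the default replacement fallback, unmappable characters are silently replaced and completed is still true, so the encoding needs an EncoderExceptionFallback (and you catch EncoderFallbackException) for this to detect unsupported characters.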
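A C# sketch of the Encoder.Convert route, under the assumption that the encoding is created with EncoderExceptionFallback; with the default replacement fallback, unmappable characters are replaced and completed still comes back true, so this check would miss them. The method name is hypothetical. The output buffer is sized with GetMaxByteCount so that completed can only be false if input is left over.

```csharp
using System;
using System.Text;

public static class EncoderConvertCheck
{
    public static bool CanEncodeWithConvert(string text, int codePage)
    {
        // Strict encoding: unmappable characters throw instead of becoming '?'.
        Encoding strict = Encoding.GetEncoding(
            codePage,
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());
        Encoder encoder = strict.GetEncoder();

        char[] chars = text.ToCharArray();
        byte[] bytes = new byte[strict.GetMaxByteCount(chars.Length)];
        try
        {
            // flush: true, because this is the whole input, not one chunk.
            encoder.Convert(chars, 0, chars.Length, bytes, 0, bytes.Length,
                true, out _, out _, out bool completed);
            return completed;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

The Encoder form mainly pays off when the text arrives in chunks: the same encoder can be fed each chunk with flush set to false and a final call with flush set to true, carrying split surrogate pairs across calls.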

If speed is an issue, you may want to benchmark the various methods, but I suspect they would all have quite similar performance profiles.

Noldorin