Hello,
I've got a process which attempts to decode different encodings of strings from a binary stream. I get some behavior which does not quite add up in my mind when I step through it. Specifically, what I do is:
- obtain the maximum number of bytes that could be used to encode a single character in the given encoding
- grab that many bytes from the stream
- use `Encoding.GetCharCount` to determine how many characters are encoded in those bytes (could be zero, one, or two...)
- if it's not zero, I use `Encoding.GetString` to extract the characters from the byte array
- I then figure out how many bytes were used to encode the extracted characters and advance the stream index by that amount
- if the number of decodable characters turns out to be zero, I advance the index by one byte and try the whole thing again... this way I expect not to miss any decodable characters
BTW, if anyone notices any incorrect assumptions in the above, feel free to say so...
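For concreteness, here is a minimal sketch of the steps above. It is not my exact code: the `Scanner.Scan` helper and its parameters are made up for illustration, it works on an in-memory `byte[]` window rather than the real stream, and it assumes the encoding round-trips so that `GetByteCount` on the decoded text equals the number of bytes consumed.

```csharp
using System;
using System.Text;

static class Scanner
{
    // Hypothetical helper: returns whatever decodes from `data` under encoding `name`.
    public static string Scan(byte[] data, string name)
    {
        // Strict encoding: throws DecoderFallbackException on undecodable bytes.
        Encoding enc = Encoding.GetEncoding(
            name,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // Step 1: the most bytes a single character can take in this encoding.
        int maxBytes = enc.GetMaxByteCount(1);

        var found = new StringBuilder();
        int index = 0;
        while (index < data.Length)
        {
            // Step 2: take up to maxBytes bytes from the current position.
            int count = Math.Min(maxBytes, data.Length - index);
            try
            {
                // Step 3: how many characters do those bytes hold?
                int charCount = enc.GetCharCount(data, index, count);
                if (charCount > 0)
                {
                    // Step 4: decode, then advance by the bytes consumed
                    // (assumes the decoded text re-encodes to the same length).
                    string text = enc.GetString(data, index, count);
                    found.Append(text);
                    index += enc.GetByteCount(text);
                    continue;
                }
            }
            catch (DecoderFallbackException)
            {
                // The window could not be decoded at this offset.
            }

            // Step 5: slide forward one byte and try again.
            index++;
        }
        return found.ToString();
    }
}
```

One thing this sketch makes visible: the whole window is decoded at once, so a window that starts with valid characters but ends in undecodable bytes throws as a whole, and the loop then moves on by one byte.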
I have my decoders set to throw `DecoderFallbackException` when they cannot decode a given set of bytes. What confuses me is that sometimes the exception arises when I call `GetCharCount` and other times when I call `GetString`. Is there any reason this should be happening? Is this in fact expected? I would like to be able to reliably check for the presence of printable characters in as few places as possible; currently I'm doing it in several places.
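To illustrate what "as few places as possible" could mean: both `GetCharCount` and `GetString` are documented as able to throw `DecoderFallbackException`, so one option is to funnel every decode attempt through a single wrapper. The sketch below is only that, a sketch; `StrictDecode.TryDecode` is a hypothetical helper, and it only centralizes the exception handling, it does not by itself decide whether the decoded characters are printable.

```csharp
using System;
using System.Text;

static class StrictDecode
{
    // Hypothetical helper: the only place that catches DecoderFallbackException.
    // `strict` is assumed to be an encoding configured with an exception fallback.
    public static bool TryDecode(Encoding strict, byte[] data, int index, int count, out string text)
    {
        try
        {
            text = strict.GetString(data, index, count);
            return text.Length > 0;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }
}

static class Demo
{
    static void Main()
    {
        // UTF8Encoding(encoderShouldEmitUTF8Identifier, throwOnInvalidBytes)
        Encoding utf8 = new UTF8Encoding(false, true);

        byte[] good = { 0x68, 0x69 }; // "hi"
        byte[] bad = { 0xFF };        // never valid in UTF-8

        string text;
        Console.WriteLine(StrictDecode.TryDecode(utf8, good, 0, good.Length, out text)); // True
        Console.WriteLine(StrictDecode.TryDecode(utf8, bad, 0, bad.Length, out text));   // False
    }
}
```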
Any thoughts?
thanks, brian
BIG UPDATE: It seems that my initial description of the problem is lacking a bit. Let me add a few more premises to the problem:
- the stream could be extremely large - it will not fit in memory for most users
- at any given place in the stream I don't know for sure whether I am at the beginning of text or in the middle of it
- at any given place in the stream I don't know whether I am at the beginning or in the middle of a multi-byte character
- the stream will contain much material that is in fact not text of any sort, as well as a smattering of different encodings
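To make the size and multi-byte constraints concrete, here is a rough sketch of reading in fixed-size chunks with a stateful `System.Text.Decoder`, which keeps an incomplete trailing sequence from one chunk and completes it with the next. It covers only the easy case, a region already known to be text in one known encoding, and does nothing about finding where text starts. `ChunkedDecode.DecodeTo`, the `chunkSize` parameter, and the `TextWriter` sink are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkedDecode
{
    // Hypothetical helper: decodes `input` chunk by chunk and writes the text to `output`,
    // so only one chunk is held in memory at a time.
    public static void DecodeTo(Stream input, Encoding encoding, int chunkSize, TextWriter output)
    {
        // A Decoder carries state between calls, unlike Encoding.GetString.
        Decoder decoder = encoding.GetDecoder();
        byte[] bytes = new byte[chunkSize];
        char[] chars = new char[encoding.GetMaxCharCount(chunkSize)];

        int read;
        while ((read = input.Read(bytes, 0, bytes.Length)) > 0)
        {
            // flush: false keeps a partial multi-byte sequence at the end of
            // this chunk buffered inside the decoder for the next call.
            int n = decoder.GetChars(bytes, 0, read, chars, 0, false);
            output.Write(chars, 0, n);
        }
        // Any incomplete sequence still buffered at end of stream is dropped here.
    }
}
```

With a chunk size of 2, the UTF-8 bytes for "héllo" split the two-byte "é" across chunks, and the decoder still produces the character intact.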
Hopefully this clarifies some of the issues. Responses so far have been very helpful! Please do continue!