I'm trying to convert a stream of bytes with the MultiByteToWideChar() WinAPI function.

The documentation says the function fails with ERROR_NO_UNICODE_TRANSLATION on incomplete strings (a lead byte with no trailing byte at the end of a multibyte-encoded string). How do I prevent this error? The only way that comes to mind is to leave the last multibyte character of the input buffer unconverted (using IsDBCSLeadByteEx() to locate it).

Are there better solutions to convert a stream of bytes?

+2  A: 

It seems to me that you can just use CharNextExA to move to the next character position in the input stream. In this way you can collect some whole characters and convert them together into a Unicode string with MultiByteToWideChar. Once you have the Unicode text fragment, you can convert it to another code page using WideCharToMultiByte.
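The forward scan described above can be sketched in portable C. This is only an illustration under stated assumptions: `is_lead_byte_cp932` is a hypothetical stand-in for `IsDBCSLeadByteEx(932, b)` with the Shift-JIS lead-byte ranges hard-coded, and `complete_prefix_len` is a made-up helper name, not a WinAPI function. It walks the buffer from a known character boundary and reports how many bytes form only complete characters, which is the part that could then be handed to MultiByteToWideChar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByteEx(932, b):
   lead-byte ranges of code page 932 (Shift-JIS). */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Scans forward from a known character boundary and returns the number
   of leading bytes of buf that consist only of complete characters.
   A dangling lead byte at the very end is excluded from the count. */
static size_t complete_prefix_len(const unsigned char *buf, size_t len,
                                  int (*is_lead)(unsigned char))
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t step = is_lead(buf[i]) ? 2 : 1; /* DBCS: 1 or 2 bytes */
        if (i + step > len)
            break;                             /* trail byte not yet received */
        i += step;
    }
    return i;
}
```

Note that the buffers `{0x81, 0x81, 0x81}` and `{0x41, 0x81, 0x81}` end with the same two bytes, yet the first has a complete prefix of 2 bytes and the second of 3. The result depends on where the scan started, which is why looking only at the tail of the buffer is not reliable.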

UPDATED: I am sure that receiving the input stream is much slower than decoding it with CharNextExA, MultiByteToWideChar and WideCharToMultiByte. For example, if you use buffers on the stack such as WCHAR szBuffer[4096] and TCHAR szDestBuffer[4096], you will be able to decode 1K of input data very quickly. So I suppose the total running time of your whole program will be almost independent of the use of these three functions.

Moreover, I am not sure that you have any alternative. I don't know of any reliable way to start decoding anywhere other than at the beginning of the text, for example from its end. Perhaps other people have another idea...
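For chunked input, one possible shape is to carry a dangling lead byte over to the next chunk. The sketch below is an assumption-laden illustration, not WinAPI code: `dbcs_stream`, `stream_feed` and `is_lead_byte_cp932` are hypothetical names, the lead-byte test hard-codes the code page 932 ranges in place of `IsDBCSLeadByteEx`, and the copy into `out` stands in for the point where a real program would call MultiByteToWideChar on the complete bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByteEx(932, b). */
static int is_lead_byte_cp932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* State kept between chunks: at most one pending lead byte. */
typedef struct {
    int have_carry;
    unsigned char carry;
} dbcs_stream;

/* Copies the bytes that form complete characters into out (which must
   hold at least len + 1 bytes) and returns how many were written.
   A lead byte left without its trail byte is saved in the state and
   emitted together with the first byte of the next chunk. */
static size_t stream_feed(dbcs_stream *s, const unsigned char *chunk,
                          size_t len, unsigned char *out)
{
    size_t n = 0, i = 0;
    assert(s != NULL && out != NULL);
    if (s->have_carry && len > 0) {
        out[n++] = s->carry;      /* lead byte from the previous chunk */
        out[n++] = chunk[i++];    /* its trail byte */
        s->have_carry = 0;
    }
    while (i < len) {
        if (is_lead_byte_cp932(chunk[i])) {
            if (i + 1 >= len) {   /* trail byte has not arrived yet */
                s->carry = chunk[i];
                s->have_carry = 1;
                break;
            }
            out[n++] = chunk[i++];
        }
        out[n++] = chunk[i++];    /* single byte, or trail byte */
    }
    return n;
}
```

Each call returns a buffer that ends on a character boundary, so it could be converted in one call rather than character by character.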

Oleg
I need a more efficient approach: the data chunks are very big and I don't want to call a function for each character. Is there a way to reduce the number of calls?
Basilevs
It seems to me that no other way is possible if you want to support all code pages supported by Windows platforms. In the documentation of `IsDBCSLeadByteEx` you can read: "Lead byte values are specific to each distinct DBCS. Some byte values can appear in a single code page as both the lead and trail byte of a DBCS character. Thus, IsDBCSLeadByteEx can only indicate a potential lead byte value.". So a sequential scan of the data with `CharNextExA` seems to be the only safe way. Just measure whether you notice any performance difference from using `CharNextExA`. It is fast. `CharPrevExA` is slow.
Oleg
Is analysing a 10-byte tail at the end of a 10000-byte buffer with CharPrevExA() slower than processing the whole buffer with CharNextExA()? Will CharPrevExA work properly when given the middle of a character as the lpCurrentChar argument?
Basilevs
@Basilevs: As I wrote before, you should use `CharNextExA` because `CharPrevExA` is slow. Both `CharNextExA` and `CharPrevExA` will work **only if you start** at the correct beginning of a character. So you **have to** use one of these functions. For performance reasons you should use `CharNextExA`.
Oleg