How to convert a stream of bytes to another encoding?

It seems to me that you can just use `CharNextExA` to move to the next character position in the input stream. This way you can collect a run of complete characters and convert them together into a Unicode string with `MultiByteToWideChar`. Once you have the Unicode text fragment, you can convert it to another code page with `WideCharToMultiByte`.
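A minimal sketch of the "collect complete characters, then convert them together" step, written as portable C so the logic can be tried anywhere. The helper `is_lead_byte` is a hypothetical stand-in for `IsDBCSLeadByteEx` and hard-codes the Shift-JIS (code page 932) lead-byte ranges; `complete_prefix` is likewise an illustrative name, not a Windows API. It assumes the chunk begins at a character start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByteEx(932, b):
   Shift-JIS lead-byte ranges only. */
int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns how many leading bytes of buf[0..len) form complete characters,
   assuming buf[0] is the start of a character. Any remaining bytes are an
   unfinished character that belongs to the next chunk. */
size_t complete_prefix(const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        size_t step = is_lead_byte(buf[pos]) ? 2 : 1;
        if (pos + step > len)
            break;          /* lead byte whose trail byte has not arrived yet */
        pos += step;
    }
    return pos;
}
```

For a chunk such as `{ 0x41, 0x83, 0x41, 0x83 }` the function returns 3: the first three bytes can be handed to `MultiByteToWideChar` in a single call, and the dangling lead byte `0x83` is kept and prepended to the next chunk.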

UPDATED: I am sure that receiving the input stream is much slower than decoding the data with `CharNextExA`, `MultiByteToWideChar` and `WideCharToMultiByte`. For example, if you use buffers on the stack such as `WCHAR szBuffer[4096]` and `TCHAR szDestBuffer[4096]`, you will be able to decode 1K of input data very quickly. So I suppose that the total running time of your whole program will be almost independent of the use of these three functions.

Moreover, I am not sure that you have any alternative. I don't know of any reliable way to start decoding the text anywhere other than at its beginning, for instance from the end of the text. Perhaps other people have another idea...

I need a more efficient approach: the data chunks are very big and I don't want to call a function for each symbol. Is there a way to reduce the number of calls?

Basilevs 2010-10-19 17:34:52

It seems to me that no other way is possible if you want to support all code pages supported by Windows platforms. In the documentation of `IsDBCSLeadByteEx` you can read: "Lead byte values are specific to each distinct DBCS. Some byte values can appear in a single code page as both the lead and trail byte of a DBCS character. Thus, IsDBCSLeadByteEx can only indicate a potential lead byte value.". So a sequential scan of the data with `CharNextExA` seems to be the only safe way. Just measure whether you notice any performance difference from using `CharNextExA`. It is fast; `CharPrevExA` is slow.

Oleg 2010-10-19 18:03:18

Is analysing a tail of 10 bytes at the end of a 10000-byte buffer with `CharPrevExA()` slower than processing the whole buffer with `CharNextExA()`? Will `CharPrevExA` work properly if it is given a pointer into the middle of a character as its `lpCurrentChar` argument?

Basilevs 2010-10-20 05:21:46

@Basilevs: As I wrote before, you should use `CharNextExA` because `CharPrevExA` is slow. Both `CharNextExA` and `CharPrevExA` will work **only if you start** at the correct beginning of a character. So you **have to** use one of these functions. For performance reasons you should use `CharNextExA`.
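To illustrate why the starting position matters, here is a small portable sketch. It does not call the Windows functions: `sjis_lead` is a hypothetical stand-in for `IsDBCSLeadByteEx` limited to the Shift-JIS (code page 932) lead-byte ranges, and `next_char` only imitates the forward step of `CharNextExA` under that assumption. `is_char_start` is an illustrative helper that decides whether an offset is a character boundary by walking from the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IsDBCSLeadByteEx(932, b):
   Shift-JIS lead-byte ranges only. */
int sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Forward step in the spirit of CharNextExA: returns the offset just past
   the character starting at pos. The answer is only meaningful when pos
   really is the start of a character. */
size_t next_char(const unsigned char *buf, size_t len, size_t pos)
{
    size_t step;

    assert(pos < len);
    step = sjis_lead(buf[pos]) ? 2 : 1;
    return (pos + step <= len) ? pos + step : len;
}

/* Decides whether target is a character start by scanning from offset 0,
   which is assumed to be a character start. */
int is_char_start(const unsigned char *buf, size_t len, size_t target)
{
    size_t pos = 0;

    while (pos < target && pos < len)
        pos = next_char(buf, len, pos);
    return pos == target;
}
```

With the bytes `{ 0x83, 0x83, 0x41, 0x42 }`, stepping from offset 0 gives the boundaries 0, 2, 3, 4. Stepping from offset 1, the middle of the first character, treats the trail byte `0x83` as a lead byte and swallows the `0x41` that follows it, so every later boundary may be wrong.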

Oleg 2010-10-20 16:52:40
