mbcs

Converting MBCS stream to UTF-8 and vice versa in C++

Hi, I'm using Visual C++ (VS2005) and compiling the project in Multibyte Character Set (MBCS). However, the program needs to communicate with a webapp (which is in utf-8) via XMLRPC. So I'm thinking maybe I can use MBCS internally and convert the strings to utf-8 before sending them to the xmlrpc module and converting them back to MBCS ...

What multi-byte character set starts with 0x7F and is 4 bytes long?

Hi, I'm trying to get some legacy code to display Chinese characters properly. One character encoding I'm trying to work with starts with a 0x7F and is 4 bytes long (including the 0x7F byte). Does anyone know what kind of encoding this is and where I can find information for it? Thanks.. UPDATE: I've also had to work with some Japane...

Piecewise conversion of an MFC app to Unicode/MBCS

I have a large MFC application that I am extending to allow for multi-lingual input. At the moment I need to allow the user to enter Unicode data in edit boxes on a single dialog. Is there a way to do this without turning UNICODE or MBCS on for the entire application? I only need a small part of the application converted at the moment...

How to represent Unicode characters in an API

This is more an MBCS question than a Unicode question. I need to create an API that returns a list of structs that each instance holds a Unicode character as one of its members. This is in .NET so you'd think I'd want UTF-16, but then for Asian characters, there'd like be two characters required. What's the best practice when returnin...

tchar safe functions -- count parameter for UTF-8 constants

I'm porting a library from char to TCHAR. the count parameter of this fragment, according to MSDN, is the number of multibyte characters, not the number of bytes. so, did I get this right? My project properties in VC9 say 'use unicode character set' and I think that's correct, but I'm not how that impacts my count parameter. _tcsncmp(ac...

Why isn't UTF-8 allowed as the "ANSI" code page?

The Windows _setmbcp function allows any valid code page... (except UTF-7 and UTF-8, which are not supported) OK, not supporting UTF-7 makes sense: Characters have non-unique representations and that introduces complexity and security risks. But why not UTF-8? As I understand it, the "ANSI" versions of the Windows API functions...

Difference between MBCS and UTF-8 on Windows

I am reading about the charater set and encodings on Windows. I noticed that there are two compiler flags in Visual Studio compiler (for C++) called MBCS and UNICODE. What is the difference between them ? What I am not getting is how UTF-8 is conceptually different from a MBCS encoding ? Also, I found the following quote in MSDN: Uni...

How to know the preferred display width of Unicode characters?

In different encodings of Unicode, for example UTF-16le or UTF-8, a character may occupy 2 or 3 bytes. Many Unicode applications doesn't take care of display width of Unicode chars just like they are all Latin letters. For example, in 80-column text, which should contains 40 Chinese characters or 80 Latin letters in one line, but most ap...