Hello. Until now I have been using setw to align columns in my ANSI text file. Recently I wanted to support UTF-8 in my text file, and I found that setw no longer aligns the columns.

#include <windows.h>
#include <iostream>
// For StringCchLengthW.
#include <Strsafe.h>
#include <fstream>
#include <iomanip>
#include <string>
#include <cassert>
// For INT_MAX.
#include <climits>

std::string wstring2string(const std::wstring& utf16_unicode) {
    //
    // Special case of NULL or empty input string
    //
    if ( (utf16_unicode.c_str() == NULL) || (*(utf16_unicode.c_str()) == L'\0') )
    {
        // Return empty string
        return "";
    }

    //
    // Consider WCHAR's count corresponding to total input string length,
    // including end-of-string (L'\0') character.
    //
    const size_t cchUTF16Max = INT_MAX - 1;
    size_t cchUTF16;
    HRESULT hr = ::StringCchLengthW( utf16_unicode.c_str(), cchUTF16Max, &cchUTF16 );

    if ( FAILED( hr ) )
    {
        throw std::exception("Error during wstring2string");
    }

    // Consider also terminating \0
    ++cchUTF16;

    //
    // WC_ERR_INVALID_CHARS flag is set to fail if invalid input character
    // is encountered.
    // This flag is supported on Windows Vista and later.
    // Don't use it on Windows XP and previous.
    //

    // CHEOK : Under Windows XP VC 2008, WINVER is 0x0600.
    // If I use dwConversionFlags = WC_ERR_INVALID_CHARS, runtime error will
    // occur with last error code (1004, Invalid flags.)
//#if (WINVER >= 0x0600)
//    DWORD dwConversionFlags = WC_ERR_INVALID_CHARS;
//#else
    DWORD dwConversionFlags = 0;
//#endif

    //
    // Get size of destination UTF-8 buffer, in CHAR's (= bytes)
    //
    int cbUTF8 = ::WideCharToMultiByte(
        CP_UTF8,                // convert to UTF-8
        dwConversionFlags,      // specify conversion behavior
        utf16_unicode.c_str(),  // source UTF-16 string
        static_cast<int>( cchUTF16 ),   // total source string length, in WCHAR's,
                                        // including end-of-string \0
        NULL,                   // unused - no conversion required in this step
        0,                      // request buffer size
        NULL, NULL              // unused
        );

    assert( cbUTF8 != 0 );

    if ( cbUTF8 == 0 )
    {
        throw std::exception("Error during wstring2string");
    }

    //
    // Allocate destination buffer for UTF-8 string
    //
    int cchUTF8 = cbUTF8; // sizeof(CHAR) = 1 byte
    CHAR * pszUTF8 = new CHAR[cchUTF8];

    //
    // Do the conversion from UTF-16 to UTF-8
    //
    int result = ::WideCharToMultiByte(
        CP_UTF8,                // convert to UTF-8
        dwConversionFlags,      // specify conversion behavior
        utf16_unicode.c_str(),  // source UTF-16 string
        static_cast<int>( cchUTF16 ),   // total source string length, in WCHAR's,
                                        // including end-of-string \0
        pszUTF8,                // destination buffer
        cbUTF8,                 // destination buffer size, in bytes
        NULL, NULL              // unused
        ); 

    assert( result != 0 );

    if ( result == 0 )
    {
        // Free the buffer before throwing so it is not leaked
        delete[] pszUTF8;
        throw std::exception("Error during wstring2string");
    }

    std::string strUTF8(pszUTF8);

    delete[] pszUTF8;

    // Return resulting UTF-8 string
    return strUTF8;
}

int main() {
    // Write the file content in UTF-8
    {
        std::ofstream file;
        file.open("c:\\A-UTF8.txt");
        file << std::setw(12) << std::left << wstring2string(L"我爱你") << "????" << std::endl;
        file << std::setw(12) << std::left << "ILU" << "????";
    }

    {
        std::ofstream file;
        file.open("c:\\A-ANSI.txt");
        file << std::setw(12) << std::left << "WTF" << "????" << std::endl;
        file << std::setw(12) << std::left << "ILU" << "????";
    }
    return 0;
}

My output for A-ANSI.txt is

WTF         ????
ILU         ????

My output for A-UTF8.txt is

我爱你   ????
ILU         ????

How can I make A-UTF8.txt's text aligned properly?

A: 

I'm not familiar with this, but my guess is that this is what happens: setw still pads the field to 12 chars, but it counts bytes, and in UTF-8 several bytes are grouped into one Unicode symbol. "我爱你" is 9 bytes, so only 3 spaces of padding are added. If that's the case, you can calculate the difference before writing and add it to the width you pass to setw. Good luck.
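
A minimal sketch of that idea, assuming the string is valid UTF-8. The names utf8_code_points, setw_width_for and pad_utf8 are hypothetical helpers made up for this illustration, not part of any library. The width given to setw is enlarged by the number of extra bytes, so every field ends up with the same number of characters.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Count code points in a UTF-8 string: every byte except a
// continuation byte (bit pattern 10xxxxxx) starts a new code point.
// Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: the value to pass to setw so that the field
// holds `width` code points instead of `width` bytes.
int setw_width_for(const std::string& s, std::size_t width) {
    const std::size_t extra_bytes = s.size() - utf8_code_points(s);
    return static_cast<int>(width + extra_bytes);
}

// Hypothetical helper: return s left-aligned and padded with spaces
// to `width` code points.
std::string pad_utf8(const std::string& s, std::size_t width) {
    std::ostringstream out;
    out << std::left << std::setw(setw_width_for(s, width)) << s;
    return out.str();
}
```

In the question's code this would look like file << std::setw(setw_width_for(s, 12)) << std::left << s, with s holding the converted string. Note that this only equalizes the number of characters, not the on-screen width, so columns can still look uneven when some characters are drawn wider than others.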

ahmadabdolkader
+1  A: 

Even in a "monospaced" font, some East Asian characters are wider than others. You also have to consider combining characters, which have no width of their own.

There's a wcswidth function (part of POSIX) that may do what you want.
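
As a rough illustration of what such a width function computes, here is a minimal sketch that decodes UTF-8 by hand and sums an approximate column width per code point. The code point ranges are a simplified assumption (a handful of common wide and combining blocks), not the full Unicode East Asian Width data, and approx_width, approx_display_width and pad_columns are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>

// Rough column width of one code point. The ranges below are a
// simplified assumption, not the complete Unicode width tables.
int approx_width(char32_t cp) {
    if (cp >= 0x0300 && cp <= 0x036F) return 0;  // combining diacritical marks
    if ((cp >= 0x1100 && cp <= 0x115F) ||   // Hangul Jamo initial consonants
        (cp >= 0x2E80 && cp <= 0xA4CF) ||   // CJK radicals through Yi
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   // Hangul syllables
        (cp >= 0xF900 && cp <= 0xFAFF) ||   // CJK compatibility ideographs
        (cp >= 0xFE30 && cp <= 0xFE4F) ||   // CJK compatibility forms
        (cp >= 0xFF00 && cp <= 0xFF60) ||   // fullwidth forms
        (cp >= 0xFFE0 && cp <= 0xFFE6)) {   // fullwidth signs
        return 2;
    }
    return 1;
}

// Sum the approximate widths of all code points in a UTF-8 string.
// Assumes valid UTF-8; malformed input is not detected.
std::size_t approx_display_width(const std::string& s) {
    std::size_t width = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;
        char32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        // Fold in the payload bits of each continuation byte
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        width += approx_width(cp);
        i += len;
    }
    return width;
}

// Hypothetical helper: append spaces until s fills `columns` columns.
std::string pad_columns(const std::string& s, std::size_t columns) {
    const std::size_t w = approx_display_width(s);
    return (w < columns) ? s + std::string(columns - w, ' ') : s;
}
```

Padding is done by hand here instead of with setw, since setw only knows the byte count. Whether the result lines up still depends on the font and the program displaying the file.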

dan04
wcswidth is not available on Windows.
Yan Cheng CHEOK