Say I have the text "Բարև Hello Здравствуй". (I store this text in a QString, but if you know another way to store it in C++ code, you're welcome to suggest it.) How can I convert this text to Unicode escapes like this: "\u1330\u1377\u1408\u1415 Hello \u1047\u1076\u1088\u1072\u1074\u1089\u1090\u1074\u1091\u1081" (see here)?

A: 

You first have to determine which encoding is used for the text "Բարև Hello Здравствуй". It looks like Russian, so it may be Windows code page 1251, or UTF-8, or something else. Then use the Windows function MultiByteToWideChar with the required inputs, such as the applicable code page, the original string, etc.

Hope it helps.

Nains
Qt always uses UTF-16.
Philipp
Yup! I'm just new to Qt... Thanks!
Nains
+2  A: 

I assume you're doing code-generation (of JavaScript, maybe?)

QString is like a collection of QChar. Loop through the contents, and on each QChar call the unicode method to get the ushort (16-bit integer) value.

Then format each character like "\\u%04X", i.e. \u followed by the 4-digit hex value.

NB: You may need to swap the two bytes (the two pairs of hex digits) to get the right result, depending on the platform you're running on.

Daniel Earwicker
Yes I work on RTF file generation :).
Narek
Not sure if this works outside the BMP. How are characters like U+10000 encoded in RTF files?
MSalters
The code is not readable; please use code formatting (simply edit your original question). And again, it has nothing to do with UTF-8, so I'm quite sure you don't need the `setCodecForTr` stuff.
Philipp
Hi Narek, if you get an error message from your tools, it's a good idea to post the text of it so people can help you.
Daniel Earwicker
Also you appear to be trying to use my suggested format string by passing it as a string to a stream. That will just output it as a string. I wrote the format string in the old `sprintf` style, so you could try using that, or you could look up how to do it with `std::stringstream`.
Daniel Earwicker
+1  A: 
#include <cstdio>

#include <QtCore/QString>
#include <QtCore/QTextStream>

int main() {
  QString str = QString::fromWCharArray(L"Բարև Hello Здравствуй");
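  QString escaped;
  // Worst case: every code unit becomes a 6-character "\uXXXX" escape.
  escaped.reserve(6 * str.size());
  for (QString::const_iterator it = str.begin(); it != str.end(); ++it) {
    QChar ch = *it;
    ushort code = ch.unicode();
    if (code < 0x80) {
      // ASCII is copied through unchanged.
      escaped += ch;
    } else {
      // Anything else becomes \u plus four zero-padded hex digits.
      escaped += "\\u";
      escaped += QString::number(code, 16).rightJustified(4, '0');
    }
  }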
  QTextStream stream(stdout);
  stream << escaped << '\n';
}

Note that this loops over UTF-16 code units, not actual code points, so a character outside the BMP is written as two escapes (one per surrogate).

Philipp
I am getting a linker error. I guess it is because of ch.unicode().
Narek
Link with `QtCore` (e.g. `g++ -lQtCore`).
Philipp
Also, QString::fromWCharArray causes the following problem: error: converting to execution character set: Illegal byte sequence
Narek
Did you test this code?
Narek
@Narek: Yes, I tested it before posting. (Sometimes I write code off the top of my head, but this code would be way too complicated for that.) I also tested the result in Python, which uses the same escape mechanism, and the input string was returned. Please specify your compiler and operating system. Mine is GCC 4.3.3 on Linux.
Philipp
Qt Creator on Vista
Narek
It should work nevertheless. Maybe you need to specify the encoding of the source code explicitly in order to use the wide string.
Philipp
I do not know how to make this work for me (anyway, I have accomplished the task in the way shown in my own answer), but this may be an option for other readers, so +1 and thanks, Philipp, for your help!
Narek
A: 

I have solved the problem with this code:

EDITED TO A BETTER VERSION: (I just do not want to convert Latin symbols to Unicode escapes, because that would consume additional space without any advantage for my problem. As a reminder, I want to generate Unicode RTF.)

#include <QApplication>
#include <QMessageBox>
#include <QString>
#include <QTextCodec>
#include <QWidget>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    // Interpret the string literal passed to tr() as UTF-8 (Qt 4 API).
    QTextCodec::setCodecForTr(QTextCodec::codecForName("UTF-8"));
    QString str(QWidget::tr("Բարև (1-2+3/15,69_) Hello {} [2.63] Здравствуй"));
    QString strNew;

    QString tmp;
    foreach(QChar cr, str)
    {
        if(cr.unicode() < 0x80)
        {
            // Plain ASCII: copy the character unchanged.
            strNew+=cr;
        }
        else
        {
            // Everything else: "\u" followed by the decimal code unit value.
            tmp.setNum(cr.unicode());
            tmp.prepend("\\u");
            strNew+=tmp;
        }
    }
    QMessageBox::about(0,"Unicode escapes!",strNew);
    return app.exec();
}

Thanks to @Daniel Earwicker for the algorithm and of course +1.

BTW, you need to set the text editor's encoding to UTF-8 so that the source file is saved as UTF-8.

Narek