ansaurus

Question

How can I avoid encoding mixups of strings in a C/C++ API?

Answer 1

+3 A:

You could pass around a std::pair instead of a char*:

#include <utility>  // std::pair
struct utf8_tag_t{} utf8_tag;  // empty tag type: marks the string as UTF-8
std::pair<const char*,utf8_tag_t> getTranslatedWord(std::pair<const char*,utf8_tag_t> englishWord);

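The generated machine code should be identical on a decent modern compiler whose std::pair uses the empty base class optimization. That isn't guaranteed, though: std::pair exposes first and second as data members, so common implementations still reserve storage for the empty tag and the pair ends up larger than a bare pointer.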
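The same tagging idea can also be sketched without std::pair, which keeps the standard library out of the interface altogether. This is only an illustration under stated assumptions: the wrapper types Utf8String and Latin1String and the helper asUtf8 are hypothetical names, and the body of getTranslatedWord is a stub that echoes its input. The static_assert needs C++11.

```cpp
// Hypothetical wrapper types: one per encoding, each holding only a pointer.
struct Utf8String   { const char* data; };
struct Latin1String { const char* data; };

// Explicit construction marks the point where the caller vouches for the encoding.
inline Utf8String asUtf8(const char* s)
{
    Utf8String r = { s };
    return r;
}

// Hypothetical API function; this stub simply returns its argument.
inline Utf8String getTranslatedWord(Utf8String englishWord)
{
    return englishWord;
}

// The wrapper adds no storage beyond the pointer itself.
static_assert(sizeof(Utf8String) == sizeof(const char*),
              "wrapper should be pointer-sized");
```

With this in place, getTranslatedWord(asUtf8("house")) compiles, while passing a Latin1String or a bare const char* is rejected by the compiler.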

I don't bother with this though. I'd just use char*s and document that the input has to be utf8. If the data could come from an untrusted source, you're going to have to check the encoding at runtime anyway.
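For the runtime check, something along these lines might do. It is a minimal sketch, assuming NUL-terminated input; the function name isValidUtf8 is made up for illustration and is not part of any library. It rejects stray continuation bytes, truncated sequences, overlong forms, surrogates and values above U+10FFFF.

```cpp
#include <cstddef>  // std::size_t

// Returns true if the NUL-terminated buffer is well-formed UTF-8.
bool isValidUtf8(const char* s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    while (*p) {
        unsigned char c = *p;
        std::size_t extra;  // number of continuation bytes expected
        unsigned long cp;   // code point decoded so far
        unsigned long min;  // smallest code point allowed for this length

        if (c < 0x80)                { ++p; continue; }  // plain ASCII
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        ++p;
        for (std::size_t i = 0; i < extra; ++i, ++p) {
            // A NUL here fails the test too, so we never read past the end.
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min) return false;                      // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    }
    return true;
}
```

A Latin-1 string such as "\xE9" (a lone é byte) fails this check, which is exactly the kind of mixup the question is about. Note that pure ASCII passes regardless of which encoding the caller had in mind.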

Joe Gauterin 2010-05-21 10:56:44

+1 That's a pretty creative idea. :-)

Frerich Raabe 2010-05-21 13:17:44

+1 for 'don't bother'… Just use utf-8.

Steven R. Loomis 2010-05-24 17:46:23

Answer 2

+1 A:

I suggest that you use std::wstring.

Check out this other question for details.

radman 2010-05-21 11:35:43

Yes, std::wstring looks like a candidate. However, I was wondering whether there is maybe something which doesn't require people to link their plugins against the standard C++ library. At least with Visual Studio 2009 it's not all inline template magic as far as I can see.

Frerich Raabe 2010-05-21 13:23:45

Using std::wstring isn't a good idea. It's a sequence of wchar_t, which is a 16-bit integer type on Microsoft compilers and a 32-bit integer type on gcc. So a std::wstring could reasonably contain utf16LE, utf16BE, utf32BE or utf32LE.
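To see the difference on a given compiler, a small sketch like the following can help. The helper name wcharBits is hypothetical; char16_t and char32_t are the C++11 character types, which assumes a compiler that supports them.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Bits per wchar_t code unit on the compiler building this file
// (typically 16 with Microsoft compilers, 32 with gcc on Linux).
constexpr std::size_t wcharBits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// char16_t and char32_t are at least 16 and 32 bits wide on every
// conforming compiler, so their code unit size does not vary like wchar_t's.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds a UTF-16 code unit");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds a UTF-32 code unit");
```

The fixed-width types settle the code unit size but say nothing about byte order, which still has to be documented for any data that is written out or exchanged as raw bytes.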

Joe Gauterin 2010-05-21 14:11:32

Answer 3

A:

The ICU project provides a Unicode support library for C++.

jopa 2010-05-21 11:55:35

True, but I'd rather not pull in a whole new library.

Frerich Raabe 2010-05-21 13:17:07

Unless you need other functions it provides…

Steven R. Loomis 2010-05-24 17:43:50
