Hello!

I need to convert strings from one encoding (UTF-8) to another. The problem is that the target encoding does not contain all the characters of the source encoding, and the libc iconv(3) function fails in that situation. What I want is to perform the conversion anyway, with the problematic characters replaced in the output string by some symbol, say '?'.

Programming language is C or C++.

Is there a way to address this issue?

A: 

Use a regex built from the source character ranges that the target encoding can represent, and replace every character that does not match with a placeholder.

Sugerman
+2  A: 

Try appending "//TRANSLIT" or "//IGNORE" to the end of the destination charset string. Note that these suffixes are GNU extensions (supported by the GNU C library and also by GNU libiconv), not part of POSIX.

From iconv_open(3):

   //TRANSLIT
          When the string "//TRANSLIT" is appended to tocode, translitera‐
          tion is activated.  This means that when a character  cannot  be
          represented  in the target character set, it can be approximated
          through one or several similarly looking characters.

   //IGNORE
          When the string "//IGNORE" is  appended  to  tocode,  characters
          that  cannot  be represented in the target character set will be
          silently discarded.
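
As a rough illustration of how the suffix is passed, here is a minimal sketch in C. The function name convert_with_suffix and its parameters are made up for this example; it assumes a NUL-terminated UTF-8 input, a stateless target encoding, and an iconv implementation that accepts the suffix. What an untranslatable character turns into (an approximation, a '?', or nothing) depends on the implementation and the current locale.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to `tocode`,
 * where `tocode` may carry a suffix such as "ASCII//TRANSLIT".
 * Returns 0 on success, -1 on failure. `out` is NUL-terminated
 * whenever outsize > 0. */
int convert_with_suffix(const char *tocode, const char *in,
                        char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;
    out[0] = '\0';

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;              /* charset or suffix not supported here */

    char *inp = (char *)in;     /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;   /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    *outp = '\0';
    iconv_close(cd);

    return rc == (size_t)-1 ? -1 : 0;
}
```

A call might look like convert_with_suffix("ASCII//TRANSLIT", input, buf, sizeof buf). If iconv_open() fails with the suffix, the implementation probably does not support it.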

Alternatively, when iconv(3) fails (it returns (size_t)-1 and sets errno to EILSEQ), manually skip the offending character in the input and insert a substitution in the output.
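
The manual approach could be sketched roughly like this. The names convert_with_replacement and utf8_seq_len are invented for the example, and it assumes UTF-8 input, a stateless single-byte-compatible target such as "ISO-8859-1", and an iconv that reports unrepresentable characters with EILSEQ instead of substituting them itself (some implementations substitute on their own).

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Byte length of the UTF-8 sequence announced by lead byte c.
 * Continuation or invalid bytes count as length 1. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c >= 0xF0) return 4;
    if (c >= 0xE0) return 3;
    if (c >= 0xC0) return 2;
    return 1;
}

/* Hypothetical helper: convert UTF-8 `in` to `tocode`, writing `subst`
 * for every character iconv() rejects. Returns 0 on success, -1 on error
 * (unknown charset or output buffer too small). */
int convert_with_replacement(const char *tocode, const char *in,
                             char *out, size_t outsize, char subst)
{
    if (outsize == 0)
        return -1;
    out[0] = '\0';

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;     /* iconv() wants char **, input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;   /* keep one byte for the terminator */
    int status = 0;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                          /* remaining input converted */
        if ((errno != EILSEQ && errno != EINVAL) || outleft == 0) {
            status = -1;                    /* E2BIG or no room for subst */
            break;
        }
        /* inp points at the offending sequence: skip it, emit subst. */
        size_t skip = utf8_seq_len((unsigned char)*inp);
        if (skip > inleft)
            skip = inleft;
        inp += skip;
        inleft -= skip;
        *outp++ = subst;
        outleft--;
    }

    *outp = '\0';
    iconv_close(cd);
    return status;
}
```

For instance, converting the UTF-8 bytes of "a€b" to "ISO-8859-1" with subst set to '?' should leave one placeholder between the 'a' and the 'b', since the euro sign has no ISO-8859-1 equivalent.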

bdonlan