
According to the documentation of iconv_open() at http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html:

"//TRANSLIT" means that when a character cannot be represented in the target character set, it can be approximated through one or several characters.

and:

"//IGNORE" means that characters that cannot be represented in the target character set will be silently discarded.

But what is the default behavior when neither is specified?

Thanks, Doori Bar

A:

The default behaviour, when a character that cannot be converted to the target character set is encountered, is to stop the conversion and have iconv() return (size_t)-1 with errno set to EILSEQ.

(That is, it differs from both //TRANSLIT and //IGNORE.)

caf
@caf: Thanks for clarifying that. Is it possible to instruct iconv to simply -copy- these bytes/characters to the output buffer?
DooriBar
@DooriBar: I don't believe so. Doing so would not in general be useful, since the bytes from the untranslatable character cannot be expected to carry any particular meaning in the destination character set, and might well not even represent a valid multibyte sequence.
caf
@caf: This is my scenario: my source is UTF-16LE, which -may- contain invalid multibyte sequences. I would like to convert this source to UTF-8, with any invalid bytes that can't be properly represented copied through transparently. Later, if I need to restore the original UTF-16LE data, I want to be able to convert the UTF-8 form back to UTF-16LE and get an identical result. Is that at all possible, or should I stop trying?
DooriBar
@DooriBar: I would say that that's not really possible. The invalid sequences from the source could quite easily cause the UTF-8 to get out of sync or be handled incorrectly. Your safest bet is to transform it into correct UTF-8; if you may need the original corrupt data again later, save that separately.
caf
@caf: I see. I guess I could convert it to UTF-8 and, for any input that caused difficulties, store a copy of the original string for future use. Thanks a lot for clarifying it for me!
DooriBar