I came across a comment in some code referring to said code being "I18N safe".
What does this refer to?
Internationalization. The derivation of it is "the letter I, eighteen letters, the letter N".
i18n means internationalization: i, then 18 letters (nternationalizatio), then n. Code that's marked as i18n safe would be code that correctly handles non-ASCII character data (e.g. Unicode).
I + (18 characters) + N = InternationalizatioN
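To make the counting concrete, here is a small C sketch that builds the abbreviation from the full word. The helper name numeronym is made up for illustration, and it assumes a plain ASCII word of at least three letters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a numeronym such as "i18n" from a word: first letter,
 * number of letters in between, last letter.
 * Hypothetical helper; assumes an ASCII word of length >= 3.
 * Writes into out (of size out_size) and returns out. */
char *numeronym(const char *word, char *out, size_t out_size)
{
    size_t len = strlen(word);
    assert(len >= 3 && out_size > 0);
    /* len - 2 letters sit between the first and the last letter */
    snprintf(out, out_size, "%c%zu%c", word[0], len - 2, word[len - 1]);
    return out;
}
```

The same function turns "localization" into l10n.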
I18N safe means that steps were taken during design and development to facilitate localization (L10N) at a later point.
Without any additional information, I would guess that it means the code handles text as UTF-8 and is locale-aware. See this Wikipedia article for more information.
Can you be a bit more specific?
This most often refers to code or a construct that is ready for I18N, i.e. one that is easily supported by common I18N techniques. For instance, the following is ready:
printf(loadResourceString("Result is %s"), result);
while the following is not:
printf("Result is " + result);
because the word order may vary in different languages. Unicode support, international date-time formatting and the like also qualify.
EDIT: added loadResourceString to make the example closer to real life.
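A minimal sketch of the idea in C, assuming a hypothetical in-memory table in place of a real resource file. load_resource_string and format_result are made-up names, and the English text doubles as the lookup key, similar in spirit to GNU gettext. Because the translator supplies the whole sentence, placeholder included, the German entry can put the value first:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory resource table: (locale, key) -> translated text.
 * The English source text doubles as the key. */
struct resource {
    const char *locale;
    const char *key;
    const char *text;
};

static const struct resource resources[] = {
    { "en", "Result is %s", "Result is %s" },
    { "de", "Result is %s", "%s ist das Ergebnis" },
};

/* Hypothetical stand-in for loadResourceString(): returns the translation
 * for the locale, or the key itself if there is none. */
const char *load_resource_string(const char *locale, const char *key)
{
    assert(locale != NULL && key != NULL);
    for (size_t i = 0; i < sizeof resources / sizeof resources[0]; i++) {
        if (strcmp(resources[i].locale, locale) == 0 &&
            strcmp(resources[i].key, key) == 0)
            return resources[i].text;
    }
    return key;
}

/* The whole sentence comes from the table, so the translator,
 * not the programmer, decides where the value goes. */
int format_result(char *buf, size_t size, const char *locale, const char *result)
{
    return snprintf(buf, size, load_resource_string(locale, "Result is %s"), result);
}
```

With more than one value, reordering needs positional conversions such as %1$s and %2$s, which POSIX printf supports but ISO C does not.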
I18N stands for Internationalization.
In a nutshell: I18N safe code uses some kind of lookup table for the text shown in the UI. For this, you have to support non-ASCII encodings. This might seem easy, but there are some gotchas.
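One example of such a gotcha, sketched in C: shortening a UTF-8 string to fit a fixed-size UI field. Cutting at an arbitrary byte offset can split a multi-byte character and leave an invalid sequence behind. The helper utf8_truncate_len is hypothetical and assumes valid UTF-8 input; it backs up to the nearest character boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the largest length <= max_bytes at which the UTF-8 string s
 * can be cut without splitting a multi-byte character.
 * Hypothetical helper; assumes s is valid UTF-8. */
size_t utf8_truncate_len(const char *s, size_t max_bytes)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len <= max_bytes)
        return len;
    size_t cut = max_bytes;
    /* Continuation bytes have the form 10xxxxxx; move back until the
     * byte at the cut is the first byte of a character. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

For "café" encoded as UTF-8 ("caf\xC3\xA9", 5 bytes), a 4-byte limit yields 3, so the "é" is dropped whole instead of being cut in half.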
i18n is a shorthand for "internationalization". This was coined at DEC and actually uses lowercase i and n.
As a side note: L10n stands for "localization" and uses a capital L to distinguish it from the lowercase i.
i18n-safe is a vague concept. It generally refers to code that will work in international environments, with different locales, keyboards, character sets, etc. True i18n-safe code is hard to write.
It means that code cannot assume that one character fits in one char. In C, sizeof (char) == 1 by definition, but a single character of text can still occupy several bytes: 4 bytes in UTF-32, 2 or 4 bytes in UTF-16, and 1 to 4 bytes in UTF-8.
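A short C sketch of how the number of bytes per character depends on the encoding. The function names are illustrative, and the input is assumed to be a valid Unicode scalar value (at most U+10FFFF and not a surrogate):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store code point cp in UTF-8 (1 to 4). */
int utf8_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair. */
int utf16_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

/* UTF-32 always uses one 32-bit unit. */
int utf32_bytes(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return 4;
}
```

For example, "A" (U+0041) takes 1 byte in UTF-8, "é" (U+00E9) takes 2, "€" (U+20AC) takes 3, and U+1F600 takes 4 in UTF-8 and a surrogate pair (4 bytes) in UTF-16.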
It means that code cannot rely on the length of a string equalling the number of bytes in a string. It means that code cannot rely on zero bytes in a string indicating a nul terminator. It means that code cannot simply assume ASCII encoding of text files, strings, and inputs.
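The string-length and zero-byte points can be shown in a few lines of C. utf8_codepoints is a hypothetical helper that assumes valid UTF-8. It counts code points, which is still not the same as user-perceived characters when combining marks are involved:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (10xxxxxx). Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* "Hi" in UTF-16LE: each ASCII character is followed by a zero byte,
 * so byte-oriented functions such as strlen stop after the first byte. */
const char utf16le_hi[] = { 'H', 0, 'i', 0, 0, 0 };
```

For "café" encoded as UTF-8 ("caf\xC3\xA9"), strlen returns 5 while utf8_codepoints returns 4, and strlen(utf16le_hi) returns 1 even though the text is two characters (4 bytes) long.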