Disclaimer: My apologies for the length of this post (for a single simple question), but I sincerely think that every bit of information is relevant to the question. I'd be happy to learn otherwise. I can only hope that, if successful, the question and its answers may help others through Unicode madness. Here goes.
I have read the usual highly regarded websites about UTF-8 (this one in particular is very good for my purposes), and I've read the classics too, like those mentioned in other similar questions on SO. However, I still don't know how to put it all together in my virtual lab. I use Emacs with
;; Internationalization
(prefer-coding-system 'utf-8)
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
in my .emacs, xterm started with
LC_CTYPE=en_US.UTF-8 xterm -geometry 91x58 \
    -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
and my locale reads:
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
My question follows (some of what I describe may be the expected behavior of each application, but I still need to make sense of it, so bear with me).
Consider the following C program:
#include <stdio.h>

int main(void) {
    int c;

    /* Echo every byte read from stdin (except newlines) as a char and as an int. */
    while ((c = getc(stdin)) != EOF) {
        if (c != '\n') {
            printf("Character: %c, Integer: %d\n", c, c);
        }
    }
    return 0;
}
If I run this in my xterm I get:
€
Character: � Integer: 226
Character: �, Integer: 130
Character: �, Integer: 172
(In case they do not render here: the characters I get are white question marks within a black circle.) The integers are the decimal values of the 3 bytes needed to encode € in UTF-8, but I am not sure why xterm does not display them properly.
Mousepad, for example, instead shows
Character: â, Integer: 226
Character: ,, Integer: 130 (a comma, standing for U+0082 <control>, why?!)
Character: ¬, Integer: 172
Meanwhile, Emacs displays
Character: \342, Integer: 226
Character: \202, Integer: 130
Character: \254, Integer: 172
QUESTION: The most general question I can ask is: How do I get everything to print the same character? But I am certain there will be follow-ups.
Thanks again, and apologies for all the text.