I am writing a program that uses special characters, such as װאבדג (Hebrew).

On an Ubuntu machine I had handy, they display correctly under X in gnome-terminal. In rxvt I get strange characters instead of what is in the file, and in bare xterm only some of them display.

The script itself can be as simple as:

letters="⅄ႥႣႬזלבגװאבדגהוזחטענסףמלךלכפץצקႠႣႤႥႬႫႹჄႾႨ"
letters=$(echo $letters | sed -e 's/./\0\n/g')

letters=$(for i in $letters; do echo "$RANDOM$i" done | sort -rn | sed -e 's/[0-9]*//g')
echo $letters

In OS X it just shows "nnnnnnnnnnnnnnnnnnnn".

Within the tty without X.Org started, it just shows a diamond.

In all of these terminals, I have

LANG=es_ES.UTF-8

Is there any way to know from within the script whether the characters will be shown correctly (so I could implement a fallback if not), or to set up the terminal so that it shows them?

+1  A: 

You have a bug here:

echo $letters | sed -e 's/./\0\n/g'

EDIT (Since you mention you are on OS X I removed the part talking about GNU Sed)

With the version of sed built into OS X, \0\n means "0n" (the character zero and the character n).

You are replacing every character in your input, so it is no surprise that you are not seeing them in the output.
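A portable way to write that substitution, as a rough sketch: in POSIX sed, & in the replacement stands for the whole match, and a backslash followed by a literal newline inserts a line break. The function name split_chars is made up for this example. Splitting multibyte characters correctly still assumes a UTF-8 locale and a sed that honours it.

```shell
# Hypothetical helper: print each character of "$1" on its own line.
# "&" is the whole match; backslash + literal newline is the portable
# way to put a line break in a sed replacement.
split_chars() {
    printf '%s\n' "$1" | sed 's/./&\
/g'
}

split_chars "abc"
```

This should behave the same with GNU sed and with the BSD sed shipped on OS X. The trailing empty line it prints is dropped when the output is captured with $(...).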

mikerobi
Porculus
@Porculus, thanks for the explanation, I've corrected my response.
mikerobi
ssice
@mikerobi I am working on both, so you were right: there's a compatibility bug. I first tried it on Ubuntu.
ssice
@ssice, I think you are just running into limitations in the terminal. Everything works fine in gnome-terminal. There is a separate "rxvt-unicode", and apparently there are limitations in xterm: http://www.cl.cam.ac.uk/~mgk25/unicode.html#getxterm
mikerobi
@mikerobi But what about non-graphical tty?
ssice
@ssice, I have a suspicion that the result will be no better than xterm, and possibly worse. This is really outside of my area of expertise, but I suspect the problem is that the bitmap fonts used by xterm and by non-graphical consoles just have incomplete character tables.
mikerobi
@mikerobi And how can I know about the character table inside the script, to implement a fallback?
ssice
+1  A: 

On Mac OS X you can check Terminal.app for UTF-8 readiness:

defaults read com.apple.Terminal StringEncoding  # 4
defaults read com.apple.Terminal DoubleWideChars  # YES
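Outside Terminal.app, a script can at least check which encoding the current locale declares. The sketch below assumes the POSIX `locale charmap` command is available; the function name is_utf8_charmap and the ASCII fallback string are made up for illustration. It only tells you the locale claims UTF-8, not that the terminal's font has glyphs for every character, so treat it as a first filter rather than a guarantee.

```shell
# Hypothetical helper: succeed if the given codeset name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# "locale charmap" prints the codeset of the current locale, e.g. "UTF-8".
# If the command is missing, charmap stays empty and the fallback is used.
charmap=$(locale charmap 2>/dev/null || true)

if is_utf8_charmap "$charmap"; then
    letters="אבגד"
else
    letters="abcd"   # ASCII fallback (an assumption for this example)
fi
echo "$letters"
```

Passing the codeset name as an argument keeps the check easy to try by hand, for example `is_utf8_charmap ISO-8859-1 || echo "no UTF-8"`.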

Furthermore, Mac OS X uses FreeBSD sed, which does not accept \0 in the replacement; use & to refer to the whole match instead.

printf "%s" "$letters" | sed $'s/./&\\\n/g'
printf "%s" "$letters" | gsed $'s/./&\\\n/g'
printf "%s" "$letters" | awk -vFS="" '{for(i=1;i<=NF;i++) print $i}'

# randomize letters
letters=$(echo $letters | sed $'s/./&\\\n/g')
# note the additional ";" after "${RANDOM}${i}"
letters=$(for i in $letters; do echo "${RANDOM}${i}"; done | sort -rn | sed -e 's/[0-9]*//g')
echo $letters
chad