I'm playing with bash, experiencing with utf-8 encoding. I'm new to unicode. The following command (well, their output) surprises me :
$ locale
LANG="fr_FR.UTF-8"
LC_COLLATE="fr_FR.UTF-8"
LC_CTYPE="fr_FR.UTF-8"
LC_MESSAGES="fr_FR.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
LC_TIME="fr_FR.UTF-8"
LC_ALL=
$ printf '1\né\n12\n123\n' | egrep '^(.|...)$'
1
é
12
$ touch 1 é 12 123
$ ls | egrep '^(.|...)$'
1
123
Ok. The two egrep filters lines with one or three characters. Their input is quite similar, but the output differs with the character é. Any explanation?
More details on my environment :
$ uname -a
Darwin macbook-pro-de-admin-6.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
$ egrep -V
egrep (GNU grep) 2.5.1Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.