In a recent question it was noted that running sed on a non-ASCII file on OS X gives strange results. For instance, if you run the following (/usr/bin/cal is just an arbitrary binary file):
sed 's/[^A-Z]//g' /usr/bin/cal
it removes all of the printable characters other than A-Z, but many nonprintable characters remain. If, however, you run
LANG='' sed 's/[^A-Z]//g' /usr/bin/cal
only A-Z (and newlines) are output. Why?
Normally LANG is set to en_US.UTF-8.
What is going on? I cannot see any way that the output of sed could be considered correct in UTF-8. Is it broken, or is there some notion of "working" that I do not understand?
I know that the OS X sed conforms to POSIX and is therefore different from the beloved GNU sed.
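A smaller test case than a whole binary may make the difference easier to see. The sketch below is only an illustration and not taken from the earlier question: the `sample` helper and its four-byte input are hypothetical, chosen so that one byte (0xFF) can never be part of valid UTF-8. It uses LC_ALL=C rather than LANG='' because LC_ALL takes precedence over both LANG and LC_CTYPE, which should rule out another locale variable interfering.

```shell
# Hypothetical test input: 'A', the byte 0xFF (never valid in UTF-8), 'b', 'C', newline.
sample() { printf 'A\377bC\n'; }

# Byte-oriented run: in the C locale every byte is a single character,
# so [^A-Z] matches both 0xFF and 'b', leaving only A, C and the newline.
sample | LC_ALL=C sed 's/[^A-Z]//g' | od -c

# Same command under whatever locale the shell currently uses, for comparison.
# What happens to the 0xFF byte here is exactly the behaviour in question.
sample | sed 's/[^A-Z]//g' | od -c
```

Piping through od -c shows each surviving byte explicitly, so nonprintable leftovers are visible instead of being silently swallowed by the terminal.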