LANG and sed on OSX | ansaurus

+2 Q:

LANG and sed on OSX

In a recent question it was noted that on OSX, running sed on a non-ASCII file gives strange results. For instance, suppose you run the following (/usr/bin/cal is just an arbitrary binary file):

sed 's/[^A-Z]//g' /usr/bin/cal

sed will remove all of the printable characters other than A-Z, but many nonprintable characters remain. If, however, you run

LANG='' sed 's/[^A-Z]//g' /usr/bin/cal

only A-Z (and newlines) are output. Why?

Normally LANG=en_US.UTF-8. What is going on? I cannot see any way that the output of sed could be considered correct in UTF-8. Is it broken, or is there some notion of "working" that I do not understand?

I know that the OSX sed conforms to POSIX, and is therefore different from the beloved GNU sed.

+3 A:

Binary data, such as the contents of /usr/bin/cal, is not UTF-8, and so will confuse any code that reads it as if it were. In particular, any byte with the high bit set (i.e., with a value >= 128) will be interpreted as part of a multi-byte sequence representing a single character, and will thus be elided from the output. Not all sequences of bytes with the high bit set are valid UTF-8, however, so things will get quite confused; this probably explains why some non-printable characters remain but (possibly) not others.
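
As a rough illustration of the point about high-bit bytes, here is a minimal sketch that forces the C locale, in which every byte counts as one character, so no byte can be rejected as part of an invalid multi-byte sequence. The helper name keep_upper is hypothetical, and the sketch assumes LC_ALL is an acceptable override (it takes precedence over LANG and the other LC_* variables). The g flag is used so that every match on a line is removed.

```shell
# Hypothetical helper: keep only the bytes A-Z (and newlines) from stdin.
# LC_ALL=C makes sed treat each byte as a single character, so bytes
# >= 128 are matched by [^A-Z] like any other byte.
keep_upper() {
    LC_ALL=C sed 's/[^A-Z]//g'
}

# \303\251 is the UTF-8 encoding of "é"; \377 is never valid in UTF-8.
printf 'aB\303\251C\377d\n' | keep_upper
```

On the sample input this should leave only BC. It could be applied to a file with something like keep_upper < /usr/bin/cal, although the advice below still stands.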

In short: if you want to use text-oriented tools on binary data, don't.

Marcelo Cantos 2010-08-08 23:48:13
