$ cat weirdo 
Lunch now?
$ cat weirdo | grep Lunch
$ vi weirdo
  ^@L^@u^@n^@c^@h^@ ^@n^@o^@w^@?^@

I have some files that contain text with some non-printing characters like ^@ which cause my greps to fail (as above).

How can I get my grep to work? Is there some way that does not require altering the files?

A: 

You may have some success with the strings(1) tool, as in:

strings file | grep Lunch

See man strings for more details.

DarkDust
The `strings` command normally works on sequences of printable characters above a certain threshold - 4 by default. In the example shown, each printable is separated from the next by NUL, so strings won't find anything. I suppose that `strings -n 1` (or `-s 1` in some versions) might do the trick...except that each output string is normally separated from the next by a newline. So, you'd probably have to delete the newlines, which also makes things unreadable in a different way (the whole file always gets printed if it matches).
Jonathan Leffler
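As a possible way around the limitation described in the comment above, GNU binutils `strings` accepts an `-e` option that selects the character encoding to scan for. The sketch below assumes GNU `strings` is installed and that the file is 16-bit little-endian, as the `^@` pattern in the question suggests; other `strings` implementations may not support this flag. The sample file is a hypothetical stand-in for `weirdo`.

```shell
# Hypothetical stand-in for the question's file: "Lunch now?" where every
# ASCII character is followed by a NUL byte (16-bit little-endian).
wide_sample=$(mktemp)
printf 'L\000u\000n\000c\000h\000 \000n\000o\000w\000?\000\n\000' > "$wide_sample"

# Assumption: GNU strings. "-e l" asks it to look for 16-bit little-endian
# characters instead of the default single-byte ones.
wide_match=$(strings -e l "$wide_sample" | grep Lunch || true)
echo "$wide_match"

rm -f "$wide_sample"
```

This reads the file without modifying it, which matches the constraint in the question.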
+1  A: 

You can try:

awk '{gsub(/[^[:print:]]/,"") }1' file 
ghostdog74
+5  A: 

It looks like your file is encoded in UTF-16 rather than an 8-bit character set. The '^@' is a notation for ASCII NUL '\0', which usually spoils string matching.

One technique for lossless handling of this would be to use a filter to convert UTF-16 to UTF-8, and then use grep on the output. Hypothetically, if the command was 'utf16-utf8', you'd write:

utf16-utf8 weirdo | grep Lunch

As an appallingly crude approximation to 'utf16-utf8', you could consider:

tr -d '\0' < weirdo | grep Lunch

This deletes ASCII NUL characters from the input file and lets grep operate on the 'cleaned up' output. In theory, it might give you false positives; in practice, it probably won't.
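A minimal sketch of the NUL-stripping approach, using a throwaway sample file as a hypothetical stand-in for `weirdo`. It compares a plain grep with a grep on the `tr` output; the file on disk is never changed.

```shell
# Hypothetical stand-in for the question's file: each ASCII character
# is followed by a NUL byte.
nul_sample=$(mktemp)
printf 'L\000u\000n\000c\000h\000 \000n\000o\000w\000?\000\n\000' > "$nul_sample"

# A plain grep finds nothing: the bytes of "Lunch" are never adjacent.
# "|| true" only keeps the script going when grep reports no match.
plain_count=$(grep -c Lunch "$nul_sample" || true)

# After deleting the NUL bytes in the pipeline, the text matches.
stripped_count=$(tr -d '\000' < "$nul_sample" | grep -c Lunch || true)

echo "plain: $plain_count, stripped: $stripped_count"
rm -f "$nul_sample"
```

The false-positive risk mentioned above comes from the fact that deleting bytes can join characters that were not adjacent in the original data.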

Jonathan Leffler
I don't know about utf16-utf8, but `iconv` should be available everywhere: `iconv -f UTF-16 -t UTF-8 weirdo`
DarkDust
@DarkDust: thanks - `iconv` is a lot less hypothetical than `utf16-utf8`. Of course, as a shell script, `utf16-utf8` is now a trivial one-liner: `exec iconv -f UTF-16 -t UTF-8 "$@"`.
Jonathan Leffler
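The `iconv` suggestion from the comments can be sketched end to end as follows. This assumes an `iconv` that knows the `UTF-16LE` and `UTF-8` encoding names, and uses a hypothetical sample file in place of `weirdo`. `UTF-16LE` is named explicitly because the sample has no byte-order mark; with plain `UTF-16`, the byte order assumed for BOM-less input may differ between implementations.

```shell
# Hypothetical stand-in for the question's file: "Lunch now?" encoded as
# UTF-16LE without a byte-order mark.
utf16_sample=$(mktemp)
printf 'L\000u\000n\000c\000h\000 \000n\000o\000w\000?\000\n\000' > "$utf16_sample"

# Convert on the fly and grep the UTF-8 output; the file itself is untouched.
# "|| true" only keeps the script going if grep finds no match.
utf8_match=$(iconv -f UTF-16LE -t UTF-8 "$utf16_sample" | grep Lunch || true)
echo "$utf8_match"

rm -f "$utf16_sample"
```

Unlike deleting NUL bytes, a real conversion also keeps non-ASCII characters intact, provided the input really is UTF-16.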
+3  A: 

The tr command is made for that:

cat weirdo | tr -cd '[:print:]\r\n\t' | grep Lunch
Pumbaa80
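A small sketch of how the `tr -cd` filter above differs from deleting only NUL bytes. The sample data is hypothetical: it mixes a NUL with another control byte (`\001`) to show that the printable-only filter removes both.

```shell
# Hypothetical sample: "Lunch now?" with a control byte (\001) and a NUL mixed in.
ctrl_sample=$(mktemp)
printf 'Lu\001nch\000 now?\n' > "$ctrl_sample"

# Deleting only NULs leaves the \001 in place, so "Lunch" still does not match.
nul_only=$(tr -d '\000' < "$ctrl_sample" | grep -c Lunch || true)

# Keeping only printable characters plus CR, LF and TAB removes both bytes.
printable=$(tr -cd '[:print:]\r\n\t' < "$ctrl_sample" | grep -c Lunch || true)

echo "nul_only: $nul_only, printable: $printable"
rm -f "$ctrl_sample"
```

Redirecting the file into `tr` avoids the extra `cat` process. A byte-oriented `tr` may also drop non-ASCII characters with this filter, so it is probably best suited to data that is expected to be plain ASCII.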