I have a text file containing unwanted null characters. When I try to view it, I see ^@ symbols interleaved with the normal text. How can I:

a) Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.

b) Remove the null characters? Running strings on the file cleaned it up, but I'm wondering if this is the best way.

Thanks

+2  A: 

I'd use "tr":

tr < file-with-nulls -d '\000' > file-without-nulls

except ... for reasons that escape me right now, that was not working when I tested it - I need to check whether I'm in an alternate universe, but I'll be back

edit: nope, I'm still here on Earth and dumb as ever - fixed the command so that the -d option comes before the '\000' argument

Pointy
and a `diff file-with-nulls file-without-nulls` should show me which lines had null characters? It brings back a lot more than expected.
dogbane
Actually, I believe it should be `tr -d '\000' < file-with-nulls > file-without-nulls` since `<` is part of the shell pipe functionality and not `tr`.
Mikael S
+1  A: 

A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
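A minimal sketch of that conversion (assuming the file really is UTF-16; substitute UTF-16LE or UTF-16BE for the -f argument if iconv guesses the byte order wrong; the file names are placeholders):

```shell
# Re-encode the UTF-16 file as UTF-8; the interleaved NUL bytes are part of
# the wide encoding and disappear with it.
iconv -f UTF-16 -t UTF-8 file-with-nulls > file-without-nulls
```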

Ignacio Vazquez-Abrams
good point - we don't know where his file came from
Pointy
I ran out of disk space while my application was logging. This resulted in these characters.
dogbane
+1  A: 

Use the following sed command to remove the null characters from a file. Note that -i (in-place editing) and the \xNN escape are GNU sed extensions.

sed -i 's/\x0//g' null.txt
rekha_sri
+1  A: 

I discovered the following, which prints out which lines, if any, contain null characters:

perl -ne '/\000/ and print;' file-with-nulls

Also, an octal dump can tell you if there are nulls:

od file-with-nulls | grep ' 000'
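If your grep is GNU grep, the original grep attempt can be made to work too: -P switches to Perl-compatible regexes, where \x00 matches a null byte (a GNU extension, so it may be absent on BSD/macOS grep):

```shell
# -P  enable PCRE so \x00 matches a NUL byte (GNU grep only)
# -a  treat the binary-looking file as text, so matching lines are printed
# -n  prefix each match with its line number
grep -anP '\x00' file-with-nulls
```

Without -a, GNU grep notices the NUL bytes, classifies the file as binary, and prints only "Binary file ... matches" instead of the lines themselves.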
dogbane