tags:
views: 84
answers: 3

I want to remove all the non-ASCII characters from a file in place.

I found one solution with tr, but I guess I need to write the file back after the modification.

I need to do it in place with relatively good performance.

Any suggestions?

Thanks
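For context, tr cannot edit a file in place on its own: it only reads stdin and writes stdout, so the usual route is a temporary file followed by a rename. A minimal sketch (filenames are illustrative):

```shell
# Create a sample file containing one non-ASCII byte (octal 200 = 128).
printf 'abc\200def\n' > sample.txt

# tr reads stdin and writes stdout, so route through a temp file, then rename.
# \200-\377 is the octal range 128-255, i.e. every non-ASCII byte.
LC_ALL=C tr -d '\200-\377' < sample.txt > sample.txt.tmp
mv sample.txt.tmp sample.txt

cat sample.txt   # -> abcdef
```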

+4  A: 
Use sed's -i (in place) option:

sed -i 's/[\d128-\d255]//' FILENAME
Ivan Kruchkoff
had to change it to `sed -i 's/[\d128-\d255]//g' FILENAME` and it worked .. thanks
Sujit
@Sujit: Note that `sed -i` still creates an intermediate file. It just does it behind the scenes.
Dennis Williamson
@Dennis - then what would be the better solution?
Sujit
@Sujit: There's not a better solution. I just wanted to point out that an intermediate file is still created. Sometimes that matters. I just didn't want you to be under the assumption that it was doing it *literally* in place.
Dennis Williamson
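To illustrate the /g fix from the comments: without /g, sed removes only the first match on each line. Note the \dNNN decimal escapes are a GNU sed extension, and the C locale avoids multibyte surprises (filename is illustrative):

```shell
# Sample file with two non-ASCII bytes on one line.
printf 'a\200b\200c\n' > demo.txt

# /g removes every non-ASCII byte per line, not just the first;
# -i rewrites the file via a hidden temporary file, as noted above.
LC_ALL=C sed -i 's/[\d128-\d255]//g' demo.txt

cat demo.txt
```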
+1  A: 

A Perl oneliner would do: perl -i.bak -pe 's/[^[:ascii:]]//g' <your file>

-i says that the file is going to be edited in place, and a backup is going to be saved with the extension .bak.

ssegvic
A: 

As an alternative to sed or perl, you may consider using ed(1) and POSIX character classes.

Note: ed(1) reads the entire file into memory to edit it in place, so for really large files you should use sed -i ... or perl -i ... instead.

# see:
# - http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
# - http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes

# test
echo $'aaa \177 bbb \200 \214 ccc \254 ddd\r\n' > testfile
ed -s testfile <<< $',l'   # list the file before editing
# delete every byte outside the POSIX classes (non-ASCII in the C locale), then save
ed -s testfile <<< $'H\ng/[^[:graph:][:space:][:cntrl:]]/s///g\nwq'
ed -s testfile <<< $',l'   # list the file after editing
trevor