I want to remove all the non-ASCII characters from a file in place.
I found one solution with tr, but I guess I need to write the file back after modification.
I need to do it in place with relatively good performance.
Any suggestions?
Thanks
A Perl one-liner would do: perl -i.bak -pe 's/[^[:ascii:]]//g' <your file>
-i says that the file is going to be edited in place, and the backup is going to be saved with the extension .bak.
As an alternative to sed or perl, consider using ed(1) with POSIX character classes.
Note: ed(1) reads the entire file into memory to edit it in place, so for really large files you should use sed -i ... or perl -i ... instead.
# see:
# - http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
# - http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes
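# test: create a file containing DEL (\177) and a few non-ASCII bytes
echo $'aaa \177 bbb \200 \214 ccc \254 ddd\r\n' > testfile
# list the file unambiguously before the edit
ed -s testfile <<< $',l'
# H: verbose errors; delete everything that is not graphic, whitespace or control; write and quit
ed -s testfile <<< $'H\ng/[^[:graph:][:space:][:cntrl:]]/s///g\nwq'
# list again to check the result
ed -s testfile <<< $',l'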
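For files too large for ed, the note above points to streaming tools. Since the question mentions tr, here is a minimal sketch of that route: it streams into a temporary file and then moves it over the original. The file names and the count_non_ascii helper are made up for illustration.

```shell
# count_non_ascii is a hypothetical helper: prints how many bytes are >= 0x80.
count_non_ascii() {
  # Delete every ASCII byte (0-127), then count what is left.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d '[:space:]'
}

# Scratch file standing in for a large input (same bytes as the ed test).
big=$(mktemp)
printf 'aaa \177 bbb \200 \214 ccc \254 ddd\r\n' > "$big"
before=$(count_non_ascii "$big")

# Stream through tr, keeping only bytes 0-127, then replace the original.
tmp=$(mktemp)
LC_ALL=C tr -cd '\000-\177' < "$big" > "$tmp" && mv "$tmp" "$big"
after=$(count_non_ascii "$big")

echo "non-ASCII bytes before: $before, after: $after"
```

Note that mv leaves the file with the temporary file's permissions and a new inode. If that matters, cat "$tmp" > "$big" followed by rm "$tmp" keeps the original file's metadata at the cost of one more copy.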