ansaurus

Question

Answer 1

+4 A:

Use sed with -i (edit in place):

sed -i 's/[\d128-\d255]//' FILENAME

Ivan Kruchkoff 2010-07-26 18:51:03

Had to change it to sed -i 's/[\d128-\d255]//g' FILENAME and it worked. Thanks.

Sujit 2010-07-26 18:57:06
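To illustrate why the g flag matters here, a minimal sketch, assuming GNU sed (the \dNNN decimal escape is a GNU extension) and a byte-oriented C locale, which is set explicitly because ranges over raw bytes may behave differently in a UTF-8 locale. The variable names are only for illustration.

```shell
# Input: one line holding two bytes above 127 (octal \200 and \214).

# Without g, sed removes only the first match on each line.
first_only=$(printf 'a\200b\214c\n' | LC_ALL=C sed 's/[\d128-\d255]//')

# With g, sed removes every match on each line.
all_gone=$(printf 'a\200b\214c\n' | LC_ALL=C sed 's/[\d128-\d255]//g')

# Dump the first result byte by byte so the leftover high byte is visible.
printf '%s' "$first_only" | od -An -c
printf '%s\n' "$all_gone"
```

The same pattern with -i and a file name is what the comment above settled on.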

@Sujit: Note that `sed -i` still creates an intermediate file. It just does it behind the scenes.

Dennis Williamson 2010-07-26 19:57:47

@Dennis - then what would be the better solution?

Sujit 2010-07-26 20:43:25

@Sujit: There's not a better solution. I just wanted to point out that an intermediate file is still created. Sometimes that matters. I just didn't want you to be under the assumption that it was doing it *literally* in place.

Dennis Williamson 2010-07-26 21:22:11
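For anyone wondering what "behind the scenes" looks like, here is a rough sketch of the same write-then-rename pattern done by hand. It is not how sed is implemented internally, only an illustration of the idea. It swaps in tr instead of sed for the filtering step, and strip_non_ascii is a hypothetical helper name.

```shell
# strip_non_ascii is a hypothetical helper, not part of sed or coreutils.
strip_non_ascii() {
    file=$1
    # Create the temporary copy next to the original so the final mv
    # stays on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # In the C locale tr works byte by byte; octal 200-377 is decimal 128-255.
    if LC_ALL=C tr -d '\200-\377' < "$file" > "$tmp"; then
        # Replace the original with the filtered copy.
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

Usage would be strip_non_ascii FILENAME. Because the original is replaced by a new file, ls -i FILENAME should show a different inode number afterwards. Note that this sketch does not carry over the original file's permissions or ownership.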

Answer 2

+1 A:

A Perl one-liner would do: perl -i.bak -pe 's/[^[:ascii:]]//g' <your file>

-i.bak says that the file is edited in place and that a backup of the original is saved with the extension .bak.

ssegvic 2010-07-26 18:52:58

Answer 3

A:

As an alternative to sed or perl, you may consider using ed(1) with POSIX character classes.

Note: ed(1) reads the entire file into memory to edit it in place, so for really large files you should use sed -i ... or perl -i ... instead.

# see:
# - http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
# - http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes

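# test (bash syntax: $'...' quoting and <<< here-strings)
# create a file containing DEL (\177), three bytes above 127, and a CR before the newline
echo $'aaa \177 bbb \200 \214 ccc \254 ddd\r\n' > testfile
# list the buffer unambiguously before the edit
ed -s testfile <<< $',l'
# H: verbose errors; g/RE/s///g: on matching lines, delete every character outside the listed classes; wq: write and quit
ed -s testfile <<< $'H\ng/[^[:graph:][:space:][:cntrl:]]/s///g\nwq'
# list again to check the result
ed -s testfile <<< $',l'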

trevor 2010-07-28 13:05:26


Remove non-ascii characters from csv
