
I have a blacklist.txt file that contains keywords I want to remove using sed.

Here's what the blacklist.txt file contains:

winston@linux ] $ cat blacklist.txt   
obscure
keywords
here
...

And here's what I have so far, but it doesn't work:

  blacklist=$(cat blacklist.txt);
  output="filtered_file.txt"

  for i in $blacklist;
    do
      cat $input | sed 's/$i//g' >> $output
    done
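The loop above has three problems: the single quotes in `'s/$i//g'` stop the shell from expanding `$i`, `$input` is never set, and `>>` appends a fresh copy of the whole file for every word, each copy missing only one word. A minimal corrected sketch, assuming `input.txt` as a hypothetical stand-in for `$input` and made-up sample data, collects one `-e` expression per blacklist line and runs sed once. Entries containing `/` or regex metacharacters such as `.` or `*` would still break or change the generated expressions.

```shell
# Hypothetical sample files so the sketch runs on its own; "input.txt" stands
# in for $input, which the original loop never sets.
cd "$(mktemp -d)" || exit 1
printf 'obscure\nkeywords\n' > blacklist.txt
printf 'some obscure text with keywords here\n' > input.txt

input="input.txt"
output="filtered_file.txt"

# Collect one -e "s/WORD//g" expression per blacklist line. Double quotes
# (not single quotes) are what let $word expand inside the sed expression.
# The "|| [ -n ... ]" part also handles a last line with no trailing newline.
set --
while IFS= read -r word || [ -n "$word" ]; do
  [ -n "$word" ] || continue          # skip blank lines
  set -- "$@" -e "s/$word//g"
done < blacklist.txt

# One pass over the input with every substitution applied, written once
# with > instead of appending a new copy per word with >>.
if [ "$#" -gt 0 ]; then
  sed "$@" "$input" > "$output"
else
  cp "$input" "$output"               # empty blacklist: nothing to remove
fi
```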
+3  A: 

If you want to remove lines that contain words from the blacklist:

grep -v -f blacklist.txt inputfile > filtered_file.txt
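`grep -f` treats each blacklist line as a regular expression and matches it anywhere in a line, so a blacklisted `here` also removes a line containing `there`. A hedged variant, assuming GNU or BSD grep (`-w` is not in POSIX), adds `-F` to take the patterns as fixed strings and `-w` to require whole-word matches. The sample files are made up for illustration. Also note that with plain `grep -v -f`, an empty line in blacklist.txt is an empty pattern that matches every line, so the output would be empty.

```shell
# Made-up sample data for illustration.
cd "$(mktemp -d)" || exit 1
printf 'obscure\nhere\n' > blacklist.txt
printf 'keep this line\nan obscure line\nover there\n' > inputfile

# -v: keep non-matching lines, -F: fixed strings (no regex),
# -w: whole words only, -f: read the patterns from blacklist.txt
grep -v -F -w -f blacklist.txt inputfile > filtered_file.txt
```

Here "an obscure line" is dropped, while "over there" survives because `here` only appears inside the word `there`.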

If you want to remove just the words themselves:

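awk 'FNR==NR{          # true only while reading the first file (blacklist.txt)
 blacklist[$0]         # store each blacklisted word as an array key
 next
}
{
 for(i=1;i<=NF;i++){   # check every field of the input line
   if ($i in blacklist){
     $i=""             # blank the field; awk rebuilds $0 with OFS
   }
 }
}1' blacklist.txt inputfile > filtered_file.txt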
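Blanking fields leaves extra spaces behind, because assigning to `$i` makes awk rebuild the line with OFS between every field, including the empty ones. One possible variant (a sketch, not the answer's original code) rebuilds each line from only the fields that are not blacklisted. It collapses the original spacing to single spaces and, like the answer, only matches whitespace-delimited words, so `obscure,` with a trailing comma is kept. If blacklist.txt is empty, `FNR==NR` stays true for the input file too, and nothing is printed. The sample files are hypothetical.

```shell
# Made-up sample data for illustration.
cd "$(mktemp -d)" || exit 1
printf 'obscure\nkeywords\n' > blacklist.txt
printf 'some obscure text with keywords here\n' > inputfile

awk 'FNR==NR { blacklist[$0]; next }     # first file: load words as array keys
{
  out = ""
  for (i = 1; i <= NF; i++)
    if (!($i in blacklist))              # keep only non-blacklisted fields
      out = (out == "" ? $i : out " " $i)
  print out                              # rejoined with single spaces
}' blacklist.txt inputfile > filtered_file.txt
```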
ghostdog74
You are removing the whole line.
J-16 SDiZ
Only with the `grep` solution.
ghostdog74
+1 for the `awk` solution removing whole words as opposed to substrings (and for properly handling special characters, e.g. regexp metacharacters, in the blacklisted words). You should probably fix or remove the "just the shell" solution.
vladr
Yes, you are right. The shell solution is different. I could probably "cut" the line up, but I'll just stick to awk.
ghostdog74
+1  A: 

You want to use sed twice: once on the blacklist to create a sed program that removes every word listed in the blacklist, and then a second time to apply that generated sed script to your real data.

First,

$ sed -e 's@^@s/@' -e 's@$@//g' < blacklist.txt > script.sed

If blacklist.txt looks like

word1
word2
....
wordN

then script.sed will look like

s/word1//g
s/word2//g
...
s/wordN//g

You might find the use of @ characters above a bit confusing. The normal way of writing a sed substitute command is s/old/new/. This is quite awkward if either old or new contains a forward slash. So sed allows you to use almost any character you want as the delimiter, placed immediately after the s. This means that you can write s@foo/bar@plugh/plover@ instead of s/foo\/bar/plugh\/plover/. I think you'll agree that the former is much easier to read.
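A quick way to convince yourself that the delimiter is the only difference: both commands below perform the same substitution on a made-up input line.

```shell
# Same substitution, two delimiters: "/" (slashes escaped) and "@".
a=$(echo 'see foo/bar now' | sed 's/foo\/bar/plugh\/plover/')
b=$(echo 'see foo/bar now' | sed 's@foo/bar@plugh/plover@')
echo "$a"
echo "$b"
```

Both lines print `see plugh/plover now`.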

Once you have script.sed generated, run

$ sed -f script.sed < file > censored-file

You can, of course, use the new-fangled (i.e., less than 20 years old) -i option to do in-place editing. It is not part of POSIX, and GNU and BSD sed spell it slightly differently.
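A hedged example of the in-place form. Attaching a backup suffix directly to the option (`-i.bak`) is accepted by both GNU sed and BSD/macOS sed; a bare `-i` works in GNU sed, whereas BSD sed expects a separate suffix argument such as `-i ''`. The files are made up for illustration.

```shell
# Made-up sample files.
cd "$(mktemp -d)" || exit 1
printf 's/word1//g\n' > script.sed
printf 'a word1 b\n' > file

# Edit "file" in place, keeping the original contents in "file.bak".
sed -i.bak -f script.sed file
```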

Dale Hagglund
And he can, of course, do it all inline without an intermediate .sed file: `sed -e "$(sed -e 's@^@s/@;s@$@//g' <blacklist.txt | tr -s '\n' ';')" <file >censored-file`
vladr
A: 

Wow, thanks for your quick answers! All of your solutions work as expected; really appreciated.

Winston