Essentially, I would like something that behaves similarly to:

cat file | grep -i keyword1 | grep -i keyword2 | grep -i keyword3

How can I do this with a bash script that takes a variable-length list of keyword arguments? The script should do a case-insensitive match of lines containing all keywords.

A:

I don't know how efficient this is, and I find it a bit ugly (there may well be a utility for this), but:

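#!/bin/bash

unset keywords matchlist
keywords=("$@")

# Build an awk condition of the form: /kw1/ && /kw2/ && /kw3/
for kw in "${keywords[@]}"; do
    matchlist="$matchlist /$kw/ &&"
done

# Strip the trailing " &&"
matchlist="${matchlist% &&}"

# Case-insensitive variant (keywords must be lowercase; matching lines are printed lowercased):
# awk "$matchlist { print; }" < <(tr '[:upper:]' '[:lower:]' <file)
awk "$matchlist { print; }" file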

And yes, it still needs hardening against special characters and the like (a / inside a keyword, for example, ends the awk regex early). It's just meant to show the idea.

TheBonsai
Thanks, that's the gist of it. But how would I make the keyword match case-insensitive (like grep -i)?
Siou
You caught me! GNU awk (gawk) has an IGNORECASE variable for this. If it's acceptable, you can also lowercase the input file while reading it and use only lowercase keywords (the matching lines are then printed in lowercase, though): awk ... < <(tr '[:upper:]' '[:lower:]' <file)
TheBonsai
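Addressing both points above (special characters and case), here is a minimal sketch that assumes any POSIX awk and treats each keyword as a plain substring rather than a regex. It lowercases both the line and the keywords for comparison with index(), but prints the original line unchanged. The function name awk_all is hypothetical, and it reads standard input instead of a fixed file.

```shell
#!/bin/bash
# awk_all (hypothetical name): print stdin lines containing every keyword,
# compared case-insensitively as plain substrings (no regex metacharacters).
# Usage: awk_all keyword1 keyword2 keyword3 < file
awk_all() {
    awk '
        BEGIN {
            # Take the keywords out of ARGV so awk reads stdin, not files
            n = ARGC - 1
            for (i = 1; i < ARGC; i++) {
                kw[i] = tolower(ARGV[i])
                delete ARGV[i]
            }
        }
        {
            line = tolower($0)
            for (i = 1; i <= n; i++)
                if (index(line, kw[i]) == 0) next   # a keyword is missing: skip line
            print                                   # print the line in its original case
        }
    ' "$@"
}
```

Because the keywords are compared as plain strings, a / or . in a keyword no longer needs escaping; the trade-off is that regex keywords stop working.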
A:

Give this a try:

shopt -s nocasematch
keywords="keyword1|keyword2|keyword3"
while IFS= read -r line; do [[ $line =~ $keywords ]] && printf '%s\n' "$line"; done < file

Edit:

Here's a version that tests for all keywords being present, not just any:

shopt -s nocasematch                     # make =~ ignore case
keywords=(keyword1 keyword2 keyword3)    # or keywords=("$@")
qty=${#keywords[@]}
while IFS= read -r line
do
    count=0
    for keyword in "${keywords[@]}"
    do
        [[ $line =~ $keyword ]] && (( count++ ))
    done
    if (( count == qty ))
    then
        printf '%s\n' "$line"
    fi
done < textlines
Dennis Williamson
Thanks, I hadn't used shopt before; it's always nice to learn something new. This gives a case-insensitive match, but it matches lines containing any of the keywords. I only want to match lines that contain all of them.
Siou
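Building on the all-keywords loop above, here is a sketch that takes the keywords as function arguments, stops checking a line at the first keyword that doesn't match, and restores nocasematch afterwards. The name match_all is hypothetical, it assumes a reasonably recent Bash, and keywords are still treated as regexes, as in the answer.

```shell
#!/bin/bash
# match_all (hypothetical name): print stdin lines matching every keyword regex,
# ignoring case. Usage: match_all keyword1 keyword2 keyword3 < file
match_all() {
    local line kw was_set=0
    shopt -q nocasematch && was_set=1   # remember the caller's setting
    shopt -s nocasematch
    while IFS= read -r line || [[ -n $line ]]; do   # also handles a last line without newline
        for kw in "$@"; do
            # $kw is left unquoted on purpose so =~ treats it as a regex
            [[ $line =~ $kw ]] || continue 2        # first miss: move on to the next line
        done
        printf '%s\n' "$line"
    done
    if (( ! was_set )); then shopt -u nocasematch; fi
}
```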
A: 

You can use Bash 4.0 or later (the ;;& terminator, which keeps testing the remaining patterns, is new in Bash 4):

shopt -s nocasematch
while IFS= read -r line
do
    f=0 g=0    # reset the per-line flags
    case "$line" in
        *keyword1*) f=1;;&    # ;;& falls through to test the next pattern
        *keyword2*) g=1;;&
        *keyword3*)
            (( f && g )) && printf '%s\n' "$line";;
    esac
done < "file"
shopt -u nocasematch

Or with gawk, where IGNORECASE=1 makes the regex matches case-insensitive:

gawk -v IGNORECASE=1 '/keyword1/ && /keyword2/ && /keyword3/' file
This requires Bash 4
Dennis Williamson
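The ;;& version above is hard-wired to three keywords. A minimal sketch of the same case-based idea for any number of keywords, without ;;&, might look like this. It assumes literal, case-insensitive substring matching is what's wanted (quoting "$kw" inside the pattern disables glob characters), and contains_all is a hypothetical name.

```shell
#!/bin/bash
# contains_all (hypothetical name): print stdin lines containing every keyword
# as a literal, case-insensitive substring. Usage: contains_all kw1 kw2 < file
contains_all() {
    local line kw was_set=0
    shopt -q nocasematch && was_set=1
    shopt -s nocasematch                 # makes case patterns ignore case
    while IFS= read -r line || [[ -n $line ]]; do
        for kw in "$@"; do
            # Quoting "$kw" makes glob characters in the keyword match literally
            case $line in
                *"$kw"*) ;;              # keyword present: check the next one
                *) continue 2 ;;         # keyword missing: skip this line
            esac
        done
        printf '%s\n' "$line"
    done
    if (( ! was_set )); then shopt -u nocasematch; fi
}
```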
A: 

Found a way to do this with grep.

MATCH_EXPR="cat file"
for keyword in "$@"
do
  # printf %q shell-quotes the keyword so eval doesn't split or expand it
  MATCH_EXPR+=" | grep -i -- $(printf '%q' "$keyword")"
done
eval "$MATCH_EXPR"
Siou
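The eval in the answer above can be avoided by building the pipeline with a recursive function: each call filters through one grep -i and pipes the result into the next call. This is a sketch; grep_all is a hypothetical name, and keywords are still grep (basic) regexes, as in the original.

```shell
#!/bin/bash
# grep_all (hypothetical name): equivalent to
#   grep -i kw1 | grep -i kw2 | ... | grep -i kwN
# but built without eval. Usage: grep_all keyword1 keyword2 < file
grep_all() {
    if (( $# == 0 )); then
        cat                      # no keywords left: pass the remaining lines through
        return
    fi
    local kw=$1
    shift
    grep -i -- "$kw" | grep_all "$@"
}
```

Each keyword still adds one grep process, just like the hand-written pipeline in the question.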
A:
Idelic
A: 

I'd do it in Perl.

For finding all lines that contain at least one of them:

perl -ne'print if /(keyword1|keyword2|keyword3)/i' file

For finding all lines that contain all of them:

perl -ne'print if /keyword1/i && /keyword2/i && /keyword3/i' file
Andy Lester