ansaurus

Question

bash script to extract ALL matches of a regex pattern

Answer 1

A:

Use grep -o

-o, --only-matching show only the part of a line matching PATTERN
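Here is a minimal sketch of grep -o on a hypothetical sample string. It assumes GNU or BSD grep, since -o is a common extension and not part of POSIX. It also uses -E so that {2} is read as an extended-regex interval:

```bash
#!/usr/bin/env bash
# Hypothetical sample input: two-digit numbers embedded in text
string="string to search 99 with 88 some 42 numbers"

# -o prints every match on its own line (all matches, not just the first per line);
# -E enables extended regex syntax so that {2} means "exactly two"
matches=$(grep -oE '[0-9]{2}' <<< "$string")

printf '%s\n' "$matches"
```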

Daenyth 2010-09-04 18:20:43

Answer 2

+1 A:

Edit: answer to edited question:

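# -P (Perl-compatible regex, GNU grep) allows the non-greedy .*?; -o prints one match per line
match=""
while IFS= read -r string
do
    match="${match:+$match }$string"
done < <(printf '%s\n' "$result" | grep -Po "ADDNAME[0-9]{2}.*?HELLO")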
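You may need the matches individually rather than as one space-joined string. In that case, you can read them into a Bash array instead. This sketch assumes Bash 4+ (for mapfile) and GNU grep built with PCRE support (-P, needed for the non-greedy .*?). The sample result value is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical input: no spaces, two ADDNAME..HELLO runs
result="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# -P: Perl-compatible regex, so .*? stops at the nearest HELLO
# -o: one match per output line; mapfile -t stores each line as an array element
mapfile -t matches < <(grep -oP 'ADDNAME[0-9]{2}.*?HELLO' <<< "$result")

echo "found ${#matches[@]} matches"
printf '%s\n' "${matches[@]}"
```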

Original answer:

If you're using Bash version 3.2 or higher, you can use its regex matching.

string="string to search 99 with 88 some 42 numbers"
pattern="[0-9]{2}"
match=""
for word in $string    # unquoted on purpose: split the string into words
do
    if [[ $word =~ $pattern ]]
    then
        match="${match:+$match }${BASH_REMATCH[0]}"
    fi
done

The result will be "99 88 42".
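The word loop above only finds matches that are separated by spaces. The following pure-Bash sketch works on a string without spaces instead. It matches repeatedly and strips everything up to and including each match. Bash's =~ uses POSIX extended regexes, which have no non-greedy .*?. The sketch therefore assumes the text between ADDNAME and HELLO is lowercase letters, the same [a-z]* restriction used in the last answer. The sample data is hypothetical:

```bash
#!/usr/bin/env bash
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
# ERE has no non-greedy .*?, so the gap is limited to lowercase letters
pattern='ADDNAME[0-9]{2}[a-z]*HELLO'

matches=()
rest=$data
while [[ $rest =~ $pattern ]]
do
    matches+=("${BASH_REMATCH[0]}")
    # Drop the text up to and including this match (quoted, so it is matched literally)
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

Because =~ always reports the leftmost match, the first literal occurrence of that text in rest is the match itself. Stripping up to it therefore never skips a match.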

Dennis Williamson 2010-09-04 18:37:25

I edited my post: My string does not have spaces, therefore it will not work.

bobby 2010-09-04 20:04:00

@bobby: see my edit.

Dennis Williamson 2010-09-04 20:42:24

Answer 3

A:

Not very elegant, and greedy matching causes problems, but this more or less works:

data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in "$data" \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLO"
do
    echo "$word"
done |
sed -e '/ADDNAME[0-9][0-9][a-z]*HELLO/{
        s/\(ADDNAME[0-9][0-9][a-z]*HELLO\)/ \1 /g
        }' |
while read -r line
do
    set -- $line    # unquoted on purpose: split on the spaces sed inserted
    for arg in "$@"
    do echo "$arg"
    done
done |
grep "ADDNAME[0-9][0-9][a-z]*HELLO"

The first loop echoes three lines of sample data. In practice you'd replace it with cat or I/O redirection. The sed script uses a modified regex to put spaces around each match. The while loop then splits the space-separated words so that each 'word' is on its own line, and the final grep keeps only the lines you want.
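For comparison, this sketch lets grep -o (a GNU/BSD extension) do the splitting and selecting in one step. It reuses the same [a-z]* regex and the same three sample lines, with printf standing in for the real input file:

```bash
#!/usr/bin/env bash
# The same three sample lines as the first loop above
input=$(printf '%s\n' \
    "abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLO")

# -o prints every non-overlapping match on its own line,
# which replaces the sed step, the while loop and the final grep
found=$(grep -o 'ADDNAME[0-9][0-9][a-z]*HELLO' <<< "$input")
printf '%s\n' "$found"
```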

The regex uses [a-z]* in place of the original .* because .* is greedy: it would match from the first ADDNAME all the way to the last HELLO on the line. Since [a-z]* cannot match the uppercase HELLO, each match stops at the first HELLO. If the data between ADDNAME and HELLO is unconstrained, you need a non-greedy regex such as .*?, which Perl, Python, and other modern scripting languages support:

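#!/usr/bin/env perl
use strict;
use warnings;

# Read input line by line and print every non-greedy ADDNAME..HELLO match
while (<>)
{
    while (/(ADDNAME\d\d.*?HELLO)/g)
    {
        print "$1\n";
    }
}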

This is a good demonstration of using the right tool for the job.

Jonathan Leffler 2010-09-04 20:56:52
