I'm trying to do some fairly simple string parsing in a bash script. Basically, I have a file that is made up of multiple multi-line fields. Each field is surrounded by a known header and footer.

I want to extract each field separately into an array or similar, like this:

>FILE=`cat file`
>REGEX="@#@#@#[\s\S]+?@#@#@"
> 
>if [[ $FILE =~ $REGEX ]]; then
>   echo $BASH_REMATCH
>fi

FILE:

@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#

Now I'm pretty sure the problem is that bash doesn't match newlines with the "." character.

I can match this with "pcregrep -M", but of course the whole file is going to match. Can I get one match at a time from pcregrep?

I'm not opposed to using some inline perl or similar.

Thanks in advance

A: 

I would build something around awk. Here is a first proof of concept:

awk '
    BEGIN{ f=0; fi="" }
    /^@#@#@#################################$/{ f=1 }
    /^@#@#@#$/{ f=0; print"Field:"fi; fi="" }
    { if(f==2)fi=fi"-"$0; if(f==1)f++ }
' file
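
Since the question asks for the fields in an array, here is a minimal sketch of how the awk output might be loaded into a bash array. It assumes bash 4 or later (for mapfile) and that joining each field's lines with "-" is acceptable, so every field fits on one output line. The temporary file and the names sample, fields and idx are made up for the illustration.

```shell
#!/usr/bin/env bash
# Assumption: bash 4+ is available (mapfile does not exist in bash 3.x).
# "sample" is a throwaway copy of the example input, created only for this demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#
EOF

# Same state machine as the proof of concept: f becomes 1 on the header line,
# 2 while inside a field, and 0 again on the footer line, where the field is printed.
mapfile -t fields < <(awk '
    /^@#@#@#################################$/ { f = 1 }
    /^@#@#@#$/ { f = 0; print fi; fi = "" }
    { if (f == 2) fi = fi "-" $0; if (f == 1) f++ }
' "$sample")
rm -f "$sample"

# One array element per field
for idx in "${!fields[@]}"; do
    printf '%d: %s\n' "$idx" "${fields[idx]}"
done
```

Each element keeps the leading "-" that the proof of concept puts in front of every line; a different joiner can be used as long as it never appears in the data.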
mouviciel
A: 
begin="@#@#@#################################"
end="@#@#@#"
i=0
flag=0

while IFS= read -r line    # IFS= keeps leading and trailing whitespace
do
    case $line in
        $begin)
            flag=1;;
        $end)
            ((i++))
            flag=0;;
        *)
            if [[ $flag == 1 ]]
            then
                array[i]+="$line"$'\n'    # retain the newline
            fi;;
     esac
done < datafile

If you want to keep the marker lines in the array elements, move the assignment statement (with its flag test) to the top of the while loop before the case.
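
A sketch of that variant, under the assumption that both marker lines should end up in each element. If the assignment is moved verbatim, the flag is still 0 when the begin marker is read, so only the end marker would be kept. The version below therefore sets the flag before the test and clears it afterwards. The temporary file and the name sample are only there to make the example self-contained.

```shell
#!/usr/bin/env bash
begin="@#@#@#################################"
end="@#@#@#"

# Throwaway copy of the example input (an assumption for this demo)
sample=$(mktemp)
cat > "$sample" <<'EOF'
@#@#@#################################
this is field one
@#@#@#
@#@#@#################################
this is field two
they can be any number of lines
@#@#@#
EOF

i=0
flag=0
array=()
while IFS= read -r line
do
    # Raise the flag first so the begin marker itself is stored
    if [[ $line == "$begin" ]]
    then
        flag=1
    fi
    if [[ $flag == 1 ]]
    then
        array[i]+="$line"$'\n'    # retain the newline
    fi
    # Lower the flag only after storing, so the end marker is stored too
    if [[ $line == "$end" ]]
    then
        flag=0
        i=$((i + 1))
    fi
done < "$sample"
rm -f "$sample"

printf '%s' "${array[0]}"
```

The comparisons quote "$begin" and "$end", so the markers are matched as literal strings rather than as glob patterns.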

Dennis Williamson
A: 

If you have gawk:

awk 'BEGIN{ RS="@#*#" }
NF{
    gsub("\n"," ") # remove this if you want to retain newlines
    print "-->"$0 
    # put to array
    arr[++d]=$0
} ' file

output

$ ./shell.sh
--> this is field one
--> this is field two they can be any number of lines
ghostdog74
Modified this a little bit to do what I want. Awk is something I've never learned. Thanks!
prestomation