I have a text file with various entries in it. Each entry ends with a line consisting entirely of asterisks.

I'd like to use shell commands to parse this file and assign each entry to a variable. How can I do this?

Here's an example input file:

***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********

Here is what my solution looks like so far:

#!/bin/bash
for error in `python example.py | sed -n '/.*/,/^\**$/p'`
do
    echo -e $error
    echo -e "\n"
done

However, this just assigns each word in the matched text to $error, rather than a whole block.

A: 

Depending on what you want to do with the variables, awk may be enough:

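awk '
# A separator line ends the current entry: print what was collected.
f && /^\*+$/ { print "variable:" s; f = 0 }
# Every separator line also starts a new, empty entry.
/^\*+$/ { f = 1; s = ""; next }
# Any other line is appended to the current entry.
f { s = s " " $0 }
' file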

output:

# ./test.sh
variable: Field1
variable: Lorem ipsum Data to match
variable: More data Still more data

The above just prints them out. If you want, store them in an array for later use, e.g. array[++d]=s.
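As a rough sketch of the array idea, the entries can be kept in an awk array and used in an END block. The function name collect_entries and the array name entries are made up for this illustration, and the sample input is the one from the question.

```shell
#!/bin/bash
# Hypothetical helper: collects each entry into the awk array "entries"
# and prints them, numbered, once all input has been read.
collect_entries() {
    awk '
    /^\*+$/ { if (f) entries[++d] = s; f = 1; s = ""; next }
    f       { s = (s == "" ? $0 : s " " $0) }
    END     { for (i = 1; i <= d; i++) print i ": " entries[i] }
    ' "$@"
}

collect_entries <<'EOF'
***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********
EOF
```

The array only exists inside awk, so any later processing has to happen in the END block (or awk has to print the entries in a form the shell can read back).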

ghostdog74
+1  A: 

If you want to do it in Bash, you could do something like the following. It uses globbing instead of regexps; the extglob shell option enables extended pattern matching, so that the pattern +(\*) matches a line consisting only of asterisks.

#!/bin/bash
shopt -s extglob
entry=""
while IFS= read -r line
do
    case $line in 
        +(\*))
            # do something with $entry here
            entry=""
            ;;
        *)
            entry="$entry$line
"
            ;;
    esac
done
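
To get each entry into a variable, the "do something" step can append to a Bash array. This is only a sketch: the array name entries is an assumption, empty entries are skipped, and the input is the sample from the question supplied through a here-document.

```shell
#!/bin/bash
shopt -s extglob   # must be enabled before the case pattern below is parsed

entries=()
entry=""
while IFS= read -r line
do
    case $line in
        +(\*))
            # Separator line: store the finished entry (without its
            # trailing newline), skipping empty ones.
            [[ -n $entry ]] && entries+=("${entry%$'\n'}")
            entry=""
            ;;
        *)
            entry+="$line"$'\n'
            ;;
    esac
done <<'EOF'
***********
Field1
***********
Lorem ipsum
Data to match
***********
More data
Still more data
***********
EOF

echo "Found ${#entries[@]} entries"
printf '%s\n' "${entries[1]}"
```

Each entry is then available as "${entries[0]}", "${entries[1]}", and so on, with its line breaks intact as long as the expansion is quoted.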
Jukka Matilainen
A: 

Splitting records in (ba)sh is not so easy, but can be done using IFS to split on single characters (simply set IFS='*' before your for loop, but this generates multiple empty records and is problematic if any record contains a '*'). The obvious solution is to use perl or awk and use RS to split your records, since those tools provide better mechanisms for splitting records. A hybrid solution is to use perl to do the record splitting, and have perl call your bash function with the record you want. For example:

#!/bin/bash

foo() {
    echo record start:
    echo "$@"
    echo record end
}
export -f foo

perl -e "$/='********'; while(<>){chomp;system( \"foo '\$_'\" )}" << 'EOF'
this is a 2-line
record
********
the 2nd record
is 3 lines
long
********
a 3rd * record
EOF

This gives the following output:

record start:
this is a 2-line
record

record end
record start:

the 2nd record
is 3 lines
long

record end
record start:

a 3rd * record

record end
William Pursell
Note that the script given here almost certainly requires /bin/sh to be bash.
William Pursell
A: 

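Try putting double quotes around the command substitution so that its output is not split into words. The loop then runs once, with the whole output in $error, so $error needs quotes as well to keep its line breaks.

#!/bin/bash
for error in "`python example.py | sed -n '/.*/,/^\**$/p'`"
do
    echo -e "$error"
    echo -e "\n"
done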
Brad Gilbert
+1  A: 

I'm surprised not to see a native bash solution here. Yes, bash has regular expressions. You can find plenty of documentation online, particularly if you include "bash_rematch" in your query, or just look at the man page. Here's a silly example, taken from another source and slightly modified, which prints the whole match and each captured group for a regular expression.

if [[ $str =~ $regex ]]; then
    echo "$str matches"
    echo "matching substring: ${BASH_REMATCH[0]}"
    i=1
    n=${#BASH_REMATCH[*]}
    while [[ $i -lt $n ]]
    do
        echo "  capture[$i]: ${BASH_REMATCH[$i]}"
        let i++
    done
else
    echo "$str does not match"
fi

The important bit is that the extended test [[ ... ]], with its regex comparison operator =~, stores the entire match in ${BASH_REMATCH[0]} and the captured groups in ${BASH_REMATCH[i]} for i >= 1.
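
Applied to the question, the same =~ test can recognize the separator lines. The following is a minimal sketch rather than a complete solution: the names sep_re, handle_entry and parse_entries are invented for the example, and text after the last separator line is dropped.

```shell
#!/bin/bash
sep_re='^\*+$'   # a line made up only of asterisks

# Hypothetical callback: receives one whole entry as its first argument.
handle_entry() {
    printf 'entry: %s\n' "$1"
}

# Reads lines from stdin and calls handle_entry once per non-empty entry.
parse_entries() {
    local line entry=""
    while IFS= read -r line; do
        if [[ $line =~ $sep_re ]]; then
            [[ -n $entry ]] && handle_entry "${entry%$'\n'}"
            entry=""
        else
            entry+="$line"$'\n'
        fi
    done
}

parse_entries <<'EOF'
***********
Field1
***********
Lorem ipsum
Data to match
***********
EOF
```

Keeping the regex in a variable and leaving it unquoted on the right-hand side of =~ avoids the quoting surprises that come with writing the pattern inline.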

Jefromi