I have a caret delimited (key=value) input and would like to extract multiple tokens of interest from it.

For example, given the following input:

$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"
1=A00^35=D^150=1^33=1
1=B000^35=D^150=2^33=2

I would like the following output

35=D^150=1^
35=D^150=2^

I have tried the following

$ echo -e "1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2"|egrep -o "35=[^/^]*\^|150=[^/^]*\^"
35=D^
150=1^
35=D^
150=2^

My problem is that egrep returns each match on a separate line. Is it possible to get one line of output for one line of input? Please note that due to the constraints of the larger script, I cannot simply do a blind replace of all the \n characters in the output.

Thank you for any suggestions. This script is for bash 3.2.25. Any egrep alternatives are welcome. Please note that the tokens of interest (35 and 150) may change, and I am already generating the egrep pattern in the script. Hence a one-liner (if possible) would be great.

A: 

To get rid of the newline, you can just echo it again:

$ echo $(echo "1=A00^35=D^150=1^33=1"|egrep -o "35=[^/^]*\^|150=[^/^]*\^")
35=D^ 150=1^
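If you want to keep the egrep pattern you are already generating, one possible sketch is to run it on each input line separately and glue that line's matches together. The function name join_matches is made up for illustration, and grep -E -o is used here as the equivalent of egrep -o:

```shell
# Hypothetical helper (name is an assumption): apply the already-generated
# pattern to each input line on its own and join that line's matches.
join_matches() {
    pattern=$1
    while IFS= read -r line; do
        # grep -o prints one match per line; tr -d '\n' glues them together
        matches=$(printf '%s\n' "$line" | grep -E -o "$pattern" | tr -d '\n')
        # print nothing for lines without any match
        if [ -n "$matches" ]; then
            printf '%s\n' "$matches"
        fi
    done
}

printf '1=A00^35=D^150=1^33=1\n1=B000^35=D^150=2^33=2\n' |
    join_matches '35=[^/^]*\^|150=[^/^]*\^'
# prints:
# 35=D^150=1^
# 35=D^150=2^
```

This starts one grep per input line, so it is probably only reasonable for small inputs.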

If that's not satisfactory (it gives you one line for the whole input, with the matches separated by spaces), you can use awk:

pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=35,150 -F^ ' {
        sep = "";
        split (LIST, srch, ",");
        for (i = 1; i <= NF; i++) {
            for (idx in srch) {
                split ($i, arr, "=");
                if (arr[1] == srch[idx]) {
                    printf "%s%s=%s", sep, arr[1], arr[2];
                    sep = "^";
                }
            }
        }
        if (sep != "") {
            print sep;
        }
    }'
35=D^150=1^
35=d^

 

pax> echo '
1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLIST=1,33 -F^ ' {
        sep = "";
        split (LIST, srch, ",");
        for (i = 1; i <= NF; i++) {
            for (idx in srch) {
                split ($i, arr, "=");
                if (arr[1] == srch[idx]) {
                    printf "%s%s=%s", sep, arr[1], arr[2];
                    sep = "^";
                }
            }
        }
        if (sep != "") {
            print sep;
        }
    }'
1=A00^33=1^
1=a00^33=11^

This one allows you to use a single awk script and all you need to do is to provide a comma-separated list of keys to print out.
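Since only the key list changes, the script could be wrapped in a shell function so the calling script passes nothing but the list. This is a rough sketch rather than the script above verbatim: the name extract_tokens is an assumption, and it looks the keys up in an awk array instead of looping over them for every field.

```shell
# Hypothetical wrapper (function name and argument order are assumptions).
# $1 is a comma-separated key list such as "35,150"; input comes from stdin.
extract_tokens() {
    awk -v LIST="$1" -F'^' '
    BEGIN {
        # turn the key list into a lookup table once
        n = split(LIST, keys, ",")
        for (i = 1; i <= n; i++) want[keys[i]] = 1
    }
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            key = $i
            sub(/=.*/, "", key)              # keep the text before the first "="
            if (key in want) out = out $i "^"
        }
        if (out != "") print out             # skip lines with no wanted key
    }'
}

printf '1=A00^35=D^150=1^33=1\n1=a00^35=d^157=11^33=11\n' | extract_tokens 35,150
# prints:
# 35=D^150=1^
# 35=d^
```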


And here's the one-liner version :-)

echo '1=A00^35=D^150=1^33=1
1=a00^35=d^157=11^33=11
' | awk -vLST=1,33 -F^ '{s="";split(LST,k,",");for(i=1;i<=NF;i++){for(j in k){split($i,arr,"=");if(arr[1]==k[j]){printf "%s%s=%s",s,arr[1],arr[2];s="^";}}}if(s!=""){print s;}}'
paxdiablo
Thanks for your reply Pax. I have edited the question to better depict my problem. The awk solution would be awesome, except that I will not always want 35 and 150. I am already generating the egrep regex and generating the entire awk statement seems a bit brute force.
Dave
@Dave, see the update. The script itself doesn't change since you now just provide a list of tokens of interest. The only thing you need to generate dynamically is the `-vLIST=` bit.
paxdiablo
Thanks a million!
Dave
+1  A: 

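You have two options. Option 1 is to change the field separator (IFS) and use set --. Word splitting only applies to the result of an expansion, so the input has to be in a variable:

line='1=A00^35=D^150=1^33=1'
old_ifs=$IFS
IFS="^ "
set -- $line  # No quotes here!!
IFS="$old_ifs"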

Now you have your values in $1, $2, etc.
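To pick the tokens of interest out of the positional parameters, something along these lines might work. It is only a sketch: pick_fields and its argument order are invented for the example, and the wanted keys are passed as a space-separated list.

```shell
# Hypothetical helper (name and arguments are assumptions).
# $1: space-separated keys of interest, $2: one caret-delimited line.
pick_fields() {
    wanted=" $1 "
    old_ifs=$IFS
    set -f                  # no glob expansion while $2 is expanded unquoted
    IFS='^'
    set -- $2               # split the line on "^" into $1, $2, ...
    IFS=$old_ifs
    set +f
    out=
    for pair in "$@"; do
        case $wanted in
            # keep the pair if the text before its first "=" is a wanted key
            *" ${pair%%=*} "*) out="$out$pair^" ;;
        esac
    done
    printf '%s\n' "$out"
}

pick_fields "35 150" '1=A00^35=D^150=1^33=1'
# prints: 35=D^150=1^
```

The set -f / set +f pair assumes globbing was enabled to begin with; a script that already runs with set -f would not want the set +f.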

Or you can use an array:

tmp=$(echo "1=A00^35=D^150=1^33=1" | sed -e 's:\([0-9][0-9]*\)=: [\1]=:g' -e 's:\^ : :g')
eval value=($tmp)
echo "35=${value[35]}^150=${value[150]}"
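If you would rather not eval the input, the same array can probably be filled with read -a instead. This is a sketch for bash only, and it assumes every key is a plain non-negative integer without leading zeros (the variable names pairs and key are made up here):

```shell
# Sketch for bash: fill a sparse array keyed by the numeric tags, without eval.
line='1=A00^35=D^150=1^33=1'
value=()
# IFS is changed only for this one read, so the global IFS stays untouched
IFS='^' read -r -a pairs <<< "$line"
for pair in "${pairs[@]}"; do
    key=${pair%%=*}              # text before the first "="
    value[$key]=${pair#*=}       # text after the first "="
done
echo "35=${value[35]}^150=${value[150]}"
# prints: 35=D^150=1
```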
Aaron Digulla
A: 

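Given a file 'in' containing your strings (note that this selects fields 2 and 3 by position, so it only works if the tokens of interest are always in those positions):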

$ for i in $(cut -d^ -f2,3 < in);do echo $i^;done
35=D^150=1^
35=D^150=2^
matja