Hi

I wrote a shell command with multiple pipes in it, and it works great. I now want to put it in the form of a (tidy) shell script. Here is the script:

#!/bin/bash
for number in `cat xmlEventLog_2010-03-23T* | sed -nr "/<event eventTimestamp/,/<\/event>/ {/event /{s/^.*$/\n/; p};/payloadType / {h; /protocol/ {s/.*protocol=\"([^\"]*)?\".*/protocol: \1/}; p; x; /type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p};/sender / {/sccpAddress/ {s/.*sccpAddress=\"([^\"]*)?\".*/sccpAddress: \1/}; /sccpAddress/! {s/.*/sccpAddress: Unknown/}; p};/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p};/filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p};}"| tee checkThis.txt| awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $1~/result: Blocked|Modified/ && $2~/sccpAddress: 353201000001/ && $4~/payload: SMS-MO-FSM-INFO|SMS-MO-FSM/ {$1=$1 ""; print}' | sort | uniq -c| egrep "NUMBER_BLACKLIST|USER_BLACKLIST|NUMBER_WALLEDGARDEN|USER_WALLED_GARDEN|SERVICE_RESTRICTION|BLOCK_VOICE_TO_SMS|PEP_Blacklist_Whitelist" | awk '{print $1}'`; do fil="$fil+$number"
done
echo "fil is $fil"

I would like to tidy this up so that it is readable. The for loop, which pipes into sed and awk, is pig ugly to view. Has anybody got suggestions for tidying up this piped monstrosity? Would the pipes stop me from breaking it up over several lines?

Thanks

A

If you copy the lines above into Notepad you will see what I mean about ugly (but functional).

OK folks, here is the final cleaned-up version.

It was mentioned that the event_structure function could be done entirely in awk. Could anybody show me an example of how this could be done? Setting the record separator to /event would separate the events, but it's the structures within each event (as in events.txt, see below) that I'm interested in. The number outcome is immaterial.

The core of the code is in the event_structure function. I want to parse out the data and put it all into data structures for later inspection, should the need arise. The following works fine. On the line that starts with payloadType I need to parse out two values, or set any missing values to Unknown. Is this totally awkable, or is the sed/awk combination I have here the best way to do it?

#!/bin/bash

event_structure() {
      sed -nr "/<event eventTimestamp/,/<\/event>/ {
            /event /{s/^.*$/\n/; p}
            /payloadType / {h; /protocol/ {s/.*protocol=\"([^\"]*)?\".*/protocol: \1/}; p; x; /type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p}
            /sender / {/sccpAddress/ {s/.*sccpAddress=\"([^\"]*)?\".*/sccpAddress: \1/}; /sccpAddress/! {s/.*/sccpAddress: Unknown/}; p}
            /result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p}
            /filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p};}" xmlEventLog_2010-03-23T* |
      tee events.txt|
      awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"}
      $1~/result: Blocked|Modified/ && $2~/sccpAddress: 353201000001/ && $4~/payload: SMS-MO-FSM-INFO|SMS-MO-FSM/ {$1=$1 ""; print}'
}

numbers=$(event_structure | sort | uniq -c | egrep "NUMBER_BLACKLIST|USER_BLACKLIST|NUMBER_WALLEDGARDEN|USER_WALLED_GARDEN|SERVICE_RESTRICTION|BLOCK_VOICE_TO_SMS|PEP_Blacklist_Whitelist" | awk '{print $1}')
addition=`echo $numbers | tr -s ' \n\t' '+' | sed -e '1s/^/fil is /' -e '$s/+$//'`
for number in $numbers
do
      fil="$fil+$number"
done
echo $addition=$(($fil))

Here is a section of the events.txt file produced:

result: Blocked
sccpAddress: 353869000000
protocol: SMS
payload: COPS
type: SERVICE_BLACK_LIST
result: Blocked


result: Blocked
sccpAddress: 353869000000
protocol: SMS
payload: COPS
type: SERVICE_BLACK_LIST
result: Blocked

result: Modified
sccpAddress: Unknown
protocol: IM
payload: IM
type: NUMBER_BLACKLIST
result: Modified

result: Allowed
sccpAddress: Unknown
protocol: MM1
payload: MM1

Here is the output:

$ ./bashShell.sh
fil is 2+372+1+1+214+73+1+20=684

Here is an output of just the function call:

$ ./bashShell.sh | head -10
result: Blocked;sccpAddress: 353201000001;protocol: SMS;payload: SMS-MO-FSM;type: TEXT_ANALYSIS;result: Blocked
result: Blocked;sccpAddress: 353201000002;protocol: SMS;payload: SMS-MT-FSM;type: TEXT_ANALYSIS;result: Blocked
result: Blocked;sccpAddress: 353201000005;protocol: SMS;payload: SMS-MO-FSM;type: SERVICE_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353201000021;protocol: SMS;payload: SMS-MT-FSM;type: NUMBER_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353201000033;protocol: IM;payload: IM;type: NUMBER_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353401009001;protocol: SMS;payload: SMS-MO-FSM;type: NUMBER_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353201000001;protocol: SMS;payload: SMS-MO-FSM;type: NUMBER_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353201000005;protocol: SMS;payload: SMS-MO-FSM;type: NUMBER_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353401000001;protocol: SMS;payload: SMS-MO-FSM;type: NUMBER_BLACKLIST;result: Blocked
result: Blocked;sccpAddress: 353201000001;protocol: SMS;payload: SMS-MO-FSM;type: NUMBER_BLACKLIST;result: Blocked

P.S. I named the script bashShell.sh for no particular reason.

A
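On the "totally awkable" question: the sed stage can be moved into awk. The following is a minimal sketch, not the asker's method, assuming (as the sed script already does) that each XML element sits on its own line with double-quoted attributes. The element and attribute names are taken from the sed patterns, and `event_structure_awk` is a hypothetical name. It prints the same blank-line-separated records seen in events.txt, sets an empty or missing protocol or sccpAddress to Unknown, and falls back to the protocol for the payload when the payloadType line has no type attribute.

```shell
#!/bin/bash
# Hypothetical awk-only version of the sed stage in event_structure.
# Assumes one XML element per line and double-quoted attributes.
event_structure_awk() {
    awk '
    # Value of attribute "name" in "line", or "" if it is absent.
    function attr(line, name,    re) {
        re = "[ \t]" name "=\"[^\"]*\""
        if (match(line, re))
            return substr(line, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
        return ""
    }
    # A blank line between events gives the paragraph-mode (RS="") layout.
    /<event eventTimestamp/ { inside = 1; if (count++) print ""; next }
    /<\/event>/             { inside = 0; next }
    !inside                 { next }
    /<payloadType / {
        proto = attr($0, "protocol");  if (proto == "") proto = "Unknown"
        payload = attr($0, "type");    if (payload == "") payload = proto
        print "protocol: " proto
        print "payload: " payload
        next
    }
    /<sender / {
        addr = attr($0, "sccpAddress"); if (addr == "") addr = "Unknown"
        print "sccpAddress: " addr
        next
    }
    /<result /     { print "result: " attr($0, "value"); next }
    /<filter code/ { print "type: " attr($0, "type"); next }
    ' "$@"
}
```

It would take the place of the sed command inside event_structure, e.g. `event_structure_awk xmlEventLog_2010-03-23T* | tee events.txt | awk ...`, leaving the existing RS="" awk filter unchanged. Being line-based, it is just as fragile as the sed version if the XML layout changes.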

+3  A: 

Pipes don't stop you from breaking the command over multiple lines; a line that ends in | simply continues on the next one. Do use $( ... ) instead of backticks, though. Something like this should work:

#!/bin/bash

for number in $(
    cat xmlEventLog_2010-03-23T* |
    sed -nr "/<event eventTimestamp/,/<\/event>/ {/event /{s/^.*$/\n/; p};/payloadType / {h; /protocol/ {s/.*protocol=\"([^\"]*)?\".*/protocol: \1/}; p; x; /type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p};/sender / {/sccpAddress/ {s/.*sccpAddress=\"([^\"]*)?\".*/sccpAddress: \1/}; /sccpAddress/! {s/.*/sccpAddress: Unknown/}; p};/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p};/filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p};}"|
    tee checkThis.txt |
    awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $1~/result: Blocked|Modified/ && $2~/sccpAddress: 353201000001/ && $4~/payload: SMS-MO-FSM-INFO|SMS-MO-FSM/ {$1=$1 ""; print}' |
    sort |
    uniq -c |
    egrep "NUMBER_BLACKLIST|USER_BLACKLIST|NUMBER_WALLEDGARDEN|USER_WALLED_GARDEN|SERVICE_RESTRICTION|BLOCK_VOICE_TO_SMS|PEP_Blacklist_Whitelist" |
    awk '{print $1}'
  ); do fil="$fil+$number"
done
echo "fil is $fil"

Of course, the larger part of the job is to split the awk and sed scripts into multiple lines as well...

But I believe that even after that the result will still be quite unreadable.

I would suggest completely rewriting the script in Perl, Ruby or another scripting language that is a bit more readable than Bash. This is just a suggestion from my personal experience: every time I start out with a shell script, I end up rewriting it in Ruby. I love Bash, but it just doesn't seem to scale.

Rene Saarsoo
I wrote it in Python too (very pretty), but I would like to have the bash version cleaned up. Thanks
amadain
What's wrong with backticks?
Roman Cheplyaka
I'm with Rene. You need to use something that is geared for working with XML files. You shouldn't try to use regexes for that. Short of that, the `sed` part could be rewritten in AWK, the `egrep` and the `uniq` could be done in AWK. If you have `gawk`, you could do the `sort`, too. But in the end, you should use a Python or Perl XML module.
Dennis Williamson
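Following up on the comment that `uniq` and `egrep` could be done in AWK: a minimal sketch in which an awk associative array replaces `sort | uniq -c` and the pattern test replaces `egrep`. `count_matching` is a hypothetical name. Filtering before counting gives the same counts as filtering the `uniq -c` output, because the prefix `uniq -c` adds is only spaces and digits. The output order is unspecified (awk's `for (k in a)` order), which does not matter when the counts are only summed.

```shell
#!/bin/bash
# Sketch: one awk replacing `sort | uniq -c | egrep "..." | awk '{print $1}'`.
# Reads lines on stdin and prints one count per distinct matching line.
count_matching() {
    awk '
    /NUMBER_BLACKLIST|USER_BLACKLIST|NUMBER_WALLEDGARDEN|USER_WALLED_GARDEN|SERVICE_RESTRICTION|BLOCK_VOICE_TO_SMS|PEP_Blacklist_Whitelist/ {
        seen[$0]++          # count identical lines, as uniq -c would
    }
    END {
        for (line in seen)  # iteration order is unspecified
            print seen[line]
    }'
}
```

Usage would be `event_structure | count_matching`.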
@Roman: Backticks: [BashFAQ/082](http://mywiki.wooledge.org/BashFAQ/082)
Dennis Williamson
@Dennis: I agree about "preferred", but the original answer gives an impression that backticks won't work with wrapped lines, which is false.
Roman Cheplyaka
How would you write that script entirely in awk? I wrote this with python and xml.sax but for various reasons I needed to have a shell version of it.
amadain
+2  A: 

Two small remarks:

Put the 'for list' in a separate function:

number_list() {
    # complete pipe command list
    # divided over multiple lines
    :   # placeholder: bash rejects a function body with no commands
}

for number in $(number_list)
do
    fil="$fil+$number"
done

Try to combine some of the commands: the cat is not needed, and the final egrep and awk can be combined.
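Combining the final egrep and awk could look like this minimal sketch, with `blacklist_counts` as a hypothetical name: awk applies the same alternation egrep used and prints the count column from the `uniq -c` output in one step.

```shell
#!/bin/bash
# Sketch: `egrep "PATTERNS" | awk '{print $1}'` folded into one awk.
# Input is `uniq -c` output: a count followed by the counted line.
blacklist_counts() {
    awk '/NUMBER_BLACKLIST|USER_BLACKLIST|NUMBER_WALLEDGARDEN|USER_WALLED_GARDEN|SERVICE_RESTRICTION|BLOCK_VOICE_TO_SMS|PEP_Blacklist_Whitelist/ { print $1 }'
}
```

In the question's script this would be `event_structure | sort | uniq -c | blacklist_counts`.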

schot
why is the cat not needed?
amadain
@amadain: `sed` also takes multiple file arguments, so you can replace `cat file ... | sed [pat]` with `sed [pat] file ...`.
schot
+1  A: 

You can join the different tokens using tr and prepend 'fil is' using sed:

pipeline | tr -s ' \n\t' '+' | sed -e '1s/^/fil is /' -e '$s/+$//'

The pipeline can be split over multiple lines using \:

first-command \
    | second-command \
    | third-command \
    ...
    | last-command
Bart Sas
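Going one step further than joining the tokens, the total can be computed at the same time. A minimal sketch, assuming the counts arrive as separate arguments (so in the question's script it would be called as `sum_line $numbers`, unquoted); `sum_line` is a hypothetical name. It joins with `paste` and adds with bash arithmetic expansion.

```shell
#!/bin/bash
# Hypothetical helper: prints "fil is a+b+...=total" for its numeric arguments.
sum_line() {
    local expr
    # One number per line, then join the lines with "+".
    expr=$(printf '%s\n' "$@" | paste -sd+ -)
    # ${expr:-0} treats an empty list as 0.
    echo "fil is $expr=$(( ${expr:-0} ))"
}
```

With the counts from the question, `sum_line 2 372 1 1 214 73 1 20` prints `fil is 2+372+1+1+214+73+1+20=684`, matching the output the asker shows.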
+1  A: 

The shell script is actually the simple part. The sed script is the scary bit. The script can be improved with here documents, but witness the comment:

#!/bin/bash

seds=$(mktemp) || exit 1
awks=$(mktemp) || exit 1
gres=$(mktemp) || exit 1

trap 'rm -f "$seds" "$awks" "$gres"' 0 1 2 3 15

# this is a noble and hairy attempt to parse xml with sed
# it is extremely fragile and strongly dependent upon
# the form of the source file never changing
# I'm alternately proud or disgusted that I've been able
# to get away with this

cat > $seds <<'EOF'
/<event eventTimestamp/,/<\/event>/ {
    /event /{s/^.*$/\n/; p}
    /payloadType / {
        h; /protocol/ {s/.*protocol="([^"]*)?".*/protocol: \1/}; p; x
        /type/ {s/.*type="([^"]+)".*/payload: \1/g}
        /type/! {s/.*protocol="([^"]+)".*/payload: \1/g}; p
    }
    /sender / {
        /sccpAddress/ {s/.*sccpAddress="([^"]*)?".*/sccpAddress: \1/}
        /sccpAddress/! {s/.*/sccpAddress: Unknown/}; p
    }
    /result /{s/.*value="([^"]+)".*/result: \1/g; p}
    /filter code/{s/.*type="([^"]+)".*/type: \1/g; p}
}
EOF

cat > $awks <<'EOF'
BEGIN {FS="\n"; RS=""; OFS=";"; ORS="\n"}
$1~/result: Blocked|Modified/ && \
$2~/sccpAddress: 353201000001/ && \
$4~/payload: SMS-MO-FSM-INFO|SMS-MO-FSM/ {$1=$1 ""; print}
EOF

cat > $gres <<EOF
NUMBER_BLACKLIST
USER_BLACKLIST
NUMBER_WALLEDGARDEN
USER_WALLED_GARDEN
SERVICE_RESTRICTION
BLOCK_VOICE_TO_SMS
PEP_Blacklist_Whitelist
EOF

cat xmlEventLog_2010-03-23T* | \
sed -nr -f $seds | \
tee checkThis.txt | \
awk -f $awks | \
sort | uniq -c | \
fgrep -f $gres | \
awk '{print $1}'
msw
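If the temporary files feel heavy, the same separation can be had without them by keeping each program in a shell variable. A minimal sketch for the awk stage only, reusing the question's filter; `awk_filter` and `filter_events` are hypothetical names. The sed script could be held the same way and passed as `sed -nr -e "$var"`, since a sed script given with `-e` may contain newlines.

```shell
#!/bin/bash
# Sketch: the filter program lives in a variable, so no temp file or trap.
awk_filter='
BEGIN { FS = "\n"; RS = ""; OFS = ";"; ORS = "\n" }
$1 ~ /result: Blocked|Modified/ &&
$2 ~ /sccpAddress: 353201000001/ &&
$4 ~ /payload: SMS-MO-FSM-INFO|SMS-MO-FSM/ { $1 = $1; print }
'

# Reads blank-line-separated event records (files or stdin) and prints
# each matching record on one line, fields joined with ";".
filter_events() {
    awk "$awk_filter" "$@"
}
```

The trade-off is that the programs must not contain single quotes, while a quoted here document has no such restriction.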