Hi folks, I have a file that has multiple lines like the following:

"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"
"<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)"

In the first line imsi exists, and in the second line it does not. For every line that starts with the word sender (there are other lines in the file), I want to extract both the msisdn value and the imsi value. If the imsi value is not there, I would like to print out imsi: Unknown.

I tried the following but it does not work:

/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/}; p};

What am I missing?

A: 

I used Perl to solve your problem.

cat file | perl -n -e 'if (/sender.*msisdn="([^"]*)"(.*imsi="([^"]*)")?/) { print $1, " ", $3 || "unknown", "\n"; }'
Sjoerd
I would rather use sed, as it's part of a wider multi-line sed script that parses multiple values from different lines to create a structure for awk parsing.
amadain
+1  A: 

This can be done using the following sed script:

s/^.*sender .*msisdn="\([^"]*\)" .* imsi="\([^"]*\)".*$/msisdn: \1, imsi: \2/
t
s/^.*sender .*msisdn="\([^"]*\)".*$/msisdn: \1, imsi: Unknown/
t
d
  • The first s command rewrites every sender line that contains the imsi field.
  • The first t command branches to the end of the script if that substitution succeeded, so the rewritten line is printed and sed moves on to the next line.
  • The second s command rewrites the remaining sender lines, which have no imsi field, using imsi: Unknown.
  • The second t command likewise skips the rest of the script if the second substitution succeeded.
  • The d command deletes all other lines.

In order to run this script, just copy it to a file and run it using sed -f script.
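
As a rough illustration of running it, the sketch below writes the script and a small sample file to a temporary directory and applies one to the other. The file names sample.txt and extract.sed, the workdir variable, and the shortened sample lines are assumptions made for this example only.

```shell
# Hypothetical scratch directory and file names, used only for this sketch.
workdir=$(mktemp -d)

# Sample input: a sender line with imsi, one without, and an unrelated line.
cat > "$workdir/sample.txt" <<'EOF'
<sender from="+919892000000" msisdn="+919892000000" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN">
<sender from="+919892000000" msisdn="+919892000001" pcfIp="" sccpAddress="+919895000005" country="IN">
<result value="Allowed"/>
EOF

# The sed script from this answer, unchanged.
cat > "$workdir/extract.sed" <<'EOF'
s/^.*sender .*msisdn="\([^"]*\)" .* imsi="\([^"]*\)".*$/msisdn: \1, imsi: \2/
t
s/^.*sender .*msisdn="\([^"]*\)".*$/msisdn: \1, imsi: Unknown/
t
d
EOF

out=$(sed -f "$workdir/extract.sed" "$workdir/sample.txt")
printf '%s\n' "$out"
rm -rf "$workdir"
```

On this sample the first sender line should come out with its imsi value, the second with imsi: Unknown, and the result line should be dropped.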

Bart Sas
I'm not sure how I could integrate this into the sed command that is listed in my answer to my question (which is more of a qualifier to the question than an answer)
amadain
+1  A: 

Just to add why I am using sed for this particular problem: the following is the multi-line sed script that I am using to create a data structure to pass into awk:

cat xmlEventLog_2010-03-23T* | 
sed -nr "/<event eventTimestamp/,/<\/event>/  {
/event /{/uniqueId/ {s/.*uniqueId=\"([^\"]+)\".*/\nuniqueId: \1/g}; /uniqueId/!  {s/.*/\nuniqueId: Unknown/}; p};
/payloadType / {/type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p}; 

***/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; p; /imsi/! {s/.*/imsi: Unknown/}; p};

/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p}; /filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p}}" 

| awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $4~/payload: SMS-MT-FSM-INFO|SMS-MT-FSM|SMS-MT-FSM-DEL-REP|SMS-MT-FSM-DEL-REP-INFO|SMS-MT-FSM-DEL-REP/ && $2~/result: Blocked|Modified/ && $3~/msisdn: \+919844000011/ {$1=$1 ""; print}'

This parses out files that are filled with events like so:

       <event eventTimestamp="2010-03-23T00:00:00.074" originalReceivedMessageSize="28" uniqueId="1280361600.74815_PFS_1_2130328364" deliveryReport="true">
            <result value="Allowed"/>
            <source name="MFE" host="PFS_1"/>
            <sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
                    <profile code=""/>
                    <mvno code=""/>
            </sender>
            <recipients>
                    <recipient code="+919844000039" imsi="892000000" SccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
                    </recipient>
            </recipients>
            <payload>
                    <payloadType protocol="SMS" type="SMS-MT-FSM-DEL-REP"/>
                    <message signature="70004b7c9267f348321cde977c96a7a3">
                            <MailFrom value=""/>
                            <rcptToList>
                            </rcptToList>
                            <pduList>
                                    <pdu type="SMS_SS_REQUEST_IND" time="2010-07-29T00:00:00.074" source="SMSPROBE" dest="PCF"/>
                                    <pdu type="SMS_SS_REQ_HANDLING_STOP" time="2010-07-29T00:00:00.074" source="PCF" dest=""/>
                            </pduList>
                            <numberOfImages>0</numberOfImages>
                            <attachments numberOf="1">
                                    <attachment index="0" size="28" contentType="text/plain"/>
                            </attachments>
                            <emailSmtpDeliveryStatus value="" time="" reason=""/>
                            <pepId value="989350000109.989350000209.0.0"/>
                    </message>
            </payload>
            <filters>
            </filters>
    </event>

There could be up to 10000 events like the one above in each file, and there will be hundreds of files. The structures output for awk should be of the type:

uniqueId: 1280361600.208152_PFS_1_1509661383
result: Allowed
msisdn: +919892000000
imsi: 892000000
payload: SMS-MT-FSM-DEL-REP
filter:

For this reason I need to extract two values from the sender line and different values from the other lines. The filter above extracts everything correctly except for the part that handles the sender line (marked *** in the filter). I just want to extract the two items from the sender line for the structure. Multiple attempts have failed.
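
To make the awk stage of the pipeline easier to follow, here is a minimal sketch of how awk reads such blank-line-separated structures when RS is empty and FS is a newline. The two records are made-up sample data in the output format shown above, and the filter is a simplified assumption rather than the full condition from the pipeline.

```shell
# Made-up records in the structure format shown above, separated by a blank line.
records='uniqueId: 1280361600.208152_PFS_1_1509661383
result: Blocked
msisdn: +919844000011
imsi: 892000000
payload: SMS-MT-FSM-DEL-REP

uniqueId: 1280361600.74815_PFS_1_2130328364
result: Allowed
msisdn: +919892000000
imsi: Unknown
payload: SMS-MT-FSM'

out=$(printf '%s\n' "$records" | awk '
  # RS="" makes each blank-line-separated block one record;
  # FS="\n" makes each line of the block one field.
  BEGIN { FS = "\n"; RS = ""; OFS = ";" }
  # [+] matches a literal plus sign; a bare + would act as a regex quantifier.
  $2 ~ /^result: (Blocked|Modified)$/ && $3 ~ /^msisdn: [+]919844000011$/ {
    $1 = $1      # assigning a field rebuilds the record with OFS
    print
  }')
printf '%s\n' "$out"
```

Only the first record passes the filter, and it is printed as a single line with the fields joined by semicolons. Note that once an imsi line is emitted after msisdn, the payload line becomes the fifth field rather than the fourth.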

amadain
You should have posted this as an edit to your question.
Dennis Williamson
sorry about that. I wasn't sure how to continue the thread. Thx
amadain
+1  A: 

Your match for "msisdn" is stripping out the "imsi" so the negative match is always taken. Simply copy your line into hold space, do your "msisdn" processing, swap the hold space back into pattern space, then do your "imsi" processing:

/sender / {h; /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p;x; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; /imsi/! {s/.*/imsi: Unknown/};p}
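
As a quick check of the idea, the sketch below runs the same sequence of commands on two shortened sample lines, one with imsi and one without. The sample data and the sample variable are assumptions for illustration. It uses single quotes and -E (the thread's own command uses double quotes and the GNU -r spelling), so the quotes inside the expressions need no backslashes.

```shell
# Shortened, made-up sender lines: the first has imsi, the second does not.
sample='<sender from="+919892000000" msisdn="+919892000000" pcfIp="" imsi="892000000" country="IN">
<sender from="+919892000000" msisdn="+919892000001" pcfIp="" country="IN">'

out=$(printf '%s\n' "$sample" | sed -nE '/sender / {
  # keep a copy of the untouched line in hold space
  h
  /msisdn/ s/.*msisdn="([^"]*)".*/msisdn: \1/
  p
  # swap the original line back into pattern space
  x
  /imsi/ s/.*imsi="([^"]*)".*/imsi: \1/
  /imsi/!s/.*/imsi: Unknown/
  p
}')
printf '%s\n' "$out"
```

Each sender line yields two output lines, msisdn first and imsi second, with imsi: Unknown for the line that lacks the attribute.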
Dennis Williamson
That worked perfectly. Thanks.
amadain
Actually, that brings up another question. It's not needed for any processing that I have to do; it is more for my own knowledge. How would you use this to parse out 3 variables? I never found a use for the hold space before now, so it never came to mind, but your solution seems great for parsing 2 variables. Just wondering.
amadain
@amadain: Instead of `x` to swap pattern space and hold space, you could use `g` to copy hold space to pattern space and process it - repeating that as many times as you needed.
Dennis Williamson
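
A minimal sketch of that `g` approach for three values follows. Pulling out country as the third field is an assumption chosen for illustration, as are the line variable and the sample line; the pattern is one `h` up front, then `g` before each further extraction.

```shell
# Made-up sender line for illustration.
line='<sender from="+919892000000" msisdn="+919892000000" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">'

out=$(printf '%s\n' "$line" | sed -nE '/sender / {
  # save the original line once
  h
  s/.*msisdn="([^"]*)".*/msisdn: \1/
  p
  # restore the original line from hold space (hold space is left intact)
  g
  /imsi/ s/.*imsi="([^"]*)".*/imsi: \1/
  /imsi/!s/.*/imsi: Unknown/
  p
  # restore again for the third value
  g
  s/.*country="([^"]*)".*/country: \1/
  p
}')
printf '%s\n' "$out"
```

Because `g` copies rather than swaps, the hold space still contains the original line after each step, so the restore-and-extract pair can be repeated for as many attributes as needed.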