To explain why I am using sed for this problem: the following multi-line sed script builds a data structure that I then pass to awk:
cat xmlEventLog_2010-03-23T* |
sed -nr "/<event eventTimestamp/,/<\/event>/ {
/event /{/uniqueId/ {s/.*uniqueId=\"([^\"]+)\".*/\nuniqueId: \1/g}; /uniqueId/! {s/.*/\nuniqueId: Unknown/}; p};
/payloadType / {/type/ {s/.*type=\"([^\"]+)\".*/payload: \1/g}; /type/! {s/.*protocol=\"([^\"]+)\".*/payload: \1/g}; p};
***/sender / { /msisdn/ {s/.*msisdn=\"([^\"]*)?\".*/msisdn: \1/}; p; /imsi/ {s/.*imsi=\"([^\"]*)?\".*/imsi: \1/}; p; /imsi/! {s/.*/imsi: Unknown/}; p};
/result /{s/.*value=\"([^\"]+)\".*/result: \1/g; p}; /filter code/{s/.*type=\"([^\"]+)\".*/type: \1/g; p}}" |
awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"} $5~/^payload: SMS-MT-FSM(-INFO|-DEL-REP|-DEL-REP-INFO)?$/ && $2~/^result: (Blocked|Modified)$/ && $3~/^msisdn: \+919844000011$/ {$1=$1; print}'
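The hand-off between sed and awk relies on awk's paragraph mode: each event's output starts with a blank line (the `\n` before `uniqueId:`), so `RS=""` makes every event one record, and `FS="\n"` makes every line of that record one field. A minimal sketch of that behaviour, using made-up records; `count_fields` is only an illustrative name:

```shell
# Illustrative helper: print how awk splits blank-line-separated records.
# RS = "" is paragraph mode (records separated by blank lines);
# FS = "\n" makes each line of a record a separate field.
count_fields() {
  awk 'BEGIN { RS = ""; FS = "\n" } { printf "record %d: %d fields, $1=%s\n", NR, NF, $1 }'
}

# The leading blank line is skipped, just like the one sed emits before uniqueId.
printf '\nuniqueId: A\nresult: Allowed\n\nuniqueId: B\nresult: Blocked\n' | count_fields
# record 1: 2 fields, $1=uniqueId: A
# record 2: 2 fields, $1=uniqueId: B
```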
The script parses files filled with events like this one:
<event eventTimestamp="2010-03-23T00:00:00.074" originalReceivedMessageSize="28" uniqueId="1280361600.74815_PFS_1_2130328364" deliveryReport="true">
<result value="Allowed"/>
<source name="MFE" host="PFS_1"/>
<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
<profile code=""/>
<mvno code=""/>
</sender>
<recipients>
<recipient code="+919844000039" imsi="892000000" SccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">
</recipient>
</recipients>
<payload>
<payloadType protocol="SMS" type="SMS-MT-FSM-DEL-REP"/>
<message signature="70004b7c9267f348321cde977c96a7a3">
<MailFrom value=""/>
<rcptToList>
</rcptToList>
<pduList>
<pdu type="SMS_SS_REQUEST_IND" time="2010-07-29T00:00:00.074" source="SMSPROBE" dest="PCF"/>
<pdu type="SMS_SS_REQ_HANDLING_STOP" time="2010-07-29T00:00:00.074" source="PCF" dest=""/>
</pduList>
<numberOfImages>0</numberOfImages>
<attachments numberOf="1">
<attachment index="0" size="28" contentType="text/plain"/>
</attachments>
<emailSmtpDeliveryStatus value="" time="" reason=""/>
<pepId value="989350000109.989350000209.0.0"/>
</message>
</payload>
<filters>
</filters>
</event>
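To see why the sender clause misbehaves, here is that clause (the one marked with `***` in the script) run on its own against the sample sender line above. This is a minimal sketch assuming GNU sed: the marker is dropped, the script is single-quoted so `\"` becomes a plain `"`, and `original_sender_clause` is only an illustrative name. After the first `s` command the pattern space holds `msisdn: ...` instead of the original line, so the following `/imsi/` test no longer matches, and the extra `p` commands print the rewritten text again.

```shell
# The sample <sender> line from the event above.
line='<sender from="+919892000000" msisdn="+919892000000" ipAddress="" destinationServerIp="" pcfIp="" imsi="892000000" sccpAddress="+919895000005" country="IN" network="India::Airtel (Kerala)">'

# The original /sender / clause, unchanged apart from quoting.
original_sender_clause() {
  sed -nE '/sender / { /msisdn/ {s/.*msisdn="([^"]*)?".*/msisdn: \1/}; p; /imsi/ {s/.*imsi="([^"]*)?".*/imsi: \1/}; p; /imsi/! {s/.*/imsi: Unknown/}; p}'
}

printf '%s\n' "$line" | original_sender_clause
# msisdn: +919892000000
# msisdn: +919892000000
# imsi: Unknown
```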
Each file can contain up to 10,000 events like the one above, and there are hundreds of files. Each record passed to awk should look like this:
uniqueId: 1280361600.208152_PFS_1_1509661383
result: Allowed
msisdn: +919892000000
imsi: 892000000
payload: SMS-MT-FSM-DEL-REP
filter:
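Because awk receives these as blank-line-separated records, the filter could also look values up by key instead of by position (`$3`, `$4`, ...), so that an extra or missing line from sed does not shift every later test. A sketch under that assumption: `filter_records` is a hypothetical name, the criteria are copied from the awk filter in the pipeline above, and the msisdn is compared as a plain string so its leading `+` needs no regex escaping.

```shell
# Hypothetical helper: filter sed's blank-line-separated records by key name.
filter_records() {
  awk 'BEGIN { RS = ""; FS = "\n"; OFS = ";" }
  {
    split("", rec)                      # portable way to empty the array
    for (i = 1; i <= NF; i++) {
      p = index($i, ": ")               # split "key: value" at the first ": "
      if (p) rec[substr($i, 1, p - 1)] = substr($i, p + 2)
    }
    if (rec["result"] ~ /^(Blocked|Modified)$/ &&
        rec["payload"] ~ /^SMS-MT-FSM(-INFO|-DEL-REP|-DEL-REP-INFO)?$/ &&
        rec["msisdn"] == "+919844000011") {
      $1 = $1                           # rebuild $0 so fields are joined with OFS
      print
    }
  }'
}
```

Lines without a `": "` separator (such as an empty `filter:` line) are simply ignored by the key lookup.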
So I need to extract two values (msisdn and imsi) from the sender line, and one value from each of the other lines. The sed filter above extracts everything correctly except on the sender line (the clause marked with *** in the script; the asterisks are only a marker, not part of the sed code). How can I get both items from the sender line into the structure? Multiple attempts have failed.
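One possible fix, sketched under the assumption of GNU sed: either capture both attributes in a single substitution, or keep a copy of the line in the hold space (`h`) and restore it (`g`) before the second extraction, so the `imsi` test sees the original line rather than the rewritten one. Both function names are hypothetical, and variant 1 additionally assumes `msisdn=` comes before `imsi=` on the line, as in the sample.

```shell
# Variant 1: one substitution emits both lines (\n in the replacement is a
# GNU sed extension). If no imsi follows msisdn, append a placeholder first.
sender_fields() {
  sed -nE '/<sender / {
    /msisdn=".*imsi="/!s/$/ imsi="Unknown"/
    s/.*msisdn="([^"]*)".*imsi="([^"]*)".*/msisdn: \1\nimsi: \2/p
  }'
}

# Variant 2: save the original line (h), extract msisdn, restore the line (g),
# then extract imsi. Attribute order does not matter here.
sender_fields_hold() {
  sed -nE '/<sender / {
    h
    /msisdn="/!s/.*/msisdn: Unknown/
    s/.*msisdn="([^"]*)".*/msisdn: \1/
    p
    g
    /imsi="/!s/.*/imsi: Unknown/
    s/.*imsi="([^"]*)".*/imsi: \1/
    p
  }'
}
```

Inside the original double-quoted script, each `"` in these commands has to be written as `\"`. With either variant every record follows the layout shown above (uniqueId, result, msisdn, imsi, payload, ...), so the payload is the fifth field on the awk side.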