Hi, I have a file of events containing multiple multi-line events, each enclosed between <event> and </event> tags. I want to print the entire event, from <event> to </event>, only if a line within that event contains either the string uniqueId="1279939300.862594_PFM_1_1912320699" or uniqueId="1281686522.353435_PFM_1_988171542". The file has 100,000 events in it, and each event has between 20 and 35 lines (the length varies with the attributes in the event). I started off using sed but need a little help beyond:

cat xmlEventLog_2010-03-23T* | sed -nr "/<event eventTimestamp/,/<\/event>/"

What do I need to do to finish this? Also, is sed the best way of doing this given the size of the files?

Thanks in advance

A

+2  A: 
awk -vRS="</event>" '/<event>/ && /1279939300.862594_PFM_1_1912320699|1281686522.353435_PFM_1_988171542/{print}' file
ghostdog74
This worked brilliantly, thanks. I am now timing it against the sed answer to see which one is faster (as my files are huge). Here are the times for this one: real 3m4.890s, user 3m2.273s, sys 0m2.568s
amadain
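One detail of the record-separator approach is that awk strips the separator from each record, so the closing tag itself is not printed. Here is a minimal sketch of one way to put it back. It assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; a strictly POSIX awk may not). The sample data and the function names `sample_events` and `filter_events` are hypothetical, and the event layout is only guessed from the question.

```shell
# sample_events (hypothetical): prints three made-up events for testing.
sample_events() {
  cat <<'EOF'
<event>
  eventTimestamp="2010-03-23T10:00:00"
  uniqueId="1279939300.862594_PFM_1_1912320699"
</event>
<event>
  eventTimestamp="2010-03-23T10:00:01"
  uniqueId="0000000000.000000_PFM_1_0000000000"
</event>
<event>
  eventTimestamp="2010-03-23T10:00:02"
  uniqueId="1281686522.353435_PFM_1_988171542"
</event>
EOF
}

# filter_events (hypothetical): reads events from stdin or from the given files.
filter_events() {
  awk -v RS='</event>' '
    # Each record is everything up to (but not including) a closing tag.
    /<event/ && /uniqueId="(1279939300\.862594_PFM_1_1912320699|1281686522\.353435_PFM_1_988171542)"/ {
      sub(/^\n+/, "")   # drop the newline left after the previous closing tag
      print $0 RS       # re-append the closing tag that awk consumed
    }' "$@"
}

sample_events | filter_events
```

On the real logs this would be called as `filter_events xmlEventLog_2010-03-23T*`. The dots in the IDs are escaped so they match only a literal dot.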
A: 

You should be able to embed the unique IDs directly into the regular expression, using the | character to match either uniqueId. I did a quick test, and the following regular expression seems to find the correct entries:

 <event.*?uniqueId=("1279939300\.862594_PFM_1_1912320699"|"1281686522\.353435_PFM_1_988171542").*?</event>
Ryan Brunner
The uniqueId is not necessarily on the same line as the <event> tag.
amadain
I have to admit that I'm not familiar with sed, but is there not a toggle to enable multi-line regexes?
Ryan Brunner
Yes. I'm trying Dennis's multi-line sed now. The awk one finished in around 3 minutes.
amadain
+1  A: 

Give this a try:

sed -n ':a; /<event>/,/<\/event>/ N; /<event>/,/<\/event>/!b; /<\/event>/ {/uniqueId="1279939300.862594_PFM_1_1912320699"\|uniqueId="1281686522.353435_PFM_1_988171542"/p;d}; ba'
Dennis Williamson
After 15 minutes of waiting, this still hasn't returned an answer. It's probably better suited to smaller files. Thanks, though, for the multi-line sed. This is at a level that I have not used yet, so I will study it and learn from it.
amadain
@amadain: I had a couple of errors. See the edited version.
Dennis Williamson
Dennis, `a` means append a line and is used after a pattern match (from what I could find). Could you explain what it does here as the sole command `:a`?
amadain
@amadain: `:` is the label command, `a` is the name of the label, and `b` is the branch command. So when `sed` reaches `ba`, it branches to label "a" (that is, `:a`), creating a loop. The `b` without a label branches to the end (past the last command), thus exiting the loop. The `d` (delete) command also branches to the end, exiting the loop.
Dennis Williamson
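The label-and-branch mechanics described above can be tried on a toy input. The following is a minimal sketch and not part of Dennis's answer. The function names `join_lines` and `drop_skip` are hypothetical, and each command is passed with its own `-e` so that a label is never followed by a semicolon, which some non-GNU seds reject.

```shell
# join_lines (hypothetical name): a loop built from a label and a branch.
#   :a        defines label "a"
#   N         appends the next input line to the pattern space
#   $!ba      if this is not the last line, branch back to label "a"
#   s/\n/ /g  after the loop ends, replace the embedded newlines with spaces
join_lines() {
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
}

# drop_skip (hypothetical name): "b" with no label jumps to the end of the script.
# With -n nothing is printed automatically, so lines matching "skip" never reach "p".
drop_skip() {
  sed -n -e '/skip/b' -e 'p'
}

printf 'one\ntwo\nthree\n' | join_lines
printf 'keep 1\nskip me\nkeep 2\n' | drop_skip
```

The first call prints `one two three` on a single line. The second prints `keep 1` and `keep 2` and omits the `skip me` line.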
that's wonderful sedding. Thank you
amadain
For one of the first times, the sed answer is considerably more complicated than the awk answer.
amadain