Sounds like a job for sed
(I was so tempted to say SuperSed ;-)
sed -n '/^<.\+>/H; /\(Request\|Response\) XML/{s/^.*</</;x;p}; ${x;p}' xmllog
where xmllog
is your log file's name. You'll get a blank line at the beginning, but that can be filtered out with egrep '.+'
or even just tail -n +2
.
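To see it end to end, here's a minimal sketch run against a made-up log. The log layout, timestamps, and tag names are assumptions for illustration, not the asker's real data. It also assumes GNU sed, since \+ and \| are GNU extensions to basic regular expressions.

```shell
# Hypothetical sample log; the real log format is an assumption.
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 INFO Request XML: <req>
<id>1</id>
</req>
10:00:01 DEBUG unrelated line
10:00:02 INFO Response XML: <resp>
<ok/>
</resp>
EOF

# Same sed program as above; tail drops the leading blank line.
out=$(sed -n '/^<.\+>/H; /\(Request\|Response\) XML/{s/^.*</</;x;p}; ${x;p}' "$log" | tail -n +2)
printf '%s\n' "$out"
rm -f "$log"
```

The DEBUG line matches neither condition, so it never reaches the output.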
By way of explanation, sed
is a little interpreter for programs that consist of a list of matching conditions and corresponding actions. sed
runs through a file line by line (hence the name, "stream editor" -> "sed") and for each line, for each condition in the program that matches the text on the line, it applies the corresponding action. In this case:
/^<.\+>/
is a regular expression condition that matches any line which starts with <
followed by any character (.
) repeated one or more times (\+
) followed by >
- basically any line that begins with an XML tag. The associated action is H
which appends the line to a "hold buffer". The other condition is
/\(Request\|Response\) XML/
which, of course, is a regexp that matches either Request
or Response
followed by a space and then XML
. The corresponding action is
{s/^.*</</;x;p}
which first does a substitution (s
) of the beginning of the line (^
) followed by any character (.
) repeated any number of times (*
) followed by <
, with just <
. Because .* is greedy, that removes everything up to the last < on the line, which gets rid of anything before the XML tag as long as the line contains only one. Then it switches (x
) the line just read with the "hold buffer" (which contains the XML of the previous log message) and prints (p
) the stuff that was just swapped in from the hold buffer. Finally,
$
matches the last line of the input, and {x;p}
again just swaps the contents of the hold buffer into the pattern space (the buffer holding the current line) and then prints it, so the final record is not lost.
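If the one-liner is hard to read, the same program can be spread over several lines with comments. This is just a sketch of the command above, assuming GNU sed; the five input lines are a hypothetical log fed on stdin so the snippet runs on its own.

```shell
# Hypothetical two-record log fed on stdin; the real log layout is an assumption.
out=$(printf '%s\n' \
  'Request XML: <a>' '<b/>' '</a>' \
  'Response XML: <c>' '</c>' |
sed -n '
# Lines that begin with a tag: append them to the hold space.
/^<.\+>/H
# Header lines: keep the text from the last < onward,
# then swap in the previous record and print it.
/\(Request\|Response\) XML/{
  s/^.*</</
  x
  p
}
# Last line: swap the final record out of the hold space and print it.
${
  x
  p
}
')
printf '%s\n' "$out"
```

The output still starts with a blank line here, because the hold space is empty the first time a header line is seen.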
You can alter the command to suit your needs. For example, if you need something to delimit the different records, this'll put a blank line between them:
sed -n '/^<.\+>/H; /\(Request\|Response\) XML/{s/^.*</\n</;x;p}; ${x;p}' xmllog
(in that case, of course, don't use egrep
to filter out the blank line at the beginning, because it would strip the delimiting blank lines as well).
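With the delimited variant the output starts with two blank lines rather than one: the empty hold space printed at the first header, plus the newline that now leads the first record. A sketch of one way to trim them, assuming GNU sed (it accepts \n in the replacement) and the same hypothetical input as a stand-in for a real log:

```shell
# Hypothetical two-record log fed on stdin; the real log layout is an assumption.
out=$(printf '%s\n' \
  'Request XML: <a>' '<b/>' '</a>' \
  'Response XML: <c>' '</c>' |
  sed -n '/^<.\+>/H; /\(Request\|Response\) XML/{s/^.*</\n</;x;p}; ${x;p}' |
  tail -n +3)   # drop the two leading blank lines, keep the separators
printf '%s\n' "$out"
```

tail only trims the top of the output, so the blank lines between records survive.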