Hi all,
first of all: I'm not a programmer and never was, although I have learned a lot during my professional career as a support consultant.
Now my task is to process, and create some statistics about, a constantly written and rapidly growing XML-like log file. It's not valid XML because it does not have a proper <root> element. The log looks like this:
<log itemdate="somedate">
<field id="0" />
...
</log>
<log itemdate="somedate+1">
<field id="0" />
...
</log>
<log itemdate="somedate+n">
<field id="0" />
...
</log>
For example, I have to count all the items with field id=0. But most of the solutions I have found (e.g. using XPath) report an error about the garbage after the first closing </log>.
Most probably I can use Python (2.6, although I can compile 3.x as well) or a really old Perl version (5.6.x). I also recently compiled xmlstarlet, which looks really promising: I was able to create the statistics for a certain period after copying the file and prepending and appending the opening and closing root elements. But this is a huge file, and copying takes time as well. Isn't there a better solution?
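One way this might be approached without copying the file is to hand an incremental parser a synthetic opening root tag before the real data, so the parser never sees "garbage" after the first record. The following is only a minimal sketch, assuming Python 3.4 or later (for xml.etree.ElementTree.XMLPullParser) and a log opened in text mode; the function name count_fields and its parameters are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_fields(stream, field_id="0", chunk_size=65536):
    """Count <field id=...> elements inside completed <log> records.

    `stream` is a text-mode file-like object. The synthetic <root> is fed
    to the parser only; the log file itself is never modified or copied.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed("<root>")  # synthetic root, never closed on purpose
    root = None
    count = 0
    for chunk in iter(lambda: stream.read(chunk_size), ""):
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # the first start event is the synthetic root
                continue
            if elem.tag == "log":
                # Count only when the whole record has been read, so a
                # half-written trailing record is ignored.
                count += sum(
                    1 for f in elem.iter("field") if f.get("id") == field_id
                )
                root.clear()  # drop finished records to keep memory flat
    return count


if __name__ == "__main__":
    sample = (
        '<log itemdate="a">\n<field id="0" />\n<field id="1" />\n</log>\n'
        '<log itemdate="b">\n<field id="0" />\n</log>\n'
    )
    print(count_fields(io.StringIO(sample)))
```

Because the root is never closed, a record that is still being written at the end of the file simply produces no "end" event for its <log> and is left for the next run. Filtering by time range would need a comparison on the itemdate attribute, which is left out here because the date format is not shown.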
Thanks in advance!
Update: I should have been more specific, sorry. The task involves more than just counting elements. The expected results are as follows:
- Counting the various element types in the file, as mentioned above, within a time range (e.g. from the previous run to now).
- Checking whether a certain element is followed by another element (e.g. <field id="0" /><field2 value="100" /> must be followed by <field id="0" /><field2 value="110" />).
- The expected output is an INSERT statement.
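To make the last two requirements more concrete, here is a rough sketch of the "must be followed by" check and of building the INSERT statement. It assumes the records have already been extracted into (field id, field2 value) pairs in log order. The function names, the table name log_stats, and its columns field_id and item_count are all hypothetical, since the real schema is not given here.

```python
def find_unmatched(records, first="100", second="110"):
    """Return indexes of records whose value is `first` but which are not
    immediately followed by a record with the same id and value `second`.

    `records` is a list of (field_id, field2_value) tuples in log order.
    """
    unmatched = []
    for i, (field_id, value) in enumerate(records):
        if value != first:
            continue
        following = records[i + 1] if i + 1 < len(records) else None
        if following != (field_id, second):
            unmatched.append(i)
    return unmatched


def insert_statement(table, field_id, item_count):
    """Build an INSERT statement as text (hypothetical table and columns)."""
    safe_id = str(field_id).replace("'", "''")  # double any single quotes
    return (
        "INSERT INTO {0} (field_id, item_count) VALUES ('{1}', {2:d});"
        .format(table, safe_id, int(item_count))
    )


if __name__ == "__main__":
    records = [("0", "100"), ("0", "110"), ("0", "100"), ("1", "110")]
    print(find_unmatched(records))
    print(insert_statement("log_stats", "0", 2))
```

"Followed by" is interpreted here as "the very next record"; if other records may appear in between, the check would need to scan ahead instead.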
And yes, it can be done with awk as well, or with another string-processing tool, but the designer somehow preferred XPath... Maybe he had read something somewhere. I am not sure why.