Hi all,

first of all: I'm not a programmer, never was, although I have learned a lot during my professional career as a support consultant.

Now my task is to process, and create some statistics from, a constantly written and rapidly growing XML-like log file. It's not valid XML, because it does not have a proper <root> element; the log looks like this:

<log itemdate="somedate">
  <field id="0" />
  ...
</log>

<log itemdate="somedate+1">
  <field id="0" />
  ...
</log>

<log itemdate="somedate+n">
  <field id="0" />
  ...
</log>

For example, I have to count all the items with field id=0. But most of the solutions I have found (e.g. using XPath) report an error about the garbage after the first closing </log>.

I can most probably use Python (2.6, although I can compile 3.x as well), some really old Perl (5.6.x), or a recently compiled xmlstarlet, which looks promising: I was able to create the statistics for a certain period after copying the file and prepending/appending an opening and closing root element. But this is a huge file, and copying it takes time as well. Isn't there a better solution?
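
In case it helps to show what I mean, here is a minimal sketch of the kind of streaming approach I was hoping for (Python; the Rooted wrapper class and the file name app.log are just placeholders I made up). The idea is to wrap the log file in a file-like object that injects a synthetic root element around its content, so iterparse can walk the concatenated <log> blocks without copying the file first:

import xml.etree.ElementTree as ET

class Rooted:
    """Make the log look like <root> ... real file content ... </root>."""
    def __init__(self, path):
        self._parts = iter([b"<root>", open(path, "rb"), b"</root>"])
        self._current = next(self._parts)

    def read(self, size=-1):
        while True:
            if isinstance(self._current, bytes):
                chunk, self._current = self._current, b""
            else:
                chunk = self._current.read(size)
            if chunk:
                return chunk
            try:
                self._current = next(self._parts)
            except StopIteration:
                return b""

count = 0
for event, elem in ET.iterparse(Rooted("app.log"), events=("end",)):
    if elem.tag == "field" and elem.get("id") == "0":
        count += 1
    if elem.tag == "log":
        elem.clear()  # drop the finished block so memory stays flat
print(count)

I have no idea whether this is the "right" way to do it, but it does avoid the copy step. Is there something cleaner, ideally something xmlstarlet or XPath could do directly on the growing file?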

Thanks in advance!

Update: I should have been more specific, sorry. The task involves more expected results, not only counting elements. They are specified as follows:

  • The above-mentioned counting of various element types in the file, within a time range (e.g. from the previous run to now)
  • Checking whether a certain element is followed by another element (e.g. <field id="0" /><field2 value="100" /> must be followed by <field id="0" /><field2 value="110" />)
  • The expected output is an INSERT statement (a rough sketch of this part follows the list).
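
For the report itself, here is a rough continuation of the sketch above (it reuses the Rooted wrapper; the table name log_stats, the column names, and the date strings are placeholders, and it assumes itemdate sorts lexicographically, like an ISO timestamp). It counts <field id="0" /> per <log> block inside the time range and prints one INSERT statement:

import xml.etree.ElementTree as ET

def count_field0(source, start, end):
    # Count <field id="0" /> inside <log> blocks whose itemdate lies in [start, end].
    total = 0
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "log":
            if start <= elem.get("itemdate", "") <= end:
                total += len([f for f in elem.findall("field")
                              if f.get("id") == "0"])
            elem.clear()  # forget the processed block
    return total

total = count_field0(Rooted("app.log"), "2024-01-01", "2024-01-02")
print("INSERT INTO log_stats (period_start, period_end, field0_count) "
      "VALUES ('2024-01-01', '2024-01-02', %d);" % total)

The followed-by check could probably hook into the same loop by remembering the previous <field>/<field2> pair, but I have not gotten that far yet.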

And yes, it can be done with awk as well, or some other string-processing tool, but the designer somehow preferred XPath... Maybe he had read something, somewhere. Who knows why?