I need to load XML documents in several different formats into records in a MySQL database on a daily basis. The data I need from each document is interspersed with a good deal of data I don't need, and each source uses different node names. For example:
source #1:
<object id="1">
<title>URL 1</title>
<url>http://www.one.com</url>
<frequency interval="60" />
<uselessdata>blah</uselessdata>
</object>
<object id="2">
<title>URL 2</title>
<url>http://www.two.com</url>
<frequency interval="60" />
<uselessdata>blah</uselessdata>
</object>
source #2:
<object>
<objectid>1</objectid>
<thetitle>URL 1</thetitle>
<link>http://www.one.com</link>
<frequency interval="60" />
<moreuselessdata>blah</moreuselessdata>
</object>
<object>
<objectid>2</objectid>
<thetitle>URL 2</thetitle>
<link>http://www.two.com</link>
<frequency interval="60" />
<moreuselessdata>blah</moreuselessdata>
</object>
In both cases, I need each object's ID, interval, and URL.
The approaches I've considered are:
1.) Having a separate function parse each XML document and build the SQL query iteratively within that function.
2.) Having a separate function parse each document and add each object to an instance of my own class, with a class method doing the SQL work.
3.) Using XSLT to convert all the documents into a common XML format, then writing a single parser for that format.
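Approach 2 could look roughly like the following minimal sketch. It assumes Python with the standard-library ElementTree parser (the post doesn't name a language), and uses sqlite3 only as a stand-in so it runs without a MySQL server. FeedObject, parse_source1, parse_source2, save_all and the feeds table are all hypothetical names.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FeedObject:
    """Hypothetical normalized record: one per <object>, whatever the source."""
    source: str
    object_id: int
    url: str
    interval: int

    @classmethod
    def save_all(cls, conn, objects):
        """Insert a batch of objects with one parameterized statement."""
        cur = conn.cursor()
        # sqlite3 uses "?" placeholders; MySQL drivers such as PyMySQL and
        # mysql-connector-python use "%s" instead.
        # The column is not called "interval" because INTERVAL is reserved in MySQL.
        cur.executemany(
            "INSERT INTO feeds (source, object_id, url, check_interval) "
            "VALUES (?, ?, ?, ?)",
            [(o.source, o.object_id, o.url, o.interval) for o in objects],
        )
        conn.commit()


def _wrap(xml_text):
    # The samples have several top-level <object> elements, so wrap them in a
    # synthetic root element to make the fragment well-formed (an assumption).
    return ET.fromstring("<root>" + xml_text + "</root>")


def parse_source1(xml_text):
    """Source #1: the ID is an attribute, the URL is in <url>."""
    return [
        FeedObject("source1", int(obj.get("id")),
                   obj.findtext("url").strip(),
                   int(obj.find("frequency").get("interval")))
        for obj in _wrap(xml_text).iter("object")
    ]


def parse_source2(xml_text):
    """Source #2: the ID is in <objectid>, the URL is in <link>."""
    return [
        FeedObject("source2", int(obj.findtext("objectid")),
                   obj.findtext("link").strip(),
                   int(obj.find("frequency").get("interval")))
        for obj in _wrap(xml_text).iter("object")
    ]
```

With a real MySQL driver you would pass its connection to save_all and switch the placeholders to %s. The source column is there because both samples reuse IDs 1 and 2, so the ID alone may not identify an object. Note that a missing element (for example no <frequency>) raises an exception here rather than being skipped.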
The XML documents themselves aren't large; most are under 1 MB. I don't expect their structure to change often (if ever), but I will very likely need to add and remove sources over time. I'm open to all ideas.
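Because sources are expected to come and go while each one's structure stays stable, one option is to describe each source as data rather than code, so adding a source means adding a mapping entry. This is a rough sketch under the same Python/ElementTree assumption; SOURCES, extract and the mapping layout are hypothetical, not an existing library API.

```python
import xml.etree.ElementTree as ET

# Hypothetical source registry. Each entry names the element that holds one
# record and, for every output field, where to find it:
# (path relative to the record, attribute name or None for the element's text).
# "." means the record element itself.
SOURCES = {
    "source1": {
        "record": "object",
        "fields": {
            "id": (".", "id"),
            "url": ("url", None),
            "interval": ("frequency", "interval"),
        },
    },
    "source2": {
        "record": "object",
        "fields": {
            "id": ("objectid", None),
            "url": ("link", None),
            "interval": ("frequency", "interval"),
        },
    },
}


def extract(source_name, xml_text):
    """Return one dict per record, keeping only the fields the mapping names."""
    spec = SOURCES[source_name]
    # Wrap in a synthetic root because the samples have several top-level
    # elements (an assumption; drop this if each file has a single root).
    root = ET.fromstring("<root>" + xml_text + "</root>")
    rows = []
    for rec in root.iter(spec["record"]):
        row = {}
        for field, (path, attr) in spec["fields"].items():
            node = rec if path == "." else rec.find(path)
            if node is None:
                row[field] = None  # field missing in this record
            elif attr is None:
                row[field] = (node.text or "").strip()
            else:
                row[field] = node.get(attr)
        rows.append(row)
    return rows
```

Both samples then reduce to the same shape, e.g. {"id": "1", "url": "http://www.one.com", "interval": "60"}, with values left as strings for the insert step to convert. ElementTree only supports a limited subset of XPath in find(), so a source that needs more complex selection could still get its own parser function alongside the mappings.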
Also, apologies if the XML samples above are a bit rough; they're only meant to show that each source uses different node names.