ansaurus

Question

Answer 1

+2 A:

Using XSLT is overkill. I like approach (2); it makes a lot of sense.

In Python, I'd write a class for every document type. Each class would inherit from dict and, in its __init__, parse the given document and populate itself with the 'id', 'interval' and 'url' keys.

Then the code in main would be really trivial: just instantiate those classes (which are also dicts) with the appropriate documents and then pass the instances around as normal dicts.
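A minimal sketch of that idea follows. The two input layouts are assumptions inferred from the element names used elsewhere in this thread (an `id` attribute with `<url>` in one format, `<objectid>` with `<link>` in the other); the class names, the `load` helper and the sample documents are hypothetical, and each class wraps one `<object>` element rather than a whole document.

```python
import xml.etree.ElementTree as ET


class FormatOneObject(dict):
    """Dict built from an <object> that carries an id attribute and a <url> child."""

    def __init__(self, element):
        super().__init__(
            id=element.get("id"),
            url=element.findtext("url"),
            interval=element.find("frequency").get("interval"),
        )


class FormatTwoObject(dict):
    """Dict built from an <object> that carries <objectid> and <link> children."""

    def __init__(self, element):
        super().__init__(
            id=element.findtext("objectid"),
            url=element.findtext("link"),
            interval=element.find("frequency").get("interval"),
        )


def load(xml_text, object_class):
    """Parse a document and wrap every <object> child of the root in object_class."""
    root = ET.fromstring(xml_text)
    return [object_class(element) for element in root.findall("object")]


# Hypothetical sample documents, not copied from the question.
DOC_ONE = (
    '<objects><object id="1"><title>URL 1</title>'
    '<url>http://www.one.com</url><frequency interval="60" /></object></objects>'
)
DOC_TWO = (
    "<objects><object><objectid>2</objectid><thetitle>URL 2</thetitle>"
    '<link>http://www.two.com</link><frequency interval="60" /></object></objects>'
)

# Main stays trivial: both formats end up as plain dict-compatible records.
records = load(DOC_ONE, FormatOneObject) + load(DOC_TWO, FormatTwoObject)
```

Because both classes are dict subclasses with the same keys, the code that writes to the database does not need to know which format a record came from.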

Amr Mostafa 2009-03-30 15:54:49

+1: In your case, the different XML docs really just differ in their tag names. These, in turn, just alter the XPath string you'd pass to the ElementTree find and findall functions. Supporting the different XML formats is really easy.
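One way to read this comment is as a table of paths per format, as sketched below. The field tables and the `extract` helper are hypothetical, and the layouts are again inferred from the names used in this thread. ElementTree's path syntax cannot return an attribute value directly, so this sketch pairs each path with an optional attribute name.

```python
import xml.etree.ElementTree as ET

# field -> (path relative to <object>, attribute name or None for element text)
FORMAT_ONE = {
    "id": (".", "id"),
    "url": ("url", None),
    "interval": ("frequency", "interval"),
}
FORMAT_TWO = {
    "id": ("objectid", None),
    "url": ("link", None),
    "interval": ("frequency", "interval"),
}


def extract(element, field_paths):
    """Build a plain dict from one <object> element using a per-format path table."""
    record = {}
    for field, (path, attribute) in field_paths.items():
        node = element.find(path)
        if node is None:
            record[field] = None  # the path did not match anything
        elif attribute is None:
            record[field] = node.text
        else:
            record[field] = node.get(attribute)
    return record


# Hypothetical sample elements, not copied from the question.
first = ET.fromstring(
    '<object id="1"><title>URL 1</title><url>http://www.one.com</url>'
    '<frequency interval="60" /></object>'
)
second = ET.fromstring(
    "<object><objectid>2</objectid><thetitle>URL 2</thetitle>"
    '<link>http://www.two.com</link><frequency interval="60" /></object>'
)

first_record = extract(first, FORMAT_ONE)
second_record = extract(second, FORMAT_TWO)
```

Supporting another format then means adding one more table rather than more parsing code.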

S.Lott 2009-03-30 16:39:09

Answer 2

A:

I've been successfully using a variant of the third approach, although the documents I've been processing were a lot bigger. Whether it's overkill or not really depends on how fluent you are with XSLT.

vartec 2009-03-30 15:55:56

Answer 3

A:

If your various input formats are unambiguous, you can do this:

<xsl:template match="object">
  <object>
    <id><xsl:value-of select="@id | objectid" /></id>
    <title><xsl:value-of select="title | thetitle" /></title>
    <url><xsl:value-of select="url | link" /></url>
    <interval><xsl:value-of select="frequency/@interval" /></interval>
  </object>
</xsl:template>

For your sample input, this produces:

<object>
  <id>1</id>
  <title>URL 1</title>
  <url>http://www.one.com</url>
  <interval>60</interval>
</object>
<object>
  <id>2</id>
  <title>URL 2</title>
  <url>http://www.two.com</url>
  <interval>60</interval>
</object>
<object>
  <id>1</id>
  <title>URL 1</title>
  <url>http://www.one.com</url>
  <interval>60</interval>
</object>
<object>
  <id>2</id>
  <title>URL 2</title>
  <url>http://www.two.com</url>
  <interval>60</interval>
</object>

However, there may be faster ways to achieve a usable result than XSLT. Just measure how fast each approach is, and how "ugly" it feels to you. I would tend to say that XSLT is the more elegant/maintainable solution for processing XML. YMMV.

If your input formats are ambiguous and the above solution produces wrong results, a more explicit approach is needed, along the lines of:

<xsl:template match="object">
  <object>
    <xsl:choose>
      <xsl:when test="@id and title and url and frequency/@interval">
        <xsl:apply-templates select="." mode="format1" />
      </xsl:when>
      <xsl:when test="objectid and thetitle and link and frequency/@interval">
        <xsl:apply-templates select="." mode="format2" />
      </xsl:when>
    </xsl:choose>
  </object>
</xsl:template>

<xsl:template match="object" mode="format1">
  <id><xsl:value-of select="@id" /></id>
  <title><xsl:value-of select="title" /></title>
  <url><xsl:value-of select="url" /></url>
  <interval><xsl:value-of select="frequency/@interval" /></interval>
</xsl:template>

<xsl:template match="object" mode="format2">
  <id><xsl:value-of select="objectid" /></id>
  <title><xsl:value-of select="thetitle" /></title>
  <url><xsl:value-of select="link" /></url>
  <interval><xsl:value-of select="frequency/@interval" /></interval>
</xsl:template>
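
For readers who end up on the Python route instead, the same explicit check can be sketched with ElementTree. This is only an illustration that mirrors the two `xsl:when` tests above; the `detect_format` function and its return values are hypothetical.

```python
import xml.etree.ElementTree as ET


def detect_format(element):
    """Return "format1", "format2" or None, mirroring the two xsl:when tests."""
    frequency = element.find("frequency")
    has_interval = frequency is not None and frequency.get("interval") is not None

    def has_children(*tags):
        # True only if every listed child element is present.
        return all(element.find(tag) is not None for tag in tags)

    if element.get("id") is not None and has_children("title", "url") and has_interval:
        return "format1"
    if has_children("objectid", "thetitle", "link") and has_interval:
        return "format2"
    return None  # neither format matched; the caller decides how to report it


# Hypothetical sample elements, not copied from the question.
first = ET.fromstring(
    '<object id="1"><title>URL 1</title><url>http://www.one.com</url>'
    '<frequency interval="60" /></object>'
)
second = ET.fromstring(
    "<object><objectid>2</objectid><thetitle>URL 2</thetitle>"
    '<link>http://www.two.com</link><frequency interval="60" /></object>'
)
incomplete = ET.fromstring('<object id="3"><title>No URL</title></object>')
```

Unlike the stylesheet, which would emit an empty object element for unrecognized input, this sketch returns None so that bad input can be logged or rejected.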

Tomalak 2009-03-30 16:46:54


Processing XML into MySQL in good form
