I would like to parse an Atom feed and create an Atom-compliant cache of each entry.

The problem is that some feeds (this one, for example) use many namespaces besides the Atom one.

Is it possible to keep all Atom nodes intact and remove every node that belongs to another namespace?

Something like this:

valid_nodes = entry.find('atom:*', '/atom:feed/atom:entry')
# now I need to create an XML string from valid_nodes, but how do I do that?
A:

In XSLT you could use this transformation:

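<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/2005/Atom"
>
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <!-- Copy an element or attribute only if it is in no namespace, in the
       Atom namespace, or in the XML namespace (Atom allows xml:lang and
       xml:base). Anything else is dropped together with its subtree.
       Note: xsl:copy also copies in-scope namespace nodes, so unused
       xmlns:* declarations may still appear on copied elements. -->
  <xsl:template match="node() | @*">
    <xsl:if test="
      namespace-uri() = ''
      or
      namespace-uri() = 'http://www.w3.org/2005/Atom'
      or
      namespace-uri() = 'http://www.w3.org/XML/1998/namespace'
    ">
      <xsl:copy>
        <xsl:apply-templates select="node() | @*" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- Text nodes and comments are always copied. Both patterns default to
       priority -0.5, so the explicit priority resolves the tie with the
       template above instead of relying on processor error recovery. -->
  <xsl:template match="text()|comment()" priority="1">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>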

This copies nodes that are

  • in no namespace (empty namespace URI)
  • in the Atom namespace
  • text nodes or comments

Elements from any other namespace are not copied, and neither is anything inside them.

Maybe you can use that.
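If running an XSLT processor is not convenient, roughly the same filtering can be sketched directly in Python with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not the stylesheet above, and only a minimal sketch under assumptions: the feed is already in memory as a string, the names ATOM_NS, KEEP, strip_foreign, atom_only and SAMPLE are hypothetical, and, as an extra assumption, attributes in the XML namespace (xml:lang, xml:base) are kept as well. Foreign elements are dropped with their content, while text that follows them is kept. ET.tostring then produces the XML string the question asks about. ElementTree's default parser discards comments, so those will not survive parsing in the first place.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # xml:lang, xml:base
# Namespaces to keep; "" means "no namespace" (e.g. Atom's href attribute).
KEEP = {"", ATOM_NS, XML_NS}

# Write Atom as the default namespace instead of "ns0:" (note: global setting).
ET.register_namespace("", ATOM_NS)


def namespace_of(name):
    """Namespace URI of an ElementTree '{uri}local' name, or '' if unqualified."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def strip_foreign(elem):
    """In place: drop foreign attributes and foreign child elements (whole subtree)."""
    for name in list(elem.attrib):
        if namespace_of(name) not in KEEP:
            del elem.attrib[name]
    prev = None  # last kept child; receives text that followed a dropped sibling
    for child in list(elem):
        if not isinstance(child.tag, str):
            prev = child  # comments / processing instructions are kept
        elif namespace_of(child.tag) in KEEP:
            strip_foreign(child)
            prev = child
        else:
            # Text after a dropped element is stored in its .tail; keep that text.
            if child.tail:
                if prev is None:
                    elem.text = (elem.text or "") + child.tail
                else:
                    prev.tail = (prev.tail or "") + child.tail
            elem.remove(child)


def atom_only(entry):
    """Return an XML string for a filtered copy of entry; entry is not modified."""
    clean = copy.deepcopy(entry)
    clean.tail = None  # tostring() would otherwise append the entry's trailing text
    strip_foreign(clean)
    return ET.tostring(clean, encoding="unicode")


# Made-up sample feed with a Dublin Core extension, for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry xml:lang="en">
    <title>Hello</title>
    <dc:creator>Someone</dc:creator>
    <summary>Before <dc:note>hidden</dc:note>after</summary>
    <link href="http://example.org/1" dc:foo="x"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    feed = ET.fromstring(SAMPLE)
    for entry in feed.findall("atom:entry", {"atom": ATOM_NS}):
        print(atom_only(entry))
```

Working on a deep copy keeps the parsed feed untouched, so the same tree can still be queried for extension data afterwards.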

Tomalak