I need to analyze a few thousand XML documents to see whether any of them contain a certain construct. The problem is that some of the documents are not well-formed XML.
The basic idea was to use fn:collection() and search inside the nodes it returns. But this only works if all documents in the collection are well-formed.
Is it possible to do something similar while parsing only the well-formed documents and skipping the rest?
This is my XSLT, simplified, which works if all documents in $dir are well-formed:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>
  <!-- Directory to scan, and the collection URI that selects its *.xml files -->
  <xsl:variable name="dir" as="xs:string">file:/c:/path/to/files/</xsl:variable>
  <xsl:variable name="files" select="concat($dir, '?select=*.xml')" as="xs:string"/>
  <xsl:template match="/">
    <xsl:variable name="docs" select="collection($files)"/>
    <!-- Local names of all elements that carry the attribute, per document -->
    <xsl:variable name="names" select="
        for $i in $docs return
          distinct-values($i//*[exists(@an-attribute-to-find)]/local-name())"/>
    <!-- One name per line; the newline is written as a character reference
         because a literal line break in an attribute value is normalized to a space -->
    <xsl:value-of select="distinct-values($names)" separator="&#10;"/>
  </xsl:template>
</xsl:stylesheet>
Would it be possible to do something like this without manually sorting out the malformed documents before the transformation starts? Or can you suggest a better approach?
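One direction that might be worth trying, which I have not verified against my files: the ?select= query syntax in the collection URI is specific to Saxon, and Saxon's collection URIs also accept an on-error parameter with the values fail, warning, and ignore. Assuming Saxon is the processor, a minimal sketch of the stylesheet with that parameter added would be:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>
  <xsl:variable name="dir" as="xs:string">file:/c:/path/to/files/</xsl:variable>
  <!-- Assumption: the processor is Saxon. on-error=ignore asks it to skip
       files that cannot be parsed; on-error=warning would skip them as well
       but report each skipped file. -->
  <xsl:variable name="files" as="xs:string"
      select="concat($dir, '?select=*.xml;on-error=ignore')"/>
  <xsl:template match="/">
    <!-- Local names of all elements carrying the attribute, across the
         documents that were successfully parsed -->
    <xsl:variable name="names" select="
        distinct-values(collection($files)//*[@an-attribute-to-find]/local-name())"/>
    <xsl:value-of select="$names" separator="&#10;"/>
  </xsl:template>
</xsl:stylesheet>
```

Using on-error=warning instead of ignore may be preferable here, since it should also give a list of the malformed files. Other processors are not expected to understand these query parameters, so this sketch is not portable.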