I've simplified the problem somewhat, but I hope I've still captured its essence.
Let's say I have the following simple XML file:
<main>
outside1
===BEGIN===
inside1
====END====
outside2
=BEGIN=
inside2
==END==
outside3
</main>
Then I can use the following XSLT 2.0 stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="=+BEGIN=+">
      <xsl:matching-substring>
        <section/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:analyze-string select="." regex="=+END=+">
          <xsl:matching-substring>
            <_section/>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
To transform it to the following:
<?xml version="1.0" encoding="UTF-8"?>
outside1
<section/>
inside1
<_section/>
outside2
<section/>
inside2
<_section/>
outside3
Here are the questions:
Multiple regexes
Is there a better way to match two different regexes than nesting one inside the other, as was done above?
- What if they're not easily nestable like this?
- Can I have XSL templates to match and transform regex matches in a text()?
  - In this case, I'd have two templates, one for each regex
- If possible, this would be the ideal solution
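One direction that might avoid the nesting, offered only as a sketch and not as a confirmed answer: put both patterns into a single regex as alternatives, each in its own capture group, and branch on which group matched. This assumes the two patterns can never match the same substring; the choice of group 1 for the opener and group 2 for the closer is arbitrary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="text()">
    <!-- One pass over the text: group 1 captures an opener, group 2 a closer -->
    <xsl:analyze-string select="." regex="(=+BEGIN=+)|(=+END=+)">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- regex-group() yields an empty string for a group that did not take part in the match -->
          <xsl:when test="regex-group(1) != ''">
            <section/>
          </xsl:when>
          <xsl:otherwise>
            <_section/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

This still keeps everything in one template, so it does not give the "one template per regex" structure asked about above; it only flattens the nesting.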
Opening and closing elements on regex matches
Obviously, instead of:
<section/>
inside
<_section/>
What I really want eventually is:
<section>
inside
</section>
So how would you do this? I'm not sure if it's even possible to open an element in one regex match and close it in another (e.g. what if there is no match for the closer? The result would not be well-formed XML!), but this task seems typical enough that there must be an idiomatic solution for it.
Note: we can assume that sections will not overlap, and thus also will not nest. We can also assume that they will always appear in proper pairs.
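Given those assumptions, one possible approach (a sketch, not necessarily the idiomatic one) is to match a whole opener/content/closer run with a single regex and wrap the captured content in a real element. It relies on the reluctant quantifier `*?` and the `s` (dot-all) flag, so that `.` also matches the newlines around the content. An opener with no matching closer simply fails to match and stays in the output as plain text, so the result remains well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="text()">
    <!-- Group 1 is the text between an opener and the nearest following closer -->
    <xsl:analyze-string select="." regex="=+BEGIN=+(.*?)=+END=+" flags="s">
      <xsl:matching-substring>
        <section>
          <xsl:value-of select="regex-group(1)"/>
        </section>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text outside any section is copied through unchanged -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because it works on one text() node at a time, this only covers sections whose opener and closer sit in the same text node, as in the sample input.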
Additional info
So essentially I'm trying to accomplish what in Perl would succinctly be something like:
s/=+BEGIN=+/<section>/g;
s/=+END=+/<\/section>/g;
I'm looking for a way to do this in XSLT instead, because:
- It'd be more robust with regards to the context of the regex match
  - (i.e. it should only transform text() nodes)
- It'd also be more robust with regards to matching various XML entities