Hi, everyone:

I am working on transforming an XML file from an old version to a new version. Here is the basic template I am using:

<xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"></xsl:copy-of>
      <xsl:apply-templates></xsl:apply-templates>
    </xsl:element>
</xsl:template>

However, the new version of the XML schema requires that no element with a text value contains an empty string. So an old XML document such as:

<dataset>
 <title> </title>
</dataset>

will be invalid in the new version. I tried to modify the default template for text nodes. The new text template checks the text node: if it is an empty string, the template terminates the transformation; otherwise it copies the value to the output XML. Here is the template:

<xsl:template match="text()">
    <xsl:variable name="text-value" select="."/>
      <xsl:if test="normalize-space($text-value) = ''">
          <xsl:message terminate="yes">
                <xsl:call-template name="output_message3_fail">
                  <xsl:with-param name="parent_node" select="name(parent::node())"/>
                </xsl:call-template>
          </xsl:message>
      </xsl:if>
      <xsl:value-of select="$text-value"/>
</xsl:template>

However, I found that if the input looks like:

<dataset>
 <title>My tile</title>
</dataset>

the new text template will be called. If input looks like:

<dataset>
 <title> </title>
</dataset>

the new text template is never called, and the output looks like:

<dataset>
     <title/>
</dataset>

So my approach of modifying the text template doesn't work. Do you have any suggestions for how to do this: terminate the transformation when an element containing an empty string is found?

Thank you very much!

By the way, I am using the Java Xalan XSLT processor.

+2  A: 

However, I found that if the input looks like:

<dataset>
  <title>My tile</title>
</dataset>

the new text template will be called

Yes, this is exactly what the provided code should be doing -- I will explain this in a moment.

If input looks like:

<dataset>
  <title> </title>
</dataset>

the new text template is never called, and the output looks like:

<dataset>
  <title/>
</dataset>

I couldn't reproduce this with Xalan (J or C) or with the many other XSLT processors I have (Saxon 6.5.3, .NET XslCompiledTransform and XslTransform, MSXML 3, 4, and 6, JD, etc.). All of them display the error message (from inside <xsl:message terminate="yes">).

The only XSLT processor that produces the above output is AltovaXML (XmlSPY).

If you are using XmlSPY, probably you could consider either trying to use another XSLT processor or contacting Altova for assistance.

Now, back to the first behavior.

Explanation:

The provided source XML file:

<dataset>
  <title>My tile</title>
</dataset>

has three text nodes:

  1. The first text node is the one between <dataset> and <title> and it contains only whitespace.

  2. The second text node is the only child of <title> and its value is the string "My tile".

  3. The third and last text node is between </title> and </dataset> and consists of only whitespace.

When the template matching text() is selected for processing the first of the above three text nodes, the test is positive and <xsl:message terminate="yes"> is executed -- and this is exactly the reported behavior.
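To see these text nodes for yourself, a small diagnostic stylesheet along the following lines may help. It is only a sketch and not part of the original solution: it assumes the processor keeps whitespace-only text nodes in the source tree (some processors or parser settings strip them), and the report format is made up for illustration.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- For every text node, print the name of its parent element and the
       length of its value after normalize-space().
       A length of 0 means the text node is whitespace-only. -->
  <xsl:template match="text()">
    <xsl:text>parent=</xsl:text>
    <xsl:value-of select="name(..)"/>
    <xsl:text> normalized-length=</xsl:text>
    <xsl:value-of select="string-length(normalize-space(.))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The built-in template for elements applies templates to all children, so every text node in the document is visited in document order.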

Solution:

A simple solution exists. Change the template matching text() so that it matches only whitespace-only text nodes that are the sole child node of their parent. The XSLT transformation then behaves as expected for both types of XML documents originally provided:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match=
    "*[not(node()[2])]/text()
              [normalize-space()='']">
    <xsl:message terminate="yes">
      <xsl:call-template name="output_message3_fail">
        <xsl:with-param name="parent_node" select="name(..)"/>
      </xsl:call-template>
    </xsl:message>
  </xsl:template>

  <xsl:template name="output_message3_fail">
    <xsl:param name="parent_node"/>

    <xsl:message>        ERROR:        
      &lt;<xsl:copy-of select="$parent_node"/>> is empty
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<dataset>
  <title>My tile</title>
</dataset>

The wanted result is produced:

<dataset>
   <title>My tile</title>
</dataset>

When it is applied to the second XML document:

<dataset>
    <title> </title>
</dataset>

the correct result is produced:

ERROR:        
  <title> is empty
Dimitre Novatchev
Would normalize-space do the same as strip-space (and be supported in XPATH 1.0 as well?)
geoffc
@geoffc No, normalize-space() is a function which can be used to act on the value of a text node, not to cause the node to be not-read
Dimitre Novatchev
I'm confused. xsl:strip-space strips whitespace-only text nodes from the source tree. So under what circumstances can the template matching text() nodes emit a message? All of the text nodes it's testing for have already been removed from the source tree.
Robert Rossney
I have explained the first of the two problems reported by the OP and provided a solution to it. Please, hold on for a few minutes while the second problem is treated in the context of <xsl:strip-space>
Dimitre Novatchev
@Robert Rossney Thanks for pointing this out. I have edited the solution to accommodate both cases. The explanation remains the same.
Dimitre Novatchev
If you test for a second child text node, the test returns improper (probably) results where an element has whitespace before but not after its child element. If the XML's formed through some funky string manipulation that's inconsistent in its use of whitespace, that could be a problem.
Robert Rossney
@Robert Rossney I don't understand your statement. Can you provide an example? Use my userid for the Google-provided email.
Dimitre Novatchev
Sure: "<foo> <bar/></foo>". Your template matches the "foo" element, which doesn't have two text children, and it finds an all-whitespace text child. But that's not an element that contains only whitespace, and the transform incorrectly reports that it's empty.
Robert Rossney
@Robert Rossney Correct, the match pattern must be "*[not(node()[2])]/text()[normalize-space()='']" instead of "*[not(text()[2])]/text()[normalize-space()='']". I have tested this with all known XML docs and it's OK. Edited the answer. Thanks.
Dimitre Novatchev
A: 

Dear Dimitre Novatchev:

Thank you so much for the kind answer. If the input is:

<dataset>
   <title>My tile</title>
</dataset>

The transformation works.

After adding "<xsl:strip-space elements="*"/>", did you try the input:

<dataset>
  <title> </title>
</dataset>

I got a successful transformation - which I don't like:

<dataset>
  <title/>
</dataset>

I want the transformation to terminate if the title element contains only an empty string. Do you have any other suggestions?

Thank you again for the help!

@Jing I have explained the first of the two problems reported by you and provided a solution to it. Please, hold on for a few minutes while the second problem is treated in the context of <xsl:strip-space>
Dimitre Novatchev
@jing Please, see the solution now
Dimitre Novatchev
Thanks, Dimitre Novatchev!
A: 

I'm not clear on what it is you really want. You say you don't want to emit elements that contain the empty string, and then give as an example this:

<dataset>
   <title> </title>
</dataset>

in which the title element doesn't contain the empty string. It contains whitespace. So I'm going to assume that by "empty string" you mean "whitespace only."

Using xsl:strip-space will eliminate whitespace-only text nodes from the source tree before processing it. If you genuinely want to abort the transform with an exception if you encounter an element containing whitespace, you can't use xsl:strip-space, as it will remove all of the exception-triggering conditions before the transform runs.

I think what you want to do instead is write a template that looks like this:

<xsl:template match="*[not(*) and text() and not(text()[normalize-space() != ''])]">
   ...

This template will match any element for which the following is true:

  • it doesn't have child elements
  • it does contain at least one text node
  • all of the text nodes it contains are whitespace-only

So in your example, it won't match the dataset element (because it has a child element), but it will match the title element. It wouldn't match <title/> or <title></title>, though, because neither of those elements contains text nodes.
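For illustration, here is one way the elided template body might be filled in. This is a sketch under assumptions, not the template actually used: the error text is invented, the element-copying template is taken from the question, and the predicate is written as not(text()[normalize-space() != '']) so that every text node is tested (in XPath 1.0, normalize-space(text()) looks only at the first one). Note that there is no xsl:strip-space.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy every element, as in the question's basic template. -->
  <xsl:template match="*">
    <xsl:element name="{name(.)}" namespace="{namespace-uri(.)}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Abort on an element that has no child elements, has at least one
       text node, and has no text node with non-whitespace content.
       The predicate gives this pattern a higher default priority than
       match="*", so it wins for such elements. -->
  <xsl:template match="*[not(*) and text() and not(text()[normalize-space() != ''])]">
    <xsl:message terminate="yes">
      <xsl:text>ERROR: &lt;</xsl:text>
      <xsl:value-of select="name()"/>
      <xsl:text>&gt; contains only whitespace</xsl:text>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```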

Robert Rossney
Robert Rossney: "empty string" does mean "whitespace only". Your template does what I want. Thank you very much! Also, thank you, Dimitre Novatchev!
A: 

Maybe the test should be something like

string-length(text()) != 0 and string-length(normalize-space(text())) = 0

Doesn't xslt support regular expressions? If so, then that would be the way to go.

But does he want that every element must contain some nonspace text? Or are there some elements that must contain at least something and other elements where

<foo bar="BAR"/>

is ok? I'll bet anything it is. I think it is likely that he is going to have to write rules on a case-by-case basis for those elements that must be non-empty.

Which leads me to my final comment: the correct technology for checking the validity of an XML document is an XML schema.
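As a rough sketch of what that could look like for the example in the question, a W3C XML Schema fragment might declare a string type that must contain at least one non-whitespace character. The type name nonBlankString is hypothetical, and the real schema for the new version is not shown in the question, so the structure below is assumed from the sample document only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: a string with at least one non-whitespace
       character. [\s\S] is used instead of "." so that values
       spanning several lines are still accepted. -->
  <xs:simpleType name="nonBlankString">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\s\S]*\S[\s\S]*"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Structure assumed from the sample document in the question. -->
  <xs:element name="dataset">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="nonBlankString"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With a rule like this, only the elements declared with the restricted type are required to be non-blank, which fits the case-by-case approach described above.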

paulmurray