I have an XML document, and I know the index at which I need to insert a new node. The index is a position that counts only the characters of the text nodes and ignores the element tags. Is there a Java API that can seek to such a text-relative position in the XML and insert a new node there?

+1  A: 

There is no direct API in Java for this as such. But with an XPath library and DOM parsing, plus a little programming, you can achieve it easily.
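To make the DOM route concrete, here is a minimal sketch rather than a definitive implementation. It assumes the index is 1-based (as in XPath) and counts the characters of all text nodes in document order, white space included. The class and method names (`TextIndexInsert`, `insertAt`, and the helpers) are made up for illustration; only the `org.w3c.dom` and JAXP calls are standard.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from any library.
class TextIndexInsert {

    // Parse an XML string into a DOM document (checked exceptions wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize the document without an XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Collect text nodes (CDATA sections included) in document order.
    static void collectText(Node node, List<Text> out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Text) {
                out.add((Text) c);
            } else {
                collectText(c, out);
            }
        }
    }

    // Insert newNode before the character at the 1-based text index.
    // Returns false if the index lies outside the document's text.
    static boolean insertAt(Document doc, int index, Node newNode) {
        if (index < 1) {
            return false;
        }
        List<Text> texts = new ArrayList<>();
        collectText(doc, texts);
        int seen = 0; // characters in the text nodes before the current one
        for (Text t : texts) {
            int len = t.getLength();
            if (seen + len >= index) {
                // Split so that the character at `index` starts the tail node.
                Text tail = t.splitText(index - seen - 1);
                tail.getParentNode().insertBefore(newNode, tail);
                return true;
            }
            seen += len;
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>Some text here</b><c>Some other text here</c></a>");
        insertAt(doc, 11, doc.createElement("e"));
        System.out.println(serialize(doc));
    }
}
```

This still visits the text nodes once, which seems hard to avoid, since the position of a character depends on the length of all the text before it. Note that `Text.getLength()` counts UTF-16 code units, so the numbering can differ from XPath's `string-length()` when the text contains characters outside the Basic Multilingual Plane.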

Fazal
I think the worst-case option would be to iterate over all the elements in the DOM object. Please share your idea of how to apply XPath and DOM parsing to achieve this in a better way, instead of iterating over all the DOM elements.
Rachel
A: 

Here is an example showing adding an element to an existing XML document:

http://www.ibm.com/developerworks/library/x-tipmvdom.html

Dougman
+2  A: 

Here is an XSLT 2.0 stylesheet that takes two parameters: the index at which you want to insert (using the XSLT/XPath indexing scheme, where the index starts at one, not zero) and the node(s) to insert:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <!-- XPath/XSLT index starts with 1 -->
  <xsl:param name="index" as="xsd:integer" select="11"/>
  <xsl:param name="new" as="node()+"><e/></xsl:param>

  <xsl:variable name="text-to-split" as="text()?"
     select="descendant::text()[sum((preceding::text(), .)/string-length(.)) ge $index][1]"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()[. is $text-to-split]">
    <xsl:variable name="split-index" as="xsd:integer"
      select="$index - sum(preceding::text()/string-length(.))"/>
    <xsl:value-of select="substring(., 1, $split-index - 1)"/>
    <xsl:copy-of select="$new"/>
    <xsl:value-of select="substring(., $split-index)"/>
  </xsl:template>

</xsl:stylesheet>

You can use Saxon 9 to run XSLT 2.0 stylesheets with Java.

[edit] Here is an attempt to solve this with XSLT 1.0:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <!-- XPath/XSLT index starts with 1 -->
  <xsl:param name="index" select="11"/>
  <xsl:param name="new"><e/></xsl:param>

  <xsl:template name="find-text-to-split">
    <xsl:param name="text-nodes"/>
    <xsl:variable name="sum">
      <xsl:call-template name="make-sum">
        <xsl:with-param name="nodes" select="$text-nodes[1]/preceding::text() | $text-nodes[1]"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="$sum &gt;= $index">
        <xsl:value-of select="generate-id($text-nodes[1])"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="find-text-to-split">
          <xsl:with-param name="text-nodes" select="$text-nodes[position() &gt; 1]"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="make-sum">
    <xsl:param name="nodes"/>
    <xsl:param name="length" select="0"/>
    <xsl:choose>
      <xsl:when test="not($nodes)">
        <xsl:value-of select="$length"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="make-sum">
          <xsl:with-param name="nodes" select="$nodes[position() &gt; 1]"/>
          <xsl:with-param name="length" select="$length + string-length($nodes[1])"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:variable name="text-to-split-id">
    <xsl:call-template name="find-text-to-split">
      <xsl:with-param name="text-nodes" select="descendant::text()"/>
    </xsl:call-template>
  </xsl:variable>

  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:choose>
      <xsl:when test="generate-id() = $text-to-split-id">
        <xsl:variable name="sum">
          <xsl:call-template name="make-sum">
            <xsl:with-param name="nodes" select="preceding::text()"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="split-index"
          select="$index - $sum"/>
        <xsl:value-of select="substring(., 1, $split-index - 1)"/>
        <xsl:copy-of select="$new"/>
        <xsl:value-of select="substring(., $split-index)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Note that the XSLT 1.0 solution is not quite complete: it might recurse until the stack overflows if the index passed in is greater than any existing text index in the document.
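For running either stylesheet from Java, a minimal sketch using the standard JAXP API (`javax.xml.transform`) could look like the following. The class name `RunStylesheet`, the `transform` helper, and the tiny demo stylesheet are assumptions made for illustration. The JDK's built-in processor only handles XSLT 1.0; for the XSLT 2.0 stylesheet, Saxon 9 would have to be on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; only the javax.xml.transform calls are standard API.
class RunStylesheet {

    // Apply the stylesheet to the input and return the serialized result.
    static String transform(String xslt, String xml, Map<String, Object> params) {
        try {
            // JAXP uses whichever XSLT processor it finds. Assumption: with the
            // Saxon 9 jar on the classpath one could instead write
            // `new net.sf.saxon.TransformerFactoryImpl()` to select it explicitly.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer t = factory.newTransformer(new StreamSource(new StringReader(xslt)));
            // Stylesheet parameters such as `index` are passed by name.
            for (Map.Entry<String, Object> p : params.entrySet()) {
                t.setParameter(p.getKey(), p.getValue());
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Tiny XSLT 1.0 stylesheet that just echoes its `index` parameter.
        String xslt =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
          + "<xsl:output omit-xml-declaration='yes'/>"
          + "<xsl:param name='index' select='11'/>"
          + "<xsl:template match='/'><r><xsl:value-of select='$index'/></r></xsl:template>"
          + "</xsl:stylesheet>";
        Map<String, Object> params = Collections.<String, Object>singletonMap("index", 3);
        System.out.println(transform(xslt, "<doc/>", params));
    }
}
```

In real use the stylesheet and input would more likely come from files, for example via `new StreamSource(new java.io.File(...))`. How a node-valued parameter such as `new` is passed depends on the processor, so this sketch only sets `index`.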

Martin Honnen
As for the issue with new lines, it is not clear from your problem description whether/how you want to treat new lines or white space in general when computing the position. If you want to ignore all pure white space text nodes then you could put `<xsl:strip-space elements="*"/>` into the stylesheet (and probably additionally `<xsl:output indent="yes"/>` to get indented output). That way pure white space text nodes are stripped before the stylesheet processes the input tree.
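To see how much white space-only text nodes contribute to the index, a small Java check along these lines may help. It is only a sketch: `TextLengthCheck` and `textLength` are hypothetical names, and "white space-only" is approximated here with `String.trim()`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class for comparing text lengths with and without
// white space-only text nodes.
class TextLengthCheck {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Sum the lengths of all text nodes below `node`; optionally skip text
    // nodes that consist only of white space (e.g. indentation and new lines).
    static int textLength(Node node, boolean skipWhitespaceOnly) {
        int total = 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Text) {
                String data = ((Text) c).getData();
                if (!(skipWhitespaceOnly && data.trim().isEmpty())) {
                    total += data.length();
                }
            } else {
                total += textLength(c, skipWhitespaceOnly);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>Hi</b>\n</a>");
        System.out.println("all text nodes: " + textLength(doc, false));
        System.out.println("without white space-only nodes: " + textLength(doc, true));
    }
}
```

If the two numbers differ for your input, the indentation is being counted as part of the index, and the index values need to be computed the same way as the stylesheet computes them.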
Martin Honnen
Rachel, please edit your question to provide a code sample of the input you have, and then explain in what way white space is considered part of the string you have an index for. I am afraid a one-line sample in a comment, with some explanation of how it really looks, is not a good basis for finding a solution or even for understanding the requirements you have.
Martin Honnen
Martin, thanks for your response. I understand now that each new line is considered a text node and that the index is computed accordingly. I have no issues with your XSLT 2.0 solution now. One query: if I have many such index values and nodes to insert into the XML, how do I do it in a single XSL call? Do I have to invoke the XSL iteratively?
Rachel
E.g. I want to insert <e/> at position 2, <f/> at position 4, <g/> at position 6 and also <h/> at position 6 of the input XML "<a><b>Some text here</b><c>Some other text here</c></a>". How do I do it without calling the XSL multiple times with different inputs? Expected output: <a><b>S<e/>om<f/>e <g/><h/>text here</b><c>Some other text here</c></a>
Rachel
I tried hard to pass the index as an input parameter from the template that processes the text nodes ("<xsl:template match="text()">"), so that the complete logic within it can be called iteratively over a list of index values. I could not get it passed to the variable $text-to-split, where the index is used. I tried in both XSLT 2.0 and 1.0. Please let me know how I can pass the index as a parameter in your XSL. I plan to supply all the index values as a comma-separated list and iteratively invoke the logic within the template that processes the text nodes in order to achieve the above example. I would appreciate your comments.
Rachel
I will add an XSLT 2.0 stylesheet in a separate answer that is able to process a secondary input document with a list of data to be inserted.
Martin Honnen
+1  A: 

The following XSLT 2.0 stylesheet is an attempt to extend the solution in the original XSLT 2.0 stylesheet to receive a list of index positions and nodes to be inserted with one transformation:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xsd"
  version="2.0">

  <xsl:param name="insert-file" as="xsd:string" select="'insert-data.xml'"/>

  <xsl:variable name="main-root" as="document-node()" select="/"/>

  <xsl:variable name="insert-data" as="element(data)*">
    <xsl:for-each-group select="doc($insert-file)/insert-data/data" group-by="xsd:integer(@index)">
      <xsl:sort select="current-grouping-key()"/>
      <data index="{current-grouping-key()}" text-id="{generate-id($main-root/descendant::text()[sum((preceding::text(), .)/string-length(.)) ge current-grouping-key()][1])}">
        <xsl:copy-of select="current-group()/node()"/>
      </data>
    </xsl:for-each-group>
  </xsl:variable>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*, node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()[generate-id() = $insert-data/@text-id]">
    <xsl:variable name="preceding-text" as="xsd:integer" select="sum(preceding::text()/string-length(.))"/>
    <xsl:variable name="this" as="text()" select="."/>
    <xsl:variable name="insert-here" as="element(data)+">
      <xsl:for-each select="$insert-data[@text-id = generate-id(current())]">
        <data split-index="{@index - $preceding-text}">
          <xsl:copy-of select="node()"/>
        </data>
      </xsl:for-each>
    </xsl:variable>

    <xsl:for-each select="$insert-here">
      <xsl:variable name="pos" as="xsd:integer" select="position()"/>
      <xsl:value-of select="substring(
        $this, 
        if ($pos eq 1) then 1 else xsd:integer($insert-here[$pos - 1]/@split-index), 
        if ($pos ne 1) then xsd:integer(@split-index) - xsd:integer($insert-here[$pos - 1]/@split-index) else xsd:integer(@split-index) - 1)"/>
      <xsl:copy-of select="node()"/>
    </xsl:for-each>

    <xsl:value-of select="substring($this, $insert-here[last()]/@split-index)"/>  
  </xsl:template>

</xsl:stylesheet>

This stylesheet expects a file insert-data.xml to contain the data in the following format:

<insert-data>
  <data index="2"><e/></data>
  <data index="4"><f/></data>
  <data index="6"><g/></data>
  <data index="6"><h/></data>
  <data index="18"><i/></data>
</insert-data>

So each 'data' element contains the nodes to be inserted at the position given by the 'index' attribute.
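For comparison, the same multi-insert idea can be sketched with plain DOM in Java. This is an illustration under stated assumptions, not a port of the stylesheet: the names `MultiInsert`, `Insertion` and `insertAll` are made up, indexes are 1-based and refer to the text of the original document, and insertions with the same index keep their input order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative, not from any library.
class MultiInsert {

    // Insert `node` before the character at the 1-based text position `index`.
    static final class Insertion {
        final int index;
        final Node node;
        Insertion(int index, Node node) { this.index = index; this.node = node; }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static void collectText(Node node, List<Text> out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Text) out.add((Text) c); else collectText(c, out);
        }
    }

    // One pass over a snapshot of the original text nodes, in ascending index order.
    static void insertAll(Document doc, List<Insertion> insertions) {
        List<Insertion> sorted = new ArrayList<>(insertions);
        sorted.sort(Comparator.comparingInt((Insertion i) -> i.index)); // stable sort
        List<Text> texts = new ArrayList<>();
        collectText(doc, texts);
        Iterator<Text> it = texts.iterator();
        Text cur = it.hasNext() ? it.next() : null;
        int curStart = 1; // original 1-based index of the first character of `cur`
        for (Insertion ins : sorted) {
            if (ins.index < 1) continue; // ignore invalid positions
            // Advance to the original text node that contains the index.
            while (cur != null && ins.index > curStart + cur.getLength() - 1) {
                curStart += cur.getLength();
                cur = it.hasNext() ? it.next() : null;
            }
            if (cur == null) break; // index beyond the end of the text
            Text tail = cur.splitText(ins.index - curStart);
            tail.getParentNode().insertBefore(ins.node, tail);
            cur = tail;           // continue with the remainder of this text node
            curStart = ins.index;
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>Some text here</b><c>Some other text here</c></a>");
        List<Insertion> list = new ArrayList<>();
        list.add(new Insertion(2, doc.createElement("e")));
        list.add(new Insertion(4, doc.createElement("f")));
        list.add(new Insertion(6, doc.createElement("g")));
        list.add(new Insertion(6, doc.createElement("h")));
        list.add(new Insertion(18, doc.createElement("i")));
        insertAll(doc, list);
        System.out.println(serialize(doc));
    }
}
```

Because the text nodes are collected once up front, text inside the inserted nodes does not shift the positions of later insertions.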

Martin Honnen
Thanks a lot for your response. It works. :-)
Rachel
My input to the XSL is an XHTML document. The output from the XSL does not have the DOCTYPE declaration. I want the DOCTYPE declaration copied from the input to the output. Is that possible with XSL?
Rachel
The XSLT/XPath data model does not model any DOCTYPE node so you can't copy that information from the input to the output as it is not present. All you can do is create a DOCTYPE node in the output by setting doctype-public and/or doctype-system on the xsl:output element: http://www.w3.org/TR/xslt20/#serialization
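When the transformation is run from Java through JAXP, the same serialization properties can also be set on the `Transformer` with `OutputKeys.DOCTYPE_PUBLIC` and `OutputKeys.DOCTYPE_SYSTEM`. A minimal sketch follows, assuming the XHTML 1.0 Strict identifiers as example values; the class name `DoctypeOutput` and the `withDoctype` helper are hypothetical, and an identity transformer stands in for the real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; only the javax.xml.transform calls are standard API.
class DoctypeOutput {

    // Serialize `xml` with a DOCTYPE declaration built from the given identifiers.
    static String withDoctype(String xml, String publicId, String systemId) {
        try {
            // Identity transformer for brevity; with a stylesheet one would call
            // newTransformer(stylesheetSource) and set the same properties.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                     + "<head><title>t</title></head><body/></html>";
        System.out.println(withDoctype(xhtml,
                "-//W3C//DTD XHTML 1.0 Strict//EN",
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"));
    }
}
```

The identifiers still have to be supplied by the caller, because the DOCTYPE of the input is not available to the stylesheet.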
Martin Honnen
Thank you for your response.
Rachel