
I'm working with PHP 5, and I need to transform XML of the following form:

<item>
    <string isNewLine="1" lineNumber="32">some text in new line</string>
    <string>, more text</string>
    <item>
        <string isNewLine="1" lineNumber="33">some text in new line</string>
        <string isNewLine="1" lineNumber="34">some text</string>
        <string> in the same line</string>
        <string isNewLine="1" lineNumber="35">some text in new line</string>
    </item>
</item>

into something like this:

<item>
    <line lineNumber="32">some text in new line, more text</line>
    <item>
        <line lineNumber="33">some text in new line</line>
        <line lineNumber="34">some text in the same line</line>
        <line lineNumber="35">some text in new line</line>
    </item>
</item>

As you can see, the text spread across consecutive 'string' nodes has been joined into a single 'line' element. Also note that the 'string' nodes can be nested within other nodes at any depth.

What are possible ways to transform the source XML into the target XML?

Thanks,

A: 

You should look into an XML parser for this. You could use either a SAX-based or a DOM-based parser.

SAX is more memory-efficient because it streams the document, but DOM may suit your needs better because it's easier to work with.
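As a rough illustration of the DOM route (a sketch, not part of the original answer), the PHP below uses only the bundled DOM extension. The function name `joinStrings` is hypothetical. It assumes, as in the question, that continuation 'string' elements are siblings of the line-start 'string' they belong to.

```php
<?php
// Sketch: merge each run of <string> siblings into one <line> element using
// only ext/dom (no XSLT). A <string> with isNewLine starts a new line; later
// sibling <string> elements without it are appended to that line.
// joinStrings() is a hypothetical helper name.

function joinStrings(DOMElement $parent)
{
    $doc = $parent->ownerDocument;
    $line = null; // the <line> element currently being filled, if any

    // Snapshot the children first, since the loop edits the tree.
    $children = array();
    foreach ($parent->childNodes as $child) {
        $children[] = $child;
    }

    foreach ($children as $child) {
        if (!($child instanceof DOMElement)) {
            continue; // text/comment nodes are left untouched
        }
        if ($child->tagName !== 'string') {
            joinStrings($child); // e.g. a nested <item>: recurse into it
        } elseif ($child->hasAttribute('isNewLine')) {
            // Start a new <line>, keeping lineNumber but not isNewLine.
            $line = $doc->createElement('line');
            if ($child->hasAttribute('lineNumber')) {
                $line->setAttribute('lineNumber', $child->getAttribute('lineNumber'));
            }
            $line->appendChild($doc->createTextNode($child->textContent));
            $parent->replaceChild($line, $child);
        } elseif ($line !== null) {
            // Continuation: append its text to the open line, drop the node.
            $line->appendChild($doc->createTextNode($child->textContent));
            $parent->removeChild($child);
        }
        // A continuation <string> with no line start before it is kept as-is.
    }
}

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop the source indentation...
$doc->formatOutput = true;        // ...so saveXML() can re-indent the result
$doc->loadXML('<item>
    <string isNewLine="1" lineNumber="32">some text in new line</string>
    <string>, more text</string>
    <item>
        <string isNewLine="1" lineNumber="33">some text in new line</string>
        <string isNewLine="1" lineNumber="34">some text</string>
        <string> in the same line</string>
        <string isNewLine="1" lineNumber="35">some text in new line</string>
    </item>
</item>');

joinStrings($doc->documentElement);
$out = $doc->saveXML($doc->documentElement);
echo $out, "\n";
```

Because this loads the whole document into memory, it trades the SAX efficiency mentioned above for simpler code. That trade-off is usually acceptable for inputs of this size.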

irishbuzz
+3  A: 

This stylesheet produces the output you are looking for:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes" />

    <!--Identity template simply copies content forward by default -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="string[@isNewLine and @lineNumber]">
        <line>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="text()" />
            <!-- Also include the text() of the following sibling string elements
                that do not have @isNewLine or @lineNumber
                and whose nearest preceding line-start string is this element -->
            <xsl:apply-templates select="following-sibling::string[not(@isNewLine and @lineNumber) and generate-id(preceding-sibling::string[@isNewLine and @lineNumber][1]) = generate-id(current())]/text()" />
        </line>
    </xsl:template>

    <!--Suppress the string elements that do not contain isNewLine or lineNumber attributes in normal processing-->
    <xsl:template match="string[not(@isNewLine and @lineNumber)]" />

    <!--Empty template to prevent attribute from being copied to output-->
    <xsl:template match="@isNewLine" />

</xsl:stylesheet>
Mads Hansen
+1 You should copy all but the `isNewLine` attribute.
Tomalak
@Mads Hansen: +1 For correct pattern. Minor edit: `string` to `line` transformation, strip `@isNewLine` and reduce predicate.
Alejandro
Thanks @Tomalak, good catch. @Alejandro, I added an empty template for @isNewLine. It feels cleaner to me to have the empty template rather than excluding the attribute in the predicate filter (six of one, half a dozen of the other).
Mads Hansen
@Mads: Also, there is another error: `following-sibling::string[…]` should be `following-sibling::string[…][generate-id(preceding-sibling::string[1]) = generate-id(current())]` for obvious reasons. Depending on the input, a `following-sibling::string[1][…]` could also be enough, but this is less clean so the OP would have to decide this.
Tomalak
Thanks again, @Tomalak. I've updated the answer.
Mads Hansen
@Mads Hansen: I think that your empty template is better. But you've again missed the `string` to `line` transformation.
Alejandro
@Tomalak: Very good point! I missed that, too.
Alejandro
@Mads Hansen: What do you think about using keys? As: `<xsl:key name="kStringByPreceding" match="string[not(@isNewLine)]" use="generate-id(preceding-sibling::string[@isNewLine][1])"/>`
Alejandro
@Alejandro: Elaborate, but maybe too much for the OP if he was never exposed to XSLT. ;-) I guess I would do it that way.
Tomalak
@Mads-Hansen: This solution produces wrong results. @Tomalak and @Alejandro: it seems you haven't run the transformation and seen what it is producing? Cases like this are the main reason I always provide the result in my solutions. For a correct solution, see my answer.
Dimitre Novatchev
@Alejandro - wasn't seeing the conversion of `string` to `line` (updated the answer to correct it). Yes, keys would be more efficient. Unless the input XML is large, might not be necessary. Luckily, @Dimitre Novatchev has provided a nice solution that uses keys.
Mads Hansen
@Mads-Hansen: Your solution had a more serious problem than just the "string" <--> "line" problem. Good that you corrected it. Please, in the future, check your results and provide them in your answer.
Dimitre Novatchev
@Dimitre: Tomalak spotted that problem (applying templates to **every** following `string[not(@isNewLine)]`) in his comment from 3 hours ago. That's why I suggested the key.
Alejandro
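The disagreement in these comments comes down to which `string` a continuation belongs to. The sketch below uses a hypothetical input with two continuation strings in a row; the question's sample never has more than one. It evaluates both XPath expressions from the thread with PHP's `DOMXPath`:

```php
<?php
// Hypothetical input (not from the question): two continuation strings in a row.
$doc = new DOMDocument();
$doc->loadXML('<item><string isNewLine="1" lineNumber="1">a</string>'
    . '<string>b</string><string>c</string></item>');
$xp = new DOMXPath($doc);

$results = array();
foreach ($xp->query('//string[not(@isNewLine)]') as $s) {
    // Nearest preceding <string> sibling, whatever its attributes.
    $nearest = $xp->query('preceding-sibling::string[1]', $s)->item(0);
    // Nearest preceding <string> sibling that starts a line.
    $owner = $xp->query('preceding-sibling::string[@isNewLine][1]', $s)->item(0);
    $results[$s->textContent] = array($nearest->textContent, $owner->textContent);
    printf("%s: nearest = %s, line start = %s\n",
        $s->textContent, $nearest->textContent, $owner->textContent);
}
```

For `c`, the nearest preceding `string` is `b`, not the line start `a`. A predicate of the form `generate-id(preceding-sibling::string[1]) = generate-id(current())` therefore attaches only `b` to line `a`. With an empty template suppressing continuation strings, `c` never reaches the output. The `[@isNewLine][1]` form used in Alejandro's key attaches both.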
+3  A: 

Here is an efficient and correct solution:

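<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Index every continuation string (no isNewLine) by the id of the
      nearest preceding sibling string that starts a line -->
 <xsl:key name="knextStrings"
   match="string[not(@isNewLine)]"
   use="generate-id(preceding-sibling::string
                                 [@isNewLine][1]
                    )"/>

 <!-- Identity template: copy everything else unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- A line-start string becomes <line>: keep all attributes except
      isNewLine, then its own text plus the text of its continuations -->
 <xsl:template match="string[@isNewLine]">
  <line>
   <xsl:copy-of select="@*[not(name()='isNewLine')]"/>
   <xsl:copy-of select="text()
                       |
                        key('knextStrings',
                             generate-id()
                             )
                              /text()"/>
  </line>
 </xsl:template>

 <!-- Continuation strings are output through the key above, so suppress them here -->
 <xsl:template match="string[not(@isNewLine)]"/>
</xsl:stylesheet>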

When this transformation is applied to the originally provided XML document:

<item>
    <string isNewLine="1" lineNumber="32">some text in new line</string>
    <string>, more text</string>
    <item>
        <string isNewLine="1" lineNumber="33">some text in new line</string>
        <string isNewLine="1" lineNumber="34">some text</string>
        <string> in the same line</string>
        <string isNewLine="1" lineNumber="35">some text in new line</string>
    </item>
</item>

the wanted, correct result is produced:

<item>
  <line lineNumber="32">some text in new line, more text</line>
  <item>
    <line lineNumber="33">some text in new line</line>
    <line lineNumber="34">some text in the same line</line>
    <line lineNumber="35">some text in new line</line>
  </item>
</item>
Dimitre Novatchev
@Dimitre: +1 for correct output and because this is what I was thinking ;)
Alejandro
+1 For an efficient and correct solution.
Mads Hansen
@Dimitre: This solution works out of the box. It would be great if you could add comments to the code.
Benjamin Ortuzar
A: 

Use an XSL Transformation.

From the PHP documentation:

<?php

$xml = new DOMDocument;
$xml->load('data.xml');

$xsl = new DOMDocument;
$xsl->load('trans.xsl');

$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);

echo $proc->transformToXML($xml);

?>

Use Dimitre's answer for trans.xsl.
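A self-contained variant of the snippet above can be used for trying this out without separate files. It assumes PHP's XSL extension is enabled and PHP 5.3 or later for nowdoc strings. It inlines the key-based stylesheet from Dimitre's answer (comments omitted) and the question's sample input, and uses `loadXML()` instead of `load()`:

```php
<?php
// Requires ext/xsl (XSLTProcessor); ext/dom is normally bundled.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="knextStrings" match="string[not(@isNewLine)]"
           use="generate-id(preceding-sibling::string[@isNewLine][1])"/>
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="node()|@*"/></xsl:copy>
  </xsl:template>
  <xsl:template match="string[@isNewLine]">
    <line>
      <xsl:copy-of select="@*[not(name()='isNewLine')]"/>
      <xsl:copy-of select="text() | key('knextStrings', generate-id())/text()"/>
    </line>
  </xsl:template>
  <xsl:template match="string[not(@isNewLine)]"/>
</xsl:stylesheet>
XSL;

$xmlSource = <<<'XML'
<item>
    <string isNewLine="1" lineNumber="32">some text in new line</string>
    <string>, more text</string>
    <item>
        <string isNewLine="1" lineNumber="33">some text in new line</string>
        <string isNewLine="1" lineNumber="34">some text</string>
        <string> in the same line</string>
        <string isNewLine="1" lineNumber="35">some text in new line</string>
    </item>
</item>
XML;

$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

$out = $proc->transformToXML($xml); // string on success, false on failure
echo $out;
```

In a real script you would typically switch back to `load()` with file paths, as in the snippet above, once the stylesheet behaves as expected.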

dolmen