Hi, I need to parse the following XML document (which comes from an external web service):

...
<dati>
    <Riconoscimento>
        <IdentificativoPosizione>xxxx</IdentificativoPosizione>
        <OutputRestituiti>xxx</OutputRestituiti>
    </Riconoscimento>
    <![CDATA[text text text]]>
</dati>    
...

The problem is that as long as the "Riconoscimento" node is present, the SimpleXML parser fails to read the CDATA section; if I remove that child, everything works without problems.

So the main question is: is this a valid XML document, and if it is, is there some way to access the CDATA section with PHP without manually removing the extra children?

Thanks in advance.

+1  A: 

You can get it like this:

$x = simplexml_load_string('<root><dati>
    <Riconoscimento>
        <IdentificativoPosizione>xxxx</IdentificativoPosizione>
        <OutputRestituiti>xxx</OutputRestituiti>
    </Riconoscimento>
    <![CDATA[text text text]]>
</dati></root>', 'SimpleXMLElement', LIBXML_NOCDATA);

var_dump((string)$x->dati);

Note the LIBXML_NOCDATA parameter to convert the CDATA to a text node.
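
Note that the cast returns all direct text of the dati element, so the CDATA payload arrives with the whitespace that surrounds the Riconoscimento child. A minimal sketch, assuming the payload itself has no meaningful leading or trailing whitespace, is to trim the result ($text is just an illustrative variable name):

```php
<?php
$x = simplexml_load_string(
    '<root><dati>
    <Riconoscimento>
        <IdentificativoPosizione>xxxx</IdentificativoPosizione>
        <OutputRestituiti>xxx</OutputRestituiti>
    </Riconoscimento>
    <![CDATA[text text text]]>
</dati></root>',
    'SimpleXMLElement',
    LIBXML_NOCDATA
);

// The cast concatenates the direct text of <dati>, including the
// indentation and newlines around <Riconoscimento>.
// trim() strips that whitespace from both ends of the string.
$text = trim((string) $x->dati);

var_dump($text);
```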

Greg
Thanks, I was missing the "(string)" cast (the value wasn't visible in the debugger without an explicit cast).
Alekc
+1  A: 

First of all: this is a valid XML document (see here).

Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string " <![CDATA[ " and end with the string " ]]> ":

In your case the <dati> element is a mixed-content element, i.e. it contains both child elements and character data.

$xmlString = <<<XML
<dati>
    <Riconoscimento>
        <IdentificativoPosizione>xxxx</IdentificativoPosizione>
        <OutputRestituiti>xxx</OutputRestituiti>
    </Riconoscimento>
    <![CDATA[text text text]]>
</dati>
XML;
$xml = simplexml_load_string($xmlString);
var_dump((string)$xml);

/*
 * outputs:
 * string(37) "
 *
 *        text text text
 *    "
 */

(there is no need to pass LIBXML_NOCDATA)
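
If only the CDATA payload is wanted, without the whitespace text nodes of the mixed content, one possible alternative is PHP's DOM extension, which keeps CDATA sections as separate nodes. The following is a sketch rather than the only way to do it; $doc and $cdata are illustrative names, and it assumes the document is loaded without LIBXML_NOCDATA so that the CDATA nodes are preserved:

```php
<?php
$xmlString = <<<XML
<dati>
    <Riconoscimento>
        <IdentificativoPosizione>xxxx</IdentificativoPosizione>
        <OutputRestituiti>xxx</OutputRestituiti>
    </Riconoscimento>
    <![CDATA[text text text]]>
</dati>
XML;

$doc = new DOMDocument();
$doc->loadXML($xmlString);

// Walk the direct children of <dati> and keep only CDATA sections,
// skipping the element child and the whitespace-only text nodes.
$cdata = [];
foreach ($doc->documentElement->childNodes as $node) {
    if ($node instanceof DOMCdataSection) {
        $cdata[] = $node->data;
    }
}

var_dump($cdata);
```

This avoids trimming, which matters if the CDATA content may legitimately start or end with whitespace.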

Stefan Gehrig