Hi all, how can I write a regular expression to retrieve values from an XML node? The node structure is very big, so we can't traverse it easily. I want to read the file as plain text instead, and I hope I can write a regex to find the matching elements.

<node1>
 <node2>str</node2>
 <node3>Text</node3>
 <myvalue>Here is the values string..</myvalue>
</node1>

The above is the pattern from which I want to retrieve the <myvalue></myvalue> value, but in my XML many other nodes also contain a <myvalue> child. The only way to find the node I want is by the pattern above: only the <myvalue> content changes, while <node2>str</node2> and <node3>Text</node3> are always the same.

So how can I write the regex for PHP?

+2  A: 

Use an XML parser; regex is not appropriate for that kind of parsing.

Here's a list of the XML parsers you can use:

Here's a simple example with DOM that finds every myvalue located inside a node1.

<?php
    $document = new DOMDocument();
    $document->loadXML(
        '<all>
            <myvalue>Elsewhere</myvalue>
            <node1>
                <node2>str</node2>
                <node3>Text</node3>
                <myvalue>Here is the values string..</myvalue>
            </node1>
        </all>');
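    // Collect every <node1> element in the document.
    $lst = $document->getElementsByTagName('node1');

    for ($i = 0; $i < $lst->length; $i++) {
        $node1 = $lst->item($i);
        // Search only inside this <node1>, so the top-level <myvalue> is skipped.
        $myvalue = $node1->getElementsByTagName('myvalue');

        if ($myvalue->length > 0) {
            echo $myvalue->item(0)->textContent;
        }
    }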
?>
HoLyVieR
But finding that node is a rather difficult task; that is why I prefer regex.
coderex
@coderex It is easier, and you are sure to get an accurate result every time.
HoLyVieR
@coderex you can use XPath to search through the XML once you parse it, for example with SimpleXML: http://www.tuxradar.com/practicalphp/12/3/3
Fanis
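A minimal sketch of the XPath suggestion in the comment above, assuming the fixed <node2>str</node2> and <node3>Text</node3> siblings from the question are what identify the right <node1>. The <all> wrapper and the extra non-matching <node1> are made up for illustration; only simplexml_load_string() and SimpleXMLElement::xpath() are used.

```php
<?php
// Sample data modelled on the question; the <all> wrapper and the first
// <node1> (which must NOT match) are assumptions added for illustration.
$xml = <<<XML
<all>
    <myvalue>Elsewhere</myvalue>
    <node1>
        <node2>other</node2>
        <node3>Text</node3>
        <myvalue>Not this one</myvalue>
    </node1>
    <node1>
        <node2>str</node2>
        <node3>Text</node3>
        <myvalue>Here is the values string..</myvalue>
    </node1>
</all>
XML;

$root = simplexml_load_string($xml);

// Select <myvalue> only inside a <node1> whose <node2> is "str"
// and whose <node3> is "Text".
$found = $root->xpath('//node1[node2="str" and node3="Text"]/myvalue');

$values = [];
foreach ($found as $element) {
    $values[] = (string) $element;
}

print_r($values);
```

The predicate in square brackets does the filtering the question asks for, so no manual traversal of the tree is needed.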
+1  A: 

PHP has a SAX-based XML parser which will let you use a real XML parser without storing an entire DOM tree in memory. XMLReader lets you parse the file without even reading the entire file into memory. Using regex to dig into XML is going to be painful.
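
As a rough sketch of the XMLReader approach, assuming the same <node1> layout as in the question: the reader streams through the input and only each <node1> subtree is turned into a small SimpleXML object for checking. The inline $xml string stands in for a large file; with a real file you would call $reader->open() with its path instead of $reader->XML().

```php
<?php
// Stand-in for a large file; the <all> wrapper is an assumption.
$xml = '<all>
    <myvalue>Elsewhere</myvalue>
    <node1>
        <node2>str</node2>
        <node3>Text</node3>
        <myvalue>Here is the values string..</myvalue>
    </node1>
</all>';

$reader = new XMLReader();
$reader->XML($xml);

$values = [];
while ($reader->read()) {
    // React only to opening <node1> tags; everything else streams past.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'node1') {
        // Parse just this subtree, which is small even in a huge document.
        $node = simplexml_load_string($reader->readOuterXml());

        if ((string) $node->node2 === 'str'
            && (string) $node->node3 === 'Text'
            && isset($node->myvalue)) {
            $values[] = (string) $node->myvalue;
        }
    }
}
$reader->close();

print_r($values);
```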

Ned Batchelder
A: 

If you insist on using a regular expression for this, try

preg_match_all('/<myvalue>([\s\S]+?)<\/myvalue>/', $text, $matches);
cypher
But I need to check this as well:
coderex
preg_match_all('/<node2>str<\/node2><node3>Text<\/node3><myvalue>([\s\S]+)<\/myvalue>/', $text, $matches);
coderex
But in the XML each node is followed by a newline character, I think, so my attempt fails in this case. So now I need to remove the spaces and newline characters.
coderex
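Rather than stripping whitespace first, the pattern itself can tolerate it. The following is only a sketch under the assumption that the three elements always appear in this order, with no attributes, comments or CDATA in between; a parser remains the more robust choice. It uses ~ as the delimiter so the slashes in closing tags need no escaping.

```php
<?php
// Sample input copied from the question, including newlines and indentation.
$text = <<<XML
<node1>
 <node2>str</node2>
 <node3>Text</node3>
 <myvalue>Here is the values string..</myvalue>
</node1>
XML;

// \s* between the tags absorbs newlines and indentation.
// (.*?) is lazy, and the s modifier lets . match newlines inside the value.
$pattern = '~<node2>str</node2>\s*<node3>Text</node3>\s*<myvalue>(.*?)</myvalue>~s';

preg_match_all($pattern, $text, $matches);

// $matches[1] holds the captured <myvalue> contents, one per match.
$values = $matches[1];

print_r($values);
```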