+1  Q: 

regex help - php

$data = "<Data>hello</Data>";
preg_match_all("/\<Data\>[.]+\<\/Data\>/", $data, $match);
print_r($match);

This returns:

Array ( [0] => Array ( ) )

So I am guessing that a match is not made?

+2  A: 
preg_match_all("#<Data>.+</Data>#", $data, $match);

If you wanted to use / as the delimiter:

preg_match_all("/<Data>.+<\/Data>/", $data, $match);

The main problem was that a . inside a character class matches a literal period. Also, using a different delimiter eliminates escaping. Note that you don't have to escape < either way. If you want to be able to extract the inner value, use:

preg_match_all("#<Data>(.+)</Data>#", $data, $match);

"hello" will now be in $match[1][0] in your example. Note that regex is not suited for parsing XML, so switch to a real parser for anything non-trivial.

Matthew Flaschen
Wow, thanks Matthew. What does the "#" stand for? In all my PHP regexes I have always begun and ended with "/" in the past...
Sochin
PHP allows you to use almost any non-alphanumeric character (other than a backslash or whitespace) as the delimiter at the beginning and end of a regex, which is convenient because it minimizes escaping.
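A minimal sketch of that point, assuming the same $data as in the question (the variable names $slash, $hash and $tilde are only illustrative): the same pattern written with three delimiters produces identical results, and only the "/" version needs an escape.

```php
<?php
$data = "<Data>hello</Data>";

// The same pattern with three different delimiters.
preg_match_all("/<Data>(.+)<\/Data>/", $data, $slash); // "/" delimiter: the "/" in </Data> needs a backslash
preg_match_all("#<Data>(.+)</Data>#", $data, $hash);   // "#" delimiter: nothing to escape
preg_match_all("~<Data>(.+)</Data>~", $data, $tilde);  // "~" works the same way

var_dump($slash === $hash && $hash === $tilde); // bool(true)
```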
Matthew Flaschen
+2  A: 

You are using the [] and . incorrectly.

Try this :

$data = "<Data>hello</Data>";
preg_match_all("/\<Data\>.+\<\/Data\>/", $data, $match);
print_r($match);

When you use [], you are defining a list of possible characters; in your case the list was limited to a literal . only. If you want . to mean "any character", you have to use it outside of [].
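A quick way to check this, sketched with made-up inputs (the variable names are only illustrative): [.]+ accepts a run of periods and nothing else, while a bare .+ accepts "hello".

```php
<?php
// Inside [], the dot is literal: only periods can match.
$classOnHello = preg_match("/<Data>[.]+<\/Data>/", "<Data>hello</Data>");
$classOnDots  = preg_match("/<Data>[.]+<\/Data>/", "<Data>...</Data>");

// Outside [], the dot matches any character except a newline.
$dotOnHello = preg_match("/<Data>.+<\/Data>/", "<Data>hello</Data>");

var_dump($classOnHello); // int(0)
var_dump($classOnDots);  // int(1)
var_dump($dotOnHello);   // int(1)
```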

HoLyVieR
+1  A: 

Inside character classes a dot is just a dot.

<?php  

    $data = "<Data>hello</Data>";
    preg_match_all("/\<Data\>.+\<\/Data\>/", $data, $match);
    print_r($match);

?>

Will yield:

Array
(
    [0] => Array
        (
            [0] => <Data>hello</Data>
        )

)
The MYYN
+2  A: 
<?php

$data = "<Data>hello</Data>";
preg_match_all('#<Data>(.+)</Data>#', $data, $match);
print_r($match);

?>

The output: (as seen on ideone.com)

Array
(
    [0] => Array
        (
            [0] => <Data>hello</Data>
        )

    [1] => Array
        (
            [0] => hello
        )

)

[...] is a character class definition. You use (...) to capture.

Special note on reluctant matching

Since you're using preg_match_all, it should be noted that you're currently matching greedily. That is, there is only one match in, say, <Data>hello</Data><Data>how are you</Data> (see on ideone.com).

If you want both <Data> elements, then you must use reluctant matching '#<Data>(.+?)</Data>#' (see on ideone.com).

To illustrate:

----A--Z----A----Z----
    ^^^^^^^^^^^^^^
        A.*Z

There is only one A.*Z match in the above input.
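The same effect can be reproduced on the two-element input mentioned above. This is only a sketch, and the variable names are illustrative:

```php
<?php
$data = "<Data>hello</Data><Data>how are you</Data>";

// Greedy: .+ runs to the last </Data>, so the whole string is one match.
$greedyCount = preg_match_all('#<Data>(.+)</Data>#', $data, $greedy);

// Reluctant: .+? stops at the first </Data> it can, giving one match per element.
$lazyCount = preg_match_all('#<Data>(.+?)</Data>#', $data, $lazy);

echo $greedyCount, "\n"; // 1
print_r($greedy[1]);     // one entry: hello</Data><Data>how are you
echo $lazyCount, "\n";   // 2
print_r($lazy[1]);       // two entries: hello, how are you
```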


Special note on regex to parse HTML/XML

It's a pain. If at all possible, use a proper HTML/XML parser. There are plenty for PHP.
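As one possible illustration, here is a sketch using PHP's SimpleXML extension. It assumes the extension is enabled and that the elements sit inside a single root element; the <Root> wrapper is invented for this example so that the input is well-formed XML.

```php
<?php
// Hypothetical input: <Root> is an assumed wrapper, not part of the original data.
$xmlText = "<Root><Data>hello</Data><Data>how are you</Data></Root>";

$xml = simplexml_load_string($xmlText);
if ($xml === false) {
    exit("Could not parse the XML\n");
}

// Collect the text of every <Data> child of the root.
$values = [];
foreach ($xml->Data as $node) {
    $values[] = (string) $node;
}

print_r($values); // two entries: hello, how are you
```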

polygenelubricants
+1 for suggesting use of capturing (which is almost certainly the intent).
Chris
A: 

Try this. You don't need the brackets around the dot:

"/\<Data\>.+\<\/Data\>/"
skyfoot
A: 
/<Data>([^<>]+)<\/Data>/

$data = "<Data>hello</Data>";
preg_match_all("/<Data>([^<>]+)<\/Data>/", $data, $match);

print_r($match);
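A short check on an assumed two-element input: because the negated class cannot run past a < or >, each element is matched separately without a reluctant quantifier.

```php
<?php
// Assumed input with two elements, to show that the match stops at each closing tag.
$data = "<Data>hello</Data><Data>how are you</Data>";

$count = preg_match_all("/<Data>([^<>]+)<\/Data>/", $data, $match);

echo $count, "\n";  // 2
print_r($match[1]); // two entries: hello, how are you
```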
unigg