I am working on a project that pulls data from a JMS queue using PHP and Zend Framework. The HTTP client response is below. All I need is the XML string.

I came up with /(.*)<\/RequestDetails>/gs, which tests OK on http://gskinner.com/RegExr/, but the preg_match call returns an empty matches array.

I'm going to continue to hunt around for a pattern, but thought I would post here as well.

Thanks to all who read, etc...

Steve

UPDATE: I can't get the code to paste correctly. Here's a link to a pastebin: http://pastebin.com/rQxzcfSg

A: 

I'd say, why bother with complex Regexes when PHP 5 comes with on-board tools like SimpleXML?

$xml = simplexml_load_string($string); 

print_r($xml); // should output complete tree for you to walk through easily

You'd just have to remove the MIME parts and submit only the raw XML to the function, of course.

More on SimpleXML here.
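
As a rough sketch of that two-step approach (cut the XML out of the response, then hand it to SimpleXML): the RequestDetails element name comes from the pattern in the question, while the MIME wrapper, the boundary string, and the Id and Status child elements are made up for illustration, since the real response is only available on the pastebin.

```php
<?php
// Hypothetical response body: MIME framing wrapped around the XML payload.
$response = "--boundary\r\nContent-Type: text/xml\r\n\r\n"
    . "<RequestDetails><Id>42</Id><Status>open</Status></RequestDetails>\r\n"
    . "--boundary--\r\n";

$xml = false;

// Keep the tags inside the match so the result is a complete XML document.
// The s modifier lets . match newlines; .*? stops at the first closing tag.
if (preg_match('/<RequestDetails>.*?<\/RequestDetails>/s', $response, $m) === 1) {
    $xml = simplexml_load_string($m[0]);
}

if ($xml !== false) {
    echo (string) $xml->Status, "\n";
}
```

If the payload can contain nested RequestDetails elements or attributes on the root tag, this pattern would need adjusting.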

Pekka
Agreed, but the problem is that the HTTP response includes more data than just the XML that I need. I need to pull the XML out of the response and then load the string into SimpleXML.
spdaly
@spdaly okay. As long as the XML is clearly defined, a regex may be the way to go here.
Pekka
A: 

The following snippet:

<?php

$text = <<<EOT

blah blah <0>
<RequestDetails><1><2><3>test</RequestDetails>
<RequestDetails><4><5><6>blah
more blah blah
</RequestDetails>
blah blah <7>


EOT;

print $text;

preg_match_all('/<RequestDetails>(.*?)<\/RequestDetails>/s', $text, $matches);

print_r($matches);

?>

Generates this output:

blah blah <0>
<RequestDetails><1><2><3>test</RequestDetails>
<RequestDetails><4><5><6>blah
more blah blah
</RequestDetails>
blah blah <7>

Array
(
    [0] => Array
        (
            [0] => <RequestDetails><1><2><3>test</RequestDetails>
            [1] => <RequestDetails><4><5><6>blah
more blah blah
</RequestDetails>
        )

    [1] => Array
        (
            [0] => <1><2><3>test
            [1] => <4><5><6>blah
more blah blah

        )

)

I've used preg_match_all instead of the /g flag (which PHP's preg functions do not accept), and the reluctant (.*?) quantifier, which is what you want when the text contains multiple matches.

To see why it makes a difference, in the following text, there are two A.*?Z matches, but only one A.*Z.

 ---A--Z---A--Z----
    ^^^^^^^^^^^
       A.*Z
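
The same comparison can be checked in PHP. This is a minimal sketch using the sample string from the diagram; the variable names are arbitrary.

```php
<?php
$subject = '---A--Z---A--Z----';

// Reluctant: each match ends at the nearest Z.
$lazyCount = preg_match_all('/A.*?Z/', $subject, $lazyMatches);

// Greedy: the single match runs to the last Z.
$greedyCount = preg_match_all('/A.*Z/', $subject, $greedyMatches);

print_r($lazyMatches[0]);   // two matches, each "A--Z"
print_r($greedyMatches[0]); // one match, "A--Z---A--Z"
```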

That said, parsing XML using regex is ill-advised. Use a proper XML parser; it'll make your life much easier.

polygenelubricants
A: 

Your g modifier is invalid in PHP's preg functions. Use m instead (multiline mode, which makes ^ and $ match at line boundaries). Test /(.*)<\/RequestDetails>/gs and /(.*)<\/RequestDetails>/ms using this tester.
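
A small sketch of how PHP treats these modifiers, using a made-up two-line subject and assuming the opening RequestDetails tag is part of the pattern. With g, the pattern fails to compile: preg_match emits a warning and returns false (the @ operator only hides the warning here). Note that m only affects the ^ and $ anchors; it is s that lets . match across newlines.

```php
<?php
// Hypothetical subject with a newline inside the element.
$subject = "<RequestDetails>line one\nline two</RequestDetails>";

// g is not a PCRE modifier in PHP: the call returns false instead of 0 or 1.
$withG = @preg_match('/<RequestDetails>(.*)<\/RequestDetails>/gs', $subject);

// m alone: . still stops at the newline, so nothing matches (returns 0).
$withM = preg_match('/<RequestDetails>(.*)<\/RequestDetails>/m', $subject);

// s (here combined with m): . matches the newline, so the pattern matches.
$withMS = preg_match('/<RequestDetails>(.*)<\/RequestDetails>/ms', $subject, $matches);

var_dump($withG, $withM, $withMS);
```

To collect every occurrence, which is what g does in other regex flavors, PHP uses preg_match_all.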

Blair McMillan