I could not think of a proper title. I have some data like this:

$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;

The above is just a simplified example. The real data looks like this:

<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA

Sample: http://pastebin.com/cxzZWDZ8

Now I apply the following RegEx.

preg_match_all("%<strong>(.*?)<strong>%s", $data, $all);

This matches HHHHH and RRRRRRR, but I also want to match TTTTT. How can I do this?

+5  A: 

You could use a lookahead assertion to ensure the <strong> is there, but isn't part of the match (so it can be part of the next match):

</strong>(.*?)(?=<strong>)
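To illustrate why the lookahead helps, here is a minimal sketch run against the sample $data from the question. Because that sample only contains opening <strong> tags, the pattern here is assumed to start with <strong> rather than </strong>; the variable names ($consuming, $lookahead, $skipped, $all) are made up for the illustration.

```php
<?php
// Sample data from the question: <strong> acts as a separator between blocks.
$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;

// Original pattern: the trailing <strong> is consumed by the match, so the
// next match cannot start from it and every second block is skipped.
preg_match_all('%<strong>(.*?)<strong>%s', $data, $consuming);

// Lookahead variant: the trailing <strong> is only asserted, not consumed,
// so it is still available as the start of the next match.
preg_match_all('%<strong>(.*?)(?=<strong>)%s', $data, $lookahead);

// Strip the surrounding newlines from each captured block.
$skipped = array_map('trim', $consuming[1]);
$all     = array_map('trim', $lookahead[1]);

print_r($skipped); // HHHHH, RRRRRRR
print_r($all);     // HHHHH, TTTTT, RRRRRRR
```

Note that the final block is only captured if another <strong> follows it, since the lookahead requires one.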

However, if what you've got is HTML, you should use an HTML parser to read it and not regex, which is infamously poor at parsing HTML/XML markup. With DOMDocument::loadHTML(), getElementsByTagName() and so on you'll have a much more reliable way of scraping page data.
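As a rough sketch of the parser approach, the following assumes the real data has the shape shown in the question (a <strong> title followed by plain text) and collects the text after each title by walking sibling nodes. The $html sample and the $sections array are hypothetical names for this illustration, and real pages may nest the markup differently.

```php
<?php
// Assumed input, modelled on the "real" data in the question.
$html = <<<EOD
<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA
EOD;

// Collect parser warnings internally instead of emitting them,
// since scraped HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$sections = [];
foreach ($doc->getElementsByTagName('strong') as $strong) {
    $title = trim($strong->textContent);
    $text  = '';
    // Gather everything after this <strong> up to the next <strong> sibling.
    for ($node = $strong->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'strong') {
            break;
        }
        $text .= $node->textContent;
    }
    $sections[$title] = trim($text);
}

print_r($sections);
```

Unlike the regex, this does not depend on a following <strong> to terminate the last section.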

bobince
Thanks, it works. I usually use an HTML parser, but for this task it would be a waste of time.
Shubham
A: 

Maybe it's just a typo, but shouldn't you write something like:

preg_match_all("%</strong>(.*?)<strong>%s", $data, $all);

In your first example I don't see the tags being closed, but in the "real" example they are. Maybe that's the problem.

pleasedontbelong