ansaurus

Question

Answer 1

A:

You need to delimit your regex; use /<\/div>(.*?)<div class="adsdiv">/ instead.

2010-07-21 10:38:59

Although it doesn't solve the OP's problem, this *is* a valid point. The regex in the question lacks delimiters, so preg_match() will emit a warning and return false if you try to use it.

Alan Moore 2010-07-21 13:26:41
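
As a minimal sketch of the delimiter point made in this answer: the sample string in $data and the helper name extractBetween() are assumptions made up for illustration, since the question's actual input is not shown here.

```php
<?php
// Hypothetical sample input; the real page markup is not shown in the thread.
$data = '<div>previous div</div>some text<div class="adsdiv">ad</div>';

/**
 * Returns the text between a closing </div> and the adsdiv opening tag,
 * or null when nothing matches. The helper name is made up for this sketch.
 */
function extractBetween(string $data): ?string
{
    // The leading and trailing "/" are the delimiters, so the "/" inside
    // "</div>" has to be escaped as "\/".
    if (preg_match('/<\/div>(.*?)<div class="adsdiv">/', $data, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo extractBetween($data), PHP_EOL;
```

With input on a single line like this, the delimited pattern is enough. Input that spans several lines needs more, as the next answer explains.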

Answer 2

A:

From the PHP Manual:

s (PCRE_DOTALL) - If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.

So, the following should work:

if (preg_match('~<\/div>(.*?)<div class="adsdiv">~s', $data, $t))

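The ~ characters delimit the regular expression. Because the delimiter is not /, the slash in </div> no longer needs to be escaped, although escaping it is harmless.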
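
To see the effect of the s modifier in isolation, here is a small sketch. The multi-line string in $data is a made-up stand-in for the real page, and the variable names are chosen only for this example.

```php
<?php
// Hypothetical multi-line input; the newlines are what makes the s modifier matter.
$data = "<div>previous div</div>\nline one\nline two\n<div class=\"adsdiv\">ad</div>";

$pattern = '~</div>(.*?)<div class="adsdiv">~';

// Without the s modifier the dot cannot cross a newline, so nothing matches.
$withoutS = preg_match($pattern, $data, $m1);

// With the s modifier the dot also matches newlines.
$withS = preg_match($pattern . 's', $data, $m2);

var_dump($withoutS, $withS);
echo trim($m2[1]), PHP_EOL;
```

The captured group still contains the surrounding newlines, which is why the sketch passes it through trim() before printing.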

Alix Axel 2010-07-21 10:44:59

Thank you very much, Alix, it works fine.

normand 2010-07-21 11:08:08

Answer 3

+1 A:

Apart from what has been said above, also add the /s modifier so . will match newlines. (edit: as Alan kindly pointed out, [^<]+ will match newlines anyway)

I always use /U as well, since in these cases you normally want minimal matching by default (it will be faster as well). And /i, since people write <div>, <DIV>, or even <Div>...

if (preg_match('/<\/div>([^<]+)<div class="adsdiv">/Usi', $data, $match))
{
    echo "Found: ".$match[1]."<br>";
} else {
    echo "Not found<br>";
}

Edit: made it a little more explicit!

mvds 2010-07-21 10:46:54

Thanks for the reply, mvds, but it returns an empty result, so it does not work.

normand 2010-07-21 11:02:40

OK, I added a little code that shows how to get the matched portion out of it. This should work, although it requires that the input is *exactly* what you are showing, i.e. not HTML reformatted by something like Firefox's "view source"!

mvds 2010-07-21 12:36:16

`[^<]` will match newlines whether you use the `/s` modifier or not.

Alan Moore 2010-07-21 13:16:55

thanks, updated the answer.

mvds 2010-07-21 13:39:50

And I recommend NOT getting in the habit of using the `/U` modifier. It's better to get *out of* the habit of using `.*`. Reluctant quantifiers speed up matching by avoiding excessive backtracking, but you already took care of that by using `[^<]+` instead of `.*`. If anything, the `/U` is slowing you down, because character-for-character, reluctant quantifiers are slower than greedy ones.

Alan Moore 2010-07-21 14:00:07
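
The two points from the comments above can be checked with a short sketch: a negated class matches newlines without the s modifier, and the U modifier inverts the greediness of quantifiers. The input strings are made up for this illustration, and the sketch does not measure speed.

```php
<?php
// Hypothetical input with a newline between the two tags.
$data = "</div>line one\nline two<div class=\"adsdiv\">";

// A negated class matches newlines even without the s modifier.
$negated = preg_match('~</div>([^<]+)<div class="adsdiv">~', $data, $m);

// The U modifier inverts greediness.
preg_match('~a+~', 'aaa', $greedy);     // default: + is greedy
preg_match('~a+~U', 'aaa', $ungreedy);  // with U: + becomes lazy
preg_match('~a+?~U', 'aaa', $flipped);  // with U: +? becomes greedy again

var_dump($negated, $m[1], $greedy[0], $ungreedy[0], $flipped[0]);
```

Because U silently changes what every quantifier in the pattern means, writing an explicit ? on the few quantifiers that should be lazy keeps the pattern easier to read.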

Answer 4

+1 A:

Regex isn't the right tool for this. Here is how to do it with DOM:

$html = <<< HTML
<div class="parent">
    <div>
        <p>previous div</p>
    </div>
    blablabla
    blablabla
    blablabla
    <div class="adsdiv">
        <p>other content</p>
    </div>
</div>
HTML;

Text content in an HTML document is stored in text nodes (DOMText); tags are element nodes (DOMElement). The text node containing blablabla must have a parent node. To fetch its value, we will assume you want all text nodes that are direct children of the parent of the div whose class attribute is adsdiv.

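$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
// Find every div whose class attribute is exactly "adsdiv".
$nodes = $xPath->query('//div[@class="adsdiv"]');
foreach ($nodes as $node) {
    // Walk the siblings of that div (the children of its parent).
    foreach ($node->parentNode->childNodes as $child) {
        // Print text nodes only; element nodes are skipped.
        if ($child instanceof DOMText) {
            echo $child->nodeValue;
        }
    }
}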

Yes, it's not a funky one-liner, but it's also much less of a headache and gives you solid control over the HTML document. Harnessing the query power of XPath, we could have shortened the above to:

$nodes = $xPath->query('//div[@class="adsdiv"]/../text()');
foreach($nodes as $node) {
    echo $node->nodeValue;
}

I kept it deliberately verbose to illustrate how to use DOM, though.
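
Both variants above also return the whitespace-only text nodes that sit between the tags. As a hedged, self-contained sketch, one way to skip them is an XPath predicate on normalize-space(). The markup below is a shortened copy of the sample in this answer, and the $texts array is a name introduced only for this example.

```php
<?php
// Shortened version of the sample markup used in this answer.
$html = <<<HTML
<div class="parent">
    <div><p>previous div</p></div>
    blablabla
    <div class="adsdiv"><p>other content</p></div>
</div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

// text()[normalize-space()] keeps only the text nodes that contain
// something other than whitespace.
$nodes = $xPath->query('//div[@class="adsdiv"]/../text()[normalize-space()]');

$texts = [];
foreach ($nodes as $node) {
    // Strip the indentation and newlines around the actual text.
    $texts[] = trim($node->nodeValue);
}

print_r($texts);
```

Collecting the values in an array instead of echoing them makes it easier to check how many text nodes were found before using them.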

Gordon 2010-07-21 10:54:19

How to get string from HTML with regex?
