I'm using cURL to grab a page and I want to parse out the title of the post (the actual text shown on the link, not the title attribute of the <a>).

The HTML is like this:

<li class="topic">
    <a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a>
</li>

I tried using this code:

preg_match('/<\a title=\".*\" rel=\"bookmark\" href=\".*\">.*<\/a>/', $page, $matches);

But it's not working: PHP returns Array ( ) (an empty array).

Can anyone supply the regex to do this? I've tried online generators, but they go right over my head. Cheers!

+1  A: 

Add parentheses to your expression:

'/<a title=".*" rel="bookmark" href=".*">(.*)<\/a>/'

Everything matched between ( ) is captured and returned in the array.

Edit:

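You also have to remove the backslash before the a: in a PCRE pattern, \a matches the bell character rather than the letter a. The backslashes before the quotation marks are unnecessary inside a single-quoted string and can go as well.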

Edit2:

Just seen in the documentation for preg_match:

If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.

You should also test your expression with sample text to make sure that it really does what you want it to do.
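
As a rough sketch of that kind of test (the helper name extractLinkText and the $sample string are made up for illustration, not part of the question), the pattern below also swaps the greedy .* for [^"]* and .*? so a match cannot run on into a second link on the same line:

```php
<?php
// Hypothetical helper (not from the original answer): returns the link text
// of the first rel="bookmark" anchor, or null if nothing matches.
function extractLinkText(string $html): ?string
{
    // [^"]* stops at the closing quote and (.*?) is non-greedy, so the match
    // cannot extend past the first </a> that follows the opening tag.
    $pattern = '/<a title="[^"]*" rel="bookmark" href="[^"]*">(.*?)<\/a>/';

    return preg_match($pattern, $html, $matches) === 1 ? $matches[1] : null;
}

// Sample text modelled on the HTML in the question.
$sample = '<li class="topic"><a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a></li>';

var_dump(extractLinkText($sample));          // the captured link text
var_dump(extractLinkText('<p>no link</p>')); // null when nothing matches
```

This still assumes the attributes always appear in exactly this order, so it is only as reliable as the markup is consistent.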

Felix Kling
A: 

Assuming you want the attribute, you could use:

if (preg_match('/<a\s+[^>]*?\btitle="(.+?)"/', $page, $matches)) {
    echo $matches[1], "\n";
}

Parsing HTML can be tricky, and regular expressions aren't up to the job in the general case. For simple, sane documents, you can get away with it.

Just be aware that you're driving a screw with a hammer.
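
If you would rather reach for the screwdriver, here is a minimal sketch using PHP's DOM extension instead of a regex. It assumes the dom extension is available and that the links you care about carry rel="bookmark"; the function name bookmarkLinkTexts and the $page sample are hypothetical:

```php
<?php
// Hypothetical helper: collects the visible text of every <a rel="bookmark">.
function bookmarkLinkTexts(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // Select anchors by their rel attribute; attribute order does not matter.
    foreach ($xpath->query('//a[@rel="bookmark"]') as $link) {
        $texts[] = trim($link->textContent);
    }

    return $texts;
}

// Sample markup modelled on the HTML in the question.
$page = '<ul><li class="topic"><a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a></li></ul>';

print_r(bookmarkLinkTexts($page));
```

Unlike the regex, this does not care about attribute order or line breaks inside the tag.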

Greg Bacon
A: 

$str = '<li class="topic"> <a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/"> Title of blog post</a> </li>';

// strip_tags() removes the markup; trim() drops the leftover surrounding spaces
echo trim(strip_tags($str));

Gives:

Title of blog post

Cups
A: 

Here's another way:

$str = <<<A
<li class="topic">
    <a title="Permanent Link to Blog Post" rel="bookmark" href="http://www.website.com/blog-post/">Title of blog post</a>
</li>
A;
$s = explode("</a>",$str);
foreach ($s as $a=>$b){
    if(strpos($b,"<a title")!==FALSE){
        $b=preg_replace("/.*<a title.*>/ms","",$b);
        print $b;
    }
}

Output:

$ php test.php
Title of blog post
ghostdog74