I would like to get the links from the search results. Can someone please help me with the regular expression to do this? I've got this, and it doesn't work:

 preg_match_all("/<h3(.*)><a href=\"(.*)\"(.*)<\/h3>/", $result, $matches);
+1  A: 

Your pattern's biggest problem is most likely greedy versus lazy matching: each greedy (.*) grabs as much text as it can, so a single match can swallow several results. Changing it to the following should solve that issue...

preg_match_all('#<h3.*?><a href="(.*?)".*?</h3>#', $result, $matches);
print_r($matches[1]);
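
To see the difference, here is a minimal sketch that runs both patterns over the same input. The sample markup and the example.com / example.org URLs are made up for illustration and only stand in for the real $result; the variable names $greedy, $lazy, $greedyCount and $lazyLinks are likewise mine.

```php
<?php
// Hypothetical sample markup, standing in for the fetched $result.
$result = '<h3 class="r"><a href="http://example.com/a" class="l">First</a></h3>'
        . '<h3 class="r"><a href="http://example.org/b" class="l">Second</a></h3>';

// Greedy: each (.*) takes as much as it can and only backtracks when forced,
// so both results are swallowed into a single match.
preg_match_all('#<h3(.*)><a href="(.*)"(.*)</h3>#', $result, $greedy);

// Lazy: each .*? stops at the first point where the rest of the pattern fits,
// so every <h3> block becomes its own match.
preg_match_all('#<h3.*?><a href="(.*?)".*?</h3>#', $result, $lazy);

$greedyCount = count($greedy[0]); // number of full matches with the greedy pattern
$lazyLinks   = $lazy[1];          // captured href values with the lazy pattern

print_r($greedyCount);
print_r($lazyLinks);
```

On this sample the greedy pattern finds one match covering both results, while the lazy one returns each link separately.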

There are possibly a few rare URLs that could mess the pattern up, but chances are you won't run into one. I will point out that stillstanding has a good point, though: using the API would be a better option.

As for people who blanket-answer with "You can't parse HTML with regex, use a DOM"... Whilst you cannot create a generic HTML parser with regex (and should be using DOM for that task), you can match patterns in a set of text you know follows a certain structure; the fact that the structure is HTML is irrelevant. Yes, if Google changes its layout it will probably break, but the same is probably true of a DOM parser. (P.S. I'm well aware this will probably get down-voted by the sheeple.)
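
For comparison, a rough sketch of the DOM route using PHP's bundled DOMDocument and DOMXPath. It assumes the same made-up sample markup as a stand-in for $result, and the XPath //h3/a/@href is only my guess at an equivalent of the regex; it depends on the page layout just as much as the pattern does.

```php
<?php
// Hypothetical sample markup, standing in for the fetched $result.
$result = '<h3 class="r"><a href="http://example.com/a" class="l">First</a></h3>'
        . '<h3 class="r"><a href="http://example.org/b" class="l">Second</a></h3>';

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($result);
libxml_clear_errors();

// Select the href attribute of every <a> that is a direct child of an <h3>.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//h3/a/@href') as $attr) {
    $links[] = $attr->nodeValue;
}

print_r($links);
```

Note that the XPath encodes the same h3 > a assumption as the regex, so a layout change breaks both.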

Cags
+1 for fighting back against the regex haters! Sometimes it is appropriate to use regex on HTML, if you're not trying to parse the DOM completely.
JGB146