
I am reading the contents of an HTML page to extract some details. I'm searching for every occurrence of a string that appears inside a tag, and I want to read just that string.

Example:

<a href="http://www.example.com/search?la=en&amp;q=javascript">javascript</a>
<a href="http://www.example.com/search?la=en&amp;q=PHP">PHP</a>

I just want to read the text of every such tag, selected by its href attribute, which must contain this prefix (http://www.example.com/search?la=en&amp;q=).

Any idea?

+3  A: 

SimpleHtmlDom example (isn't it pretty?):

// Requires the Simple HTML DOM parser library
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
    echo $element->plaintext; // the link text, which is what you want
}
karim79
A: 

If the HTML page you're reading is very regular (for instance, machine-generated according to predictable patterns), something like this would work:

preg_match_all('|<a\s+href="http://www.example.com/search\?la=en&amp;q=(\w+)"\s*>\1</a>|', $page, $matches);
// $matches[1] now holds every captured string, e.g. "javascript" and "PHP"
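If you do stay with a regex, here is a hedged variant, written as a hypothetical helper named linkTextsByRegex. It runs the URL prefix through preg_quote() so that its dots and "?" are matched literally, and it captures whatever text the link contains instead of requiring that text to equal the q parameter. It still assumes the markup is as regular as in the question: a double-quoted href, and no nested tags inside the link. The prefix is passed exactly as it appears in the page source, including &amp;.

```php
<?php
// Sketch: regex variant for very regular, machine-generated HTML only.
// linkTextsByRegex is a hypothetical helper name, not a library function.
function linkTextsByRegex(string $page, string $prefix): array
{
    // Escape regex metacharacters in the prefix (dots, "?", and the "|" delimiter)
    $quoted = preg_quote($prefix, '|');
    // Match <a href="PREFIX...">TEXT</a> and capture TEXT
    $pattern = '|<a\s+href="' . $quoted . '[^"]*"\s*>([^<]*)</a>|';
    preg_match_all($pattern, $page, $matches);
    // $matches[1] holds capture group 1 from every match (empty array if none)
    return $matches[1];
}
```

For the two links in the question, calling linkTextsByRegex($page, 'http://www.example.com/search?la=en&amp;q=') should return javascript and PHP.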

But if the markup gets any more complicated than that, regular expressions probably won't be enough for the job. You'd be better off using a full HTML parser to extract the links and check them one by one to find the text you want.
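Here is a minimal sketch of that parser-based approach. It assumes PHP's bundled DOM extension (DOMDocument) is available, and linkTextsByParser is a hypothetical helper name. One detail matters: the parser decodes entities, so an href that reads &amp; in the page source comes back with a plain &. The prefix must therefore be written as http://www.example.com/search?la=en&q= here.

```php
<?php
// Sketch of the "full HTML parser" approach using PHP's built-in DOMDocument.
// linkTextsByParser is a hypothetical helper name, not a library function.
function linkTextsByParser(string $page, string $prefix): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often malformed: collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    // Check every <a> element one by one
    foreach ($doc->getElementsByTagName('a') as $link) {
        // getAttribute() returns the decoded value ("&amp;" arrives as "&")
        if (strpos($link->getAttribute('href'), $prefix) === 0) {
            $texts[] = trim($link->textContent);
        }
    }
    return $texts;
}
```

Unlike the regex, this version keeps working if attributes are reordered, single-quoted, or the link text contains nested tags such as <b>.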

David Zaslavsky
I believe you should escape the dots in the URL: http://www\.example\.com/
Håkon