Hi guys, here is the situation. I'm retrieving a page using curl into a variable, so I now have all the HTML in one snug variable. However, I need to access the contents of a certain div node in code. There is one div node on the page with the ID 'image', and it looks roughly like this:

<html>
  <body>
    ..........
    <div id="image">
      <a href="somelocation">
        <img src="location.jpg"/> <!-- I need to grab the src of this image object -->
      </a>
    </div>
    <div> Other stuff blah blah</div>
  </body>
</html>

I need to grab the src attribute of an image tag that is nested within a div tag with the id 'image', which is tucked away somewhere on an HTML page.

How do I do this on the server side, given that I'm retrieving the page using curl?

Thanks again.

+4  A: 

Have you considered using an HTML DOM parser?

This will handle all the parsing (even of irregular HTML) and the subsequent querying of elements.

(I wouldn't use regexps. HTML isn't a regular language and isn't suited to regexp matching. Huge numbers of edge cases exist to trip you up.)
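
As a rough illustration of the parser approach, here is a minimal sketch in Python. The thread does not say which server-side language is in use, so Python is an assumption, and the names ImageSrcFinder, find_image_src and page are hypothetical. It uses the standard library's html.parser, which is an event-driven parser rather than a full DOM parser, and tracks when it is inside the div with id 'image' so it can pick up the first img src found there.

```python
from html.parser import HTMLParser


class ImageSrcFinder(HTMLParser):
    """Record the src of the first <img> inside <div id="image">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while we are inside the target div
        self.src = None   # filled in once the image is found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == "div":
                self.depth += 1  # a nested div inside the target div
            elif tag == "img" and self.src is None:
                self.src = attrs.get("src")
        elif tag == "div" and attrs.get("id") == "image":
            self.depth = 1       # entering the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1      # leaving a div inside (or the) target


def find_image_src(html):
    """Return the image src as a string, or None if it is not found."""
    parser = ImageSrcFinder()
    parser.feed(html)
    parser.close()
    return parser.src


# Stand-in for the HTML string that curl returned (hypothetical sample).
page = """
<html>
  <body>
    <div id="image">
      <a href="somelocation">
        <img src="location.jpg"/>
      </a>
    </div>
    <div> Other stuff blah blah</div>
  </body>
</html>
"""

print(find_image_src(page))  # prints "location.jpg"
```

A library that builds a real DOM would let you query the element directly instead of tracking depth by hand; the idea of parsing first and then looking up the div by id is the same.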

Brian Agnew
WOW!!!! That's EXACTLY the kind of solution I was looking for! Thanks a bunch, man! :D
Ali
Glad that worked out for you.
Brian Agnew
A: 

Use a regular expression match.
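
For what it's worth, a regular expression match could look something like the sketch below. Python is an assumption here, since the thread does not name a language, and image_src_by_regex and PATTERN are hypothetical names. It only works if the id is written exactly as id="image" with double quotes and the first img after that div tag is the one you want. If the div contains no image, it will happily match an img further down the page, so treat it as fragile.

```python
import re

# Assumption: attributes use double quotes and the first <img> after the
# opening <div id="image"> tag is the one inside that div.
PATTERN = re.compile(
    r'<div[^>]*\bid="image"[^>]*>.*?<img[^>]*\bsrc="([^"]*)"',
    re.DOTALL | re.IGNORECASE,
)


def image_src_by_regex(html):
    """Return the captured src value, or None when nothing matches."""
    match = PATTERN.search(html)
    return match.group(1) if match else None
```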

Tom