I have a bunch of URLs structured like so:

<h4 class="classname"><a href="http://some-website.com" onclick="someVaryingJS();" title="Some Title">Some Title</a></h4>

I want to extract just the href and title attributes, keeping in mind that the onclick attribute changes per tag and that I only want anchor tags that sit inside h4 elements of that class.

+1  A: 

You could load the HTML fragment into a DOMDocument and process it from there.

It's going to be more flexible, since the varying onclick attribute and the attribute order stop mattering, but it's a lot heavier than a straight-up regex.
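Something along these lines might work as a starting point. It's only a rough sketch: the function name `extractLinks`, the second "ignored" h4 in the sample, and the properly closed `</h4>` tags are my own assumptions, and it assumes PHP 7+ with the DOM extension enabled. It loads the fragment, then uses DOMXPath to select anchors inside h4 elements carrying the given class and reads their href and title.

```php
<?php
// Sample input modeled on the question. The second h4 is a made-up
// counter-example to show that other classes are skipped.
$html = '<h4 class="classname"><a href="http://some-website.com" onclick="someVaryingJS();" title="Some Title">Some Title</a></h4>'
      . '<h4 class="other"><a href="http://ignored.example" title="Ignored">Ignored</a></h4>';

// Hypothetical helper: returns href/title pairs for anchors inside
// <h4> elements whose class attribute contains $class.
// $class is assumed to be a trusted, simple class name (it is placed
// directly into the XPath expression).
function extractLinks(string $html, string $class): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the class as a whole word inside the class attribute,
    // so "classname" does not match "classname2".
    $query = sprintf(
        '//h4[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a',
        $class
    );

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href'  => $a->getAttribute('href'),
            'title' => $a->getAttribute('title'),
        ];
    }
    return $links;
}

$links = extractLinks($html, 'classname');
print_r($links);
```

The onclick attribute is never looked at, so it can vary freely between tags without affecting the result.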

danp