Hello, I have the following HTML markup:

<div id="subcontent_l">
<p>
<a href="/membership-packages/"><img height="202" width="644" alt="" src="http://74.52.72.231/wp-content/uploads/2010/06/banner1.jpg" title="banner1" class="aligncenter size-full wp-image-299">
</a>
</p>
<p class="subc">Access to Guaranteed Healthcare Benefits</p>
<p><a href="http://74.52.72.231/join-now"><img height="37" width="166" alt="" src="http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg" title="jn" class="alignleft size-full wp-image-229"></a></p>
</div>

In the above markup I want to find the anchor that contains the image whose src ends in jn2.jpg.

The desired result would be:

<a href="http://74.52.72.231/join-now"><img height="37" width="166" alt="" src="http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg" title="jn" class="alignleft size-full wp-image-229"></a>

I want to do this using a regular expression. I have a regular expression that finds all the image tags inside anchors. My expression is /[^<]*<a.*href[\s]*=[\s]*("[^"]*").*[\s]*<img.*\/a>$ but it does not find what I want. Please help me.

A: 

What Colin Hebert says is true, but there is still a regex that does it:

preg_match_all('%<a[^>]*href=(\'|")(.+?)\1[^>]*?>.*?<img[^>]*src=(\'|")(.+?)\3[^>]*?>.*?</a>%si', $code, $result, PREG_SET_ORDER);
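The pattern above matches every anchor that contains an image, not only the one with jn2.jpg. A narrower variant could look like the sketch below. It is written in Java, the language of the other answer's example, and uses only java.util.regex. The class name AnchorFinder and the method findAnchor are made up for illustration. The sketch assumes the img is the only child of the anchor and that its src attribute is quoted; markup that deviates from that will not match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the original answer.
class AnchorFinder {

    // Returns the first <a>...</a> whose only child is an <img> with a quoted
    // src ending in imageFile, or null when no such anchor exists.
    static String findAnchor(String html, String imageFile) {
        String regex = "<a\\b[^>]*>\\s*"                       // opening anchor tag
                + "<img\\b[^>]*\\bsrc=(['\"])[^'\"]*"          // img tag up to the src value
                + Pattern.quote(imageFile) + "\\1"             // file name, then the same quote
                + "[^>]*>\\s*</a>";                            // rest of img tag, closing anchor
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/membership-packages/\">"
                + "<img src=\"http://74.52.72.231/wp-content/uploads/2010/06/banner1.jpg\">\n</a></p>"
                + "<p><a href=\"http://74.52.72.231/join-now\">"
                + "<img src=\"http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg\" title=\"jn\"></a></p>";
        System.out.println(findAnchor(html, "jn2.jpg"));
    }
}
```

Using \s* between the tags instead of .*? keeps the match from running across several anchors, at the cost of rejecting anchors that contain anything besides the image.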
Spidfire
+6  A: 

Regex is unsuitable for this job. HTML is not a regular language. Use an HTML parser instead. Every self-respecting programming language offers HTML parsing facilities and/or libraries. I have no idea what programming language you're using, but if you're familiar with Java, I'd recommend Jsoup for this. Here's an example which does what you want:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<div id=\"subcontent_l\">"
    + "<p>"
    + "<a href=\"/membership-packages/\"><img height=\"202\" width=\"644\" alt=\"\" src=\"http://74.52.72.231/wp-content/uploads/2010/06/banner1.jpg\" title=\"banner1\" class=\"aligncenter size-full wp-image-299\">"
    + "</a>"
    + "</p>"
    + "<p class=\"subc\">Access to Guaranteed Healthcare Benefits</p>"
    + "<p><a href=\"http://74.52.72.231/join-now\"><img height=\"37\" width=\"166\" alt=\"\" src=\"http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg\" title=\"jn\" class=\"alignleft size-full wp-image-229\"></a></p>"
    + "</div>";

Document document = Jsoup.parse(html);
// Select the image whose src ends with jn2.jpg, then step up to its parent anchor.
Element link = document.select("img[src$=jn2.jpg]").first().parent();
System.out.println(link.outerHtml()); // Prints the desired result.

Jsoup uses jQuery-like CSS selectors to select elements of interest. For C#/.NET there is a Jsoup port, NSoup, and PHP has a similar library, phpQuery.

BalusC