From an HTML source I have to extract the anchor tags, treating them as if they can't be nested.

For example:

<a href="http://www.abc.com">abc<a href="http://www.dbc.com">dbc</a>

From this, the first match should return

<a href="http://www.abc.com">abc

and the next match should return

<a href="http://www.dbc.com">dbc</a>

In other words, if an anchor is not nested, the match should run from its opening tag to its closing tag. If it is nested, the match should run from the opening tag up to, but not including, the opening tag of the nested anchor.

Please help. Thanks in advance.

+2  A: 

I'd suggest using JTidy. Despite its name, it's an HTML parser, and it will handle the edge cases that trip up regular expressions (not surprisingly, given that HTML isn't regular).

Brian Agnew
+1 for "HTML isn't regular"
aioobe
I know HTML isn't regular. But why can't we try it using a regex?
Sathish
Because regular expressions can only be used reliably on regular constructions! As you've discovered, HTML can be formed in a non-regular way, and regular expressions lack the ability to interpret that correctly.
Brian Agnew
I accept that regular expressions are only valid on regular constructions. But we are only trying this on predefined cases.
Sathish
Don't re-invent the wheel. You've tagged this question with Java, so use JTidy to validate your HTML instead.
Jon Freedman
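
For the narrow, predefined case described in the question, a regex-only attempt might look like the sketch below. It is not from the thread: the class name AnchorFinder and the method findAnchors are made up for illustration, and it uses only java.util.regex. It assumes simple markup (no ">" inside attribute values, no anchors inside comments or scripts) and fairly short input, since this kind of repeated lookahead can get slow or run out of stack on long documents. For anything beyond that, the parser advice above still applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class AnchorFinder {

    // <a\b[^>]*>              an opening anchor tag (\b keeps <abbr> etc. from matching)
    // (?:(?!<a\b|</a>).)*     any text that does not start another <a ...> or a </a>
    // (?:</a>)?               the closing tag, if it comes before a nested opening tag
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(?:(?!<a\\b|</a>).)*(?:</a>)?",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every anchor fragment in document order.
    public static List<String> findAnchors(String html) {
        List<String> fragments = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            fragments.add(matcher.group());
        }
        return fragments;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.abc.com\">abc"
                + "<a href=\"http://www.dbc.com\">dbc</a>";
        for (String fragment : findAnchors(html)) {
            System.out.println(fragment);
        }
        // prints:
        // <a href="http://www.abc.com">abc
        // <a href="http://www.dbc.com">dbc</a>
    }
}
```

Each call to matcher.find() corresponds to one "find" in the question: the first stops just before the nested opening tag, the second returns the complete inner anchor.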