I need to write a regular expression that will match any pair of tags, such as <(.*?)>.*?</\1>,
but only if there is no other pair of tags between them. Tag names are of variable length.
views: 83
answers: 3
A:
So long as you maintain the "only if there are no other tags between them" constraint, this is easy:
<\s*([^>]+?)\s*>[^<]*</\s*\1\s*>
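A minimal sketch of how this pattern behaves, assuming Python's `re` module and a made-up sample string (neither comes from the original post). The name `INNERMOST` is hypothetical.

```python
import re

# Pattern from the answer above. Group 1 captures everything between
# "<" and ">" (minus surrounding whitespace); [^<]* forbids any further
# tag between the opening tag and its closing tag.
INNERMOST = re.compile(r"<\s*([^>]+?)\s*>[^<]*</\s*\1\s*>")

# Hypothetical input: an outer pair wrapping two innermost pairs.
sample = "<div><b>bold</b> and <i>italic</i></div>"

matches = [m.group(0) for m in INNERMOST.finditer(sample)]
print(matches)  # ['<b>bold</b>', '<i>italic</i>']

# Group 1 swallows attributes too, so the backreference no longer
# equals the closing tag name and nothing matches.
print(INNERMOST.search('<a href="x">link</a>'))  # None
```

The outer `<div>` pair is skipped because `[^<]*` cannot cross the `<` of `<b>`. The second call shows the attribute limitation raised in the comments.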
Shirik
2010-07-05 04:01:56
The OP said "no other pair tags between them".
amphetamachine
2010-07-05 04:03:00
I made the assumption that what is being supplied is valid markup. This regex would only work with that assumption holding true. "Pair of tags" and "tags" are equivalent in such a case.
Shirik
2010-07-05 04:04:02
Not exactly. `<br/>`, `<img/>` and other similar tags do not operate with pairs.
Amber
2010-07-05 04:09:51
It also assumes the content has no comments (which aren't technically tags, IIRC), no CDATA sections, and the tags have no attributes, right? Any other assumptions that might cause problems in the context of this question?
Ken
2010-07-05 04:40:59
And how would one guarantee the assumed invariant that "there are no other [pair] tags between them"? With a regexp? It's elephants all the way down.
msw
2010-07-05 04:56:37
A:
You can easily exclude nested tags by excluding the angle bracket needed to open them:
<([^<>]+)>[^<]*</\1>
This regex won't work if the opening tag has attributes. If you want to allow those, try this:
<(\S+)[^<>]*>[^<]*</\1>
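To see the difference between the two patterns, here is a small sketch assuming Python's `re` module. The sample strings and the names `NO_ATTRS` and `WITH_ATTRS` are invented for illustration.

```python
import re

# First pattern: the whole tag content must repeat in the closing tag.
NO_ATTRS = re.compile(r"<([^<>]+)>[^<]*</\1>")
# Second pattern: only the leading run of non-whitespace is captured,
# so attributes after the tag name are tolerated.
WITH_ATTRS = re.compile(r"<(\S+)[^<>]*>[^<]*</\1>")

plain = "<b>bold</b>"             # hypothetical input without attributes
with_attr = '<p class="x">text</p>'  # hypothetical input with an attribute

print(NO_ATTRS.search(plain).group(1))         # b
print(NO_ATTRS.search(with_attr))              # None
print(WITH_ATTRS.search(plain).group(1))       # b
print(WITH_ATTRS.search(with_attr).group(1))   # p

# Nested markup: only the innermost pair is reported.
nested = '<div><p class="x">text</p></div>'
print([m.group(0) for m in WITH_ATTRS.finditer(nested)])
# ['<p class="x">text</p>']
```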
Jan Goyvaerts
2010-07-10 01:50:37
You've got a `]` instead of `)` in your first expression. Also, if this isn't HTML, there's no guarantee that `\w+` matches the tagname - makes more sense to use `\S+` in the group, which would match tagnames `<like:this>` `<or-this-one>` or similar.
Peter Boughton
2010-07-10 02:12:12
Thanks to kiamlaluno for fixing my typo. No regex can handle every possible situation. It's hard to be more specific unless nick posts a link to a sample file.
Jan Goyvaerts
2010-07-10 03:29:07
A:
You simply should not do this with regex. However, don't take my word for it.
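One way to read this advice is to let a real parser find the tag pairs. The sketch below assumes the input is HTML and uses Python's standard `html.parser` module; the question does not say what the markup is, and the class name `LeafPairFinder` is hypothetical.

```python
from html.parser import HTMLParser


class LeafPairFinder(HTMLParser):
    """Collect (tag, text) for elements that contain no other tags."""

    def __init__(self):
        super().__init__()
        self.leaves = []
        self._open = None  # last opened tag, if no other tag has followed it
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Any new opening tag becomes the current candidate.
        self._open = tag
        self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # A pair qualifies only if it closes the tag that was opened last.
        if self._open == tag:
            self.leaves.append((tag, "".join(self._text)))
        self._open = None


finder = LeafPairFinder()
finder.feed('<div><b>bold</b> and <a href="x">link</a></div>')
print(finder.leaves)  # [('b', 'bold'), ('a', 'link')]
```

Unlike the regex approaches, this handles attributes without special casing, but it only applies when an HTML parser is actually available in the environment being used.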
George Marian
2010-07-10 02:08:21
Who says nick is parsing HTML? Maybe he's just fixing up something with a text editor that happens to support regular expressions but doesn't have a built-in HTML parser.
Jan Goyvaerts
2010-07-10 03:30:57
@Jan Goyvaerts You make a fair point. Nick needs to provide more detail in his question.
George Marian
2010-07-10 03:37:35