I'm having a problem matching non-HTML tags in text, mainly because the tags start with &lt; and end with &gt; rather than < and >. So instead of <ref>xx</ref> I have &lt;ref&gt;xxx&lt;/ref&gt;. What I need to do is remove all such tags, including their content.
The problem is that some tags may have attributes. I found a nice answer here, but there's still a problem.
Assuming that I have a tag like <gallery src=sss>xxx</gallery>, this expression suits perfectly:
@"<(?<Tag>\w+)[^>)]*>.*?</\k<Tag>>"
The reality is quite different: all special characters are escaped, so the tag looks like &lt;gallery src=sss&gt;xxx&lt;/gallery&gt;. My problem is matching this kind of tag. So far I have this expression:
@"\&lt\;(?<Tag>\w+)[^\&)]*\&gt\;.*?\&lt\;/\k<Tag>\&gt\;"
It matches tags with no attributes, but not the one mentioned above. What am I missing?
The second issue is matching {| |} tags, because they can be nested. Can you help me with this as well? This expression doesn't do the job: @"\{\|(?:[^\|\}]|\{\|[^\|\}]*\|\})*\|\}"
Edit: To clarify the second issue: I have to match strings that start with the opening tag {|, continue with some text, and end with the closing tag |}. This structure can be nested, so {| xxx {| yyy |} xxx |} is allowed. Unfortunately I don't know the maximum nesting level, but let's say that 1 should suit most cases.
Edit 2: This expression works for my first issue: @"\&lt\;(?<Tag>\w+).*?\&lt\;/\k<Tag>\&gt\;". I have noticed that it fails if there's a newline between the opening and closing tags.
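The newline failure is most likely because, in .NET, the dot does not match \n unless RegexOptions.Singleline is set. Below is a minimal sketch of the Edit 2 expression with that option, assuming the escaped delimiters in the text are the entities &lt; and &gt;. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedTagStripper
{
    // Expression from Edit 2, unchanged. RegexOptions.Singleline makes '.'
    // match '\n' too, so the content may span several lines.
    private static readonly Regex EscapedTag = new Regex(
        @"\&lt\;(?<Tag>\w+).*?\&lt\;/\k<Tag>\&gt\;",
        RegexOptions.Singleline);

    // Removes every escaped tag together with its content.
    public static string Strip(string input)
    {
        return EscapedTag.Replace(input, "");
    }

    public static void Main()
    {
        // Hypothetical sample: a tag with an attribute and a newline inside.
        string text = "before &lt;gallery src=sss&gt;line1\nline2&lt;/gallery&gt; after";
        Console.WriteLine(Strip(text));
    }
}
```

Note that this expression does not require &gt; right after the tag name, so attributes are simply swallowed by the lazy .*? part.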
Edit 3: This does the job for the second issue: @"\{\|(?>(?!\{\||\|\}).|\{\|(?<N>)|\|\}(?<-N>))*(?(N)(?!))\|\}"
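For reference, here is a small sketch of how the Edit 3 expression could be used to strip nested {| |} blocks. It relies on .NET balancing groups: (?<N>) pushes onto a stack for every inner {|, (?<-N>) pops it for every inner |}, and (?(N)(?!)) rejects the match if anything is left on the stack. The class and method names are only illustrative, and the sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedBlockStripper
{
    // Expression from Edit 3, unchanged:
    //   \{\|                opening delimiter
    //   (?!\{\||\|\}).      any character that does not start {| or |}
    //   \{\|(?<N>)          nested opener: push onto stack N
    //   \|\}(?<-N>)         nested closer: pop from stack N
    //   (?(N)(?!))          fail if N is not empty (unbalanced)
    //   \|\}                closing delimiter
    private static readonly Regex NestedBlock = new Regex(
        @"\{\|(?>(?!\{\||\|\}).|\{\|(?<N>)|\|\}(?<-N>))*(?(N)(?!))\|\}");

    // Removes every balanced {| ... |} block, including nested ones.
    public static string Strip(string input)
    {
        return NestedBlock.Replace(input, "");
    }

    public static void Main()
    {
        Console.WriteLine(Strip("a {| xxx {| yyy |} xxx |} b"));
    }
}
```

As written, the dot does not cross line breaks, so blocks spanning several lines would presumably need RegexOptions.Singleline as well.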
Regards, Ventus