How would I write a RegEx to:
Find a match where the first instance of a > character is before the first instance of a < character.
(I am looking for bad HTML where the closing > initially in a line has no opening <.)
^[^<>]*>
If you need the corresponding < as well:
^[^<>]*>[^<]*<
If there is a possibility of tags before the first >:
^[^<>]*(?:<[^<>]+>[^<>]*)*>
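To see how the three patterns differ, here is a minimal sketch in Python (the language is my choice; the question does not name one). The helper `check_line` and the sample lines are made up for illustration, and each line is assumed to be checked on its own.

```python
import re

# The three patterns from the answer, compiled once.
STRAY_GT = re.compile(r'^[^<>]*>')                                # '>' before any '<'
STRAY_GT_THEN_LT = re.compile(r'^[^<>]*>[^<]*<')                  # same, plus the next '<'
STRAY_GT_AFTER_TAGS = re.compile(r'^[^<>]*(?:<[^<>]+>[^<>]*)*>')  # complete tags may come first


def check_line(line):
    """Return, for each of the three patterns, whether it matches `line`."""
    return (
        bool(STRAY_GT.match(line)),
        bool(STRAY_GT_THEN_LT.match(line)),
        bool(STRAY_GT_AFTER_TAGS.match(line)),
    )


# Hypothetical sample lines, one per case.
samples = [
    'class="x"> text <b>bold</b>',  # stray '>' followed later by a '<'
    'only> here',                   # stray '>' with no '<' at all
    '<p>fine</p>',                  # well-formed line
    '<p>ok</p> oops> more',         # stray '>' after a complete tag
]

for line in samples:
    print(repr(line), check_line(line))
```

Only the third pattern catches a stray > that comes after complete tags, and the second needs a < somewhere after the stray >.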
Note that the last pattern can give false positives: for example, <!-- > --> is valid HTML, but the RegEx will complain.
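The comment false positive is easy to reproduce. The Python sketch below also tries one possible mitigation that is not part of the answer: removing comments that open and close on the same line before testing. `ONE_LINE_COMMENT` and `has_stray_gt` are hypothetical names, and comments that span several lines would still slip through.

```python
import re

# Tag-aware pattern from the answer.
STRAY_GT_AFTER_TAGS = re.compile(r'^[^<>]*(?:<[^<>]+>[^<>]*)*>')

# Assumption, not from the answer: strips comments that start and end on one line.
ONE_LINE_COMMENT = re.compile(r'<!--.*?-->')


def has_stray_gt(line, strip_comments=False):
    """Report whether the tag-aware pattern flags `line`."""
    if strip_comments:
        line = ONE_LINE_COMMENT.sub('', line)
    return bool(STRAY_GT_AFTER_TAGS.match(line))


print(has_stray_gt('<!-- > -->'))                       # flagged although the HTML is valid
print(has_stray_gt('<!-- > -->', strip_comments=True))  # no longer flagged
```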
Would this work?
string =~ /^[^<]*>/
This anchors at the beginning of the line, skips any characters that aren't an opening '<', and then matches if it finds a closing '>'.
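As far as I can tell, this pattern accepts and rejects the same lines as ^[^<>]*> and only the matched text differs, because [^<]* is greedy and may run past earlier > characters. A small Python sketch (the names `LOOSE` and `STRICT` and the sample line are made up for illustration):

```python
import re

LOOSE = re.compile(r'^[^<]*>')    # pattern from this answer
STRICT = re.compile(r'^[^<>]*>')  # variant that stops at the first '>'

line = 'a> b> <c>'  # hypothetical line with two stray '>' before the first '<'

# Both patterns flag the line, but they capture different spans.
loose_span = LOOSE.match(line).group(0)    # 'a> b>' (up to the last '>' before '<')
strict_span = STRICT.match(line).group(0)  # 'a>'    (up to the first '>')
print(repr(loose_span), repr(strict_span))

# Neither pattern flags a well-formed line.
print(LOOSE.match('<p>x</p>'), STRICT.match('<p>x</p>'))
```

If you only need a yes/no answer, either form will do; if you want the position of the first stray >, the [^<>] version is easier to work with.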
It's a pretty bad idea to try to parse HTML with a regex, or even to try to detect broken HTML with one.
What happens, for example, when a line break puts the > character first on a line (which is valid HTML)?
You might get some mileage from reading the answers to this question also: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
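The line-break case is easy to reproduce. In the Python sketch below, a valid start tag is wrapped so that its closing > begins the second line, and a line-by-line check with ^[^<>]*> flags that line. The function `flagged_lines` and the sample markup are made up for illustration.

```python
import re

STRAY_GT = re.compile(r'^[^<>]*>')

# Valid HTML: whitespace, including a newline, may precede the closing '>' of a start tag.
wrapped = '<a href="x"\n>link</a>'


def flagged_lines(text):
    """Return the 1-based numbers of lines the pattern reports as broken."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if STRAY_GT.match(line)
    ]


print(flagged_lines(wrapped))                 # line 2 is reported despite valid markup
print(flagged_lines('<a href="x">link</a>'))  # same tag on one line: nothing reported
```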