How would I write a RegEx to:

Find a match where the first instance of a > character is before the first instance of a < character.

(I am looking for bad HTML where the first closing > on a line has no opening < before it.)

A: 
^[^<>]*>

If you need the corresponding < as well,

^[^<>]*>[^<]*<

If there is a possibility of tags before the first >,

^[^<>]*(?:<[^<>]+>[^<>]*)*>

Note that it can give false positives, e.g.

<!-- > -->

is valid HTML, but the RegEx will complain.
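
The three patterns above can be checked against a few sample lines. This is a minimal Python sketch (the answer does not name a language for these patterns); the constant names, the helper `flags`, and the sample lines are made up for illustration.

```python
import re

# The three patterns from the answer, compiled unchanged.
FIRST_GT = re.compile(r"^[^<>]*>")                       # a > before any <
FIRST_GT_THEN_LT = re.compile(r"^[^<>]*>[^<]*<")         # same, plus the following <
AFTER_TAGS = re.compile(r"^[^<>]*(?:<[^<>]+>[^<>]*)*>")  # complete tags may come first


def flags(pattern, line):
    """Return True if the pattern matches at the start of the line."""
    return pattern.match(line) is not None


samples = [
    'class="x"> text <b>',  # stray > at the start of the line
    '<p>fine</p>',          # well-formed
    '<tag1> badtag2>',      # stray > after a complete tag
    '<!-- > -->',           # valid comment, reported as a false positive
]

for line in samples:
    print(line, flags(FIRST_GT, line), flags(AFTER_TAGS, line))
```

Only the third pattern reports `<tag1> badtag2>`, and it is also the one that reports the comment.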

KennyTM
It seems this won't catch a line like: <tag1> badtag2>
Mark Wilkins
+1  A: 

Would this work?

string =~ /^[^<]*>/

This anchors at the beginning of the line, skips over any characters that aren't an opening '<', and matches if it then finds a closing '>'.
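
To apply the same check to every line of a document, something like the following Python sketch might do. The function name `find_stray_closers` and the sample text are hypothetical, and the pattern is the one from this answer.

```python
import re

# Same pattern as above: a > that appears before any < on the line.
STRAY_CLOSE = re.compile(r"^[^<]*>")


def find_stray_closers(text):
    """Return (line_number, line) pairs whose first '>' precedes any '<'."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if STRAY_CLOSE.match(line):
            hits.append((number, line))
    return hits


html = '<div>\nclass="x"> oops\n<p>ok</p>\n'
print(find_stray_closers(html))  # → [(2, 'class="x"> oops')]
```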

spig
What happens if the > closes a tag that was opened on the line above?
Alexander Kjäll
I think that's a problem with the question. This will do what he asked it to do. Taking the previous lines into account opens up the can of worms of using a regular expression to check a non-regular language.
spig
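In Perl, Ruby, and other languages, modifiers and anchors control how line breaks are handled, but the details vary by engine: in Perl the "m" modifier makes ^ match at the start of every line, while in Ruby "m" only makes . match newlines. Because [^<] already matches newlines, anchoring with \A instead of ^ treats the entire string as one unit regardless of line breaks. I re-read his question and he doesn't necessarily specify that it would be all on one line. `string =~ /\A[^<]*>/`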
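
The difference between checking each line and checking the whole string can be sketched in Python, where `re.MULTILINE` makes `^` match at the start of every line and `\A` matches only at the start of the string. The sample text and constant names are assumptions chosen to show a tag that spans two lines.

```python
import re

# A well-formed tag whose closing > sits on the second line.
text = '<a href="x"\n   class="y">link</a>'

# Per-line anchoring: ^ matches after every newline.
PER_LINE = re.compile(r"^[^<]*>", re.MULTILINE)

# Whole-string anchoring: \A matches only at the very start,
# and [^<] is free to run across newlines.
WHOLE_STRING = re.compile(r"\A[^<]*>")

print(PER_LINE.search(text) is not None)      # line 2 is reported
print(WHOLE_STRING.search(text) is not None)  # nothing is reported
```

The whole-string form avoids this particular false positive, but it only ever looks at the first > in the entire input.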
spig
+1  A: 

It's a pretty bad idea to try to parse HTML with a regex, or even to try to detect broken HTML with one.

What happens, for example, when there is a line break so that the > character is the first character on the line (which is valid HTML)?

You might also get some mileage from reading the answers to this question: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
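
As one possible alternative to a regex, a tokenizing parser can be asked which text chunks contain a literal >. This is only a sketch using Python's standard `html.parser`, which is lenient and does not validate; the class and function names are hypothetical. A literal > in text is not necessarily an error, so the hits are candidates for review rather than confirmed mistakes.

```python
from html.parser import HTMLParser


class StrayCloserFinder(HTMLParser):
    """Collect text chunks that contain a '>' outside any tag or comment."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_data(self, data):
        # Tags and comments never reach handle_data, only plain text does.
        if ">" in data:
            self.hits.append(data)


def stray_closers(html):
    """Return the text chunks of the input that contain a literal '>'."""
    finder = StrayCloserFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


print(stray_closers('<tag1> badtag2>'))
print(stray_closers('<!-- > -->'))
```

Because the parser tracks tags across line breaks, a tag that closes on the following line is not reported, and neither is the comment case.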

Alexander Kjäll