ansaurus

Question

regex: matching phrases without a > or white space

Answer 1

+3 A:

Don't use regular expressions to parse HTML. It's a really bad idea and, at best, your code will be flaky. Whatever your language or platform, a fully functional HTML parser will be available. Just use that.

There is no way a regular expression can correctly handle all the cases of escaping, entity use and so on.
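As a rough illustration of the parser route (a minimal sketch, not code from this thread): it uses Python's standard-library `html.parser` only because that keeps the example self-contained, and the same idea applies to whatever parser your platform ships. The `TextCollector` class, the `extract_phrases` helper, and the sample markup are made up for the example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-blank text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.phrases = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only runs.
        text = data.strip()
        if text:
            self.phrases.append(text)


def extract_phrases(markup):
    """Return the visible text phrases found in the given markup."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.phrases


# Sample input (assumed): a '>' inside an attribute value and an entity,
# two of the cases that are awkward to handle with a regular expression.
sample = '<p title="a > b">Hello &amp; welcome</p>\n  <span>Sign in</span>'
print(extract_phrases(sample))
```

The parser deals with the quoted `>` and the `&amp;` entity itself, so the calling code only ever sees decoded text.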

cletus 2009-04-27 10:16:31

Answer 2

+3 A:

Parsing HTML with regular expressions has been discussed a lot. Refer to this post:

Using regular expressions to parse HTML: why not?

Jérôme 2009-04-27 10:16:37

Answer 3

+1 A:

Can I refer you to my answer to a similar question?

Brian Agnew 2009-04-27 10:18:11

Answer 4

+1 A:

I asked the question too soon; I just worked out this:

pattern = @"^\s*((?!\s)[^<]+)";

Thanks for the feedback about regex and HTML; I'll bear it in mind for the future. I'm writing a utility program to make a few pages multi-language (e.g. adding asp:literals for hardcoded text). I think regex is sufficient for this purpose, but if there are better tools, please let me know (web stuff isn't my area...).
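To check what the pattern above actually captures, here is a small sketch. It uses Python's `re` rather than .NET purely for illustration; the pattern itself is unchanged (the C# verbatim string becomes a Python raw string), and the `leading_phrase` helper and the sample inputs are hypothetical.

```python
import re

# Same pattern as above: skip leading whitespace, then capture a run of
# characters that starts with a non-whitespace character and contains no '<'.
PATTERN = re.compile(r"^\s*((?!\s)[^<]+)")


def leading_phrase(line):
    """Return the text before the first '<', or None if there is none."""
    match = PATTERN.match(line)
    return match.group(1) if match else None


print(leading_phrase("   Hello world<br>"))  # Hello world
print(leading_phrase("<p>text</p>"))         # None (line starts with a tag)
print(leading_phrase("   "))                 # None (whitespace only)
print(leading_phrase("Name: </td>"))         # 'Name: ' with the trailing space kept
```

Two things to be aware of: the capture keeps trailing whitespace, and because only `<` is excluded, a literal `>` in the text is captured as well.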

Patrick 2009-04-27 10:27:09
