ansaurus

Question

Answer 1

A:

You can't use regular expressions to parse HTML (or XML, for that matter).

Williham Totland 2010-04-28 11:41:47

He's not trying to parse it; he's trying to get rid of it. I happen to agree with you (the OP is going to run into the same kind of trouble), but that link doesn't make that point effectively.

Etaoin 2010-04-28 11:44:42

Beat me to it :D

James Westgate 2010-04-28 11:48:22

Depends on the situation. If the OP just has to clean out a few HTML files in a text editor, a simple regex or two may do the job just fine.

Jan Goyvaerts 2010-05-02 03:27:16

Answer 2

A:

People generally advise the use of a parser instead of regex when dealing with HTML.

If you do have to use a regex :), you could use:

<style>.*?</style>

Jordan Stewart 2010-04-28 11:46:56
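For illustration, here is a minimal Python sketch of applying that pattern with `re.sub`, assuming the HTML is already in a string. The names `STYLE_BLOCK` and `remove_style_blocks` and the sample input are made up. `re.DOTALL` is needed so `.` can cross the line breaks inside a multi-line style block. As written, the pattern only matches a bare `<style>` tag; a tag with attributes such as `type="text/css"` would need something like `<style[^>]*>` instead.

```python
import re

# The pattern from this answer. re.DOTALL lets "." match line breaks, so a
# <style> block spanning several lines is removed as a whole;
# re.IGNORECASE also catches <STYLE>...</STYLE>.
STYLE_BLOCK = re.compile(r"<style>.*?</style>", re.DOTALL | re.IGNORECASE)


def remove_style_blocks(html: str) -> str:
    """Delete every bare <style>...</style> block, including its contents."""
    return STYLE_BLOCK.sub("", html)


sample = "<p>Hi</p><style>\np { color: red; }\n</style><p>there</p>"
print(remove_style_blocks(sample))  # prints <p>Hi</p><p>there</p>
```

The lazy `.*?` stops at the first `</style>`, so two separate style blocks are removed individually instead of swallowing the content between them.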

Answer 3

A:

Your regex does not take into account that comments can contain > characters that do not terminate the comment. Try this regex:

<!--.*?-->|<[^>]*>

You'll have to turn on the option that makes . match line breaks. How to do that depends on the application or programming language you're using this regex with. E.g. in Perl you'd use the /s flag; in .NET you'd set RegexOptions.Singleline.

Jan Goyvaerts 2010-05-02 03:23:08
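For comparison, here is a minimal Python sketch of the same pattern. In Python, the counterpart of Perl's /s and .NET's RegexOptions.Singleline is `re.DOTALL`. The function name `strip_tags` and the sample string are made up; the sample puts both a `>` and a line break inside a comment, to show why the comment alternative is tried first and why the flag matters.

```python
import re

# Comments are matched first, so a '>' inside a comment cannot end the match
# early. re.DOTALL lets ".*?" run across line breaks inside the comment.
TAG_OR_COMMENT = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)


def strip_tags(html: str) -> str:
    """Remove HTML comments and tags, keeping the text between them."""
    return TAG_OR_COMMENT.sub("", html)


sample = "<!-- a > b\n--><p>Hello <b>world</b></p>"
print(strip_tags(sample))  # prints Hello world

# Without re.DOTALL the comment alternative fails at the line break, and
# <[^>]*> stops at the first '>', leaving " b\n-->" behind in the output.
```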

*Your* regex doesn't take into account that attribute values of HTML tags can contain '>', as in `<img alt="<enter text here>">`

Williham Totland 2010-05-02 09:02:51
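One possible patch for that particular case, sketched in Python as a hypothetical extension of the regex (not something proposed in the thread): inside a tag, consume double- or single-quoted attribute values as whole units so a `>` between quotes does not close the tag. The names `TAG_OR_COMMENT_QUOTED` and `strip_tags_quoted` are made up, and this is still not an HTML parser.

```python
import re

# Hypothetical extension of "<!--.*?-->|<[^>]*>": inside a tag, a quoted
# attribute value ("..." or '...') is consumed as a unit, so a '>' inside
# the quotes does not end the tag. Anything else up to the closing '>' is
# matched one character at a time by [^'">].
TAG_OR_COMMENT_QUOTED = re.compile(
    r"""<!--.*?-->|<(?:"[^"]*"|'[^']*'|[^'">])*>""",
    re.DOTALL,
)


def strip_tags_quoted(html: str) -> str:
    """Remove comments and tags, treating quoted attribute values as opaque."""
    return TAG_OR_COMMENT_QUOTED.sub("", html)


print(strip_tags_quoted('<img alt="<enter text here>">Caption'))  # prints Caption
```

With the unquoted-aware pattern from the answer, the same input leaves `">Caption` behind, because `<[^>]*>` ends at the `>` inside the alt text.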

My answer only explains why Aleksandar's regex doesn't do what he expects, and only provides a solution for that specific problem in his specific example. There are a lot of things my regex doesn't take into account. If MS Word did not put its `<style>` tags inside comments, then my regex would have the same problem as Aleksandar's. If you want to take everything into account, then you need a full HTML parser and knowledge about the meaning of specific tags (e.g. `<style>` and `<script>` tags do not contain displayable content).

Jan Goyvaerts 2010-05-03 06:19:04
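A minimal sketch of the parser route, using Python's standard-library `html.parser.HTMLParser`. The class `TextExtractor` and function `html_to_text` are hypothetical names, and skipping only `<style>` and `<script>` follows the comment above; it is not a complete list of non-displayable elements. The tokenizer handles comments and quoted attribute values itself, so neither of the cases raised in this thread trips it up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping whatever is inside <style> and <script>."""

    SKIP = {"style", "script"}  # assumption: the only non-displayable tags we care about

    def __init__(self) -> None:
        # convert_charrefs=True turns references like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <STYLE> is caught too.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments go to handle_comment (a no-op by default), so they never land here.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


doc = ('<!-- a > b --><style>p { color: red; }</style>'
       '<p>Hello <img alt="<enter text here>"><b>world</b></p>'
       '<script>if (a > b) {}</script>')
print(html_to_text(doc))  # prints Hello world
```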

RegEx strip html tags problem