ansaurus

Question

How to keep only the specified HTML tags

Answer 1

A:

Check this out: http://sourceforge.net/projects/regexcreator/. It is a very handy GUI regex editor.

Gadolin 2010-09-24 09:19:20

Thank you. I can run this editor, but I don't know how to create the regex pattern for my problem; my regex skills are poor.

Zenofo 2010-09-24 09:27:47

Answer 2

+1 A:

You could do this using a negative lookahead:

"<(?!(?:a|/a|img)\\b).*?>"

Rubular
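As a minimal sketch of how the pattern above might be applied in Java: the class name `TagFilterDemo`, the helper `stripExceptAllowed`, and the sample markup are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

public class TagFilterDemo {
    // Matches any tag that does not start with a, /a or img.
    // In Java source the regex word boundary is written as \\b.
    private static final Pattern DISALLOWED_TAG =
            Pattern.compile("<(?!(?:a|/a|img)\\b).*?>");

    // Hypothetical helper: removes every tag except <a>, </a> and <img>.
    static String stripExceptAllowed(String html) {
        return DISALLOWED_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html =
                "<p>Hello <a href=\"x\">link</a> <img src=\"i.png\"> <b>bold</b></p>";
        // Prints: Hello <a href="x">link</a> <img src="i.png"> bold
        System.out.println(stripExceptAllowed(html));
    }
}
```

The `\\b` matters: without the word boundary, a tag such as `<abbr>` would be treated as an `<a>` tag and kept.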

However this has a number of problems and I would recommend instead that you use an HTML parser if you want a robust solution.
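To make some of these problems concrete, the sketch below runs the same pattern on three inputs it handles poorly. The class name `TagFilterPitfalls`, the helper `strip`, and the inputs are hypothetical, chosen only to illustrate the point.

```java
public class TagFilterPitfalls {
    // Same pattern as in the answer, written as a Java string literal.
    static final String DISALLOWED_TAG = "<(?!(?:a|/a|img)\\b).*?>";

    // Hypothetical helper that applies the pattern with replaceAll.
    static String strip(String html) {
        return html.replaceAll(DISALLOWED_TAG, "");
    }

    public static void main(String[] args) {
        // 1. A '>' inside an attribute value ends the match too early,
        //    so part of the tag is left behind as text.
        System.out.println(strip("<span title=\"a > b\">text</span>"));

        // 2. '.' does not match a newline by default, so an opening tag
        //    that spans two lines is not removed at all.
        System.out.println(strip("<div\nclass=\"x\">text</div>"));

        // 3. Allowed tags pass through with all their attributes,
        //    including event handlers.
        System.out.println(strip("<a onclick=\"evil()\">x</a>"));
    }
}
```

In the first case the output is ` b">text`, in the second the `<div` opening tag survives, and in the third the `onclick` attribute is left untouched.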

For more information see this question:

What HTML parsing libraries do you recommend in Java

Mark Byers 2010-09-24 09:22:20

Thanks. I tried the pattern `html=html.replaceAll("<(?!(?:a|/a|img)\b).*?>", "");` but nothing happens.

Zenofo 2010-09-24 09:39:53

In Java you need to escape backslashes. I've corrected my post.
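The difference can be sketched as follows. In a Java string literal, `"\b"` is the single backspace character (U+0008), while `"\\b"` is the two characters `\b` that the regex engine reads as a word boundary. The class name `EscapeDemo` and the sample input are made up for illustration.

```java
public class EscapeDemo {
    // Single backslash: the compiler turns \b into a backspace character,
    // so the regex never sees a word boundary.
    static final String UNESCAPED = "<(?!(?:a|/a|img)\b).*?>";

    // Double backslash: the regex engine receives \b (word boundary).
    static final String ESCAPED = "<(?!(?:a|/a|img)\\b).*?>";

    public static void main(String[] args) {
        String html = "<b>x</b> <a href=\"u\">y</a>";

        System.out.println("\b".length());   // 1 (one backspace character)
        System.out.println("\\b".length());  // 2 (backslash followed by 'b')

        // With the backspace in the lookahead, no tag is ever exempted.
        System.out.println(html.replaceAll(UNESCAPED, ""));
        // With the word boundary, <a> and </a> are kept.
        System.out.println(html.replaceAll(ESCAPED, ""));
    }
}
```

On this input the unescaped version yields `x y`, while the escaped version yields `x <a href="u">y</a>`.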

Mark Byers 2010-09-24 11:12:33

Answer 3

A:

Hey! Here is your answer:

You can’t parse [X]HTML with regex.

krmby 2010-09-24 09:30:35

Hmm. You can. I agree it's a bad idea though.

Spudley 2010-09-24 09:32:54

Answer 4

A:

Use a proper HTML parser, for example htmlparser, Jericho or the validator.nu HTML parser. Then use the parser’s API, SAX or DOM to pull out the stuff you’re interested in.

If you insist on using regular expressions, you’re almost certain to make some small mistake that will lead to breakage, and possibly to cross-site scripting attacks, depending on what you’re doing with the markup.
