Java: How to decode HTML character entities in Java like HttpUtility.HtmlDecode?

+8 Q:

Basically, I would like to decode a given HTML document and replace all character entities, such as "&nbsp;" -> " " and "&gt;" -> ">".

In .NET we can make use of HttpUtility.HtmlDecode.

What's the equivalent function in Java?

I've never used this but found Entity Strip/Insert.

eed3si9n 2009-06-15 02:42:13

+16 A:

I have used the Apache Commons Lang StringEscapeUtils.unescapeHtml() for this:

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
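
To illustrate what this kind of unescaping does, here is a minimal, dependency-free sketch in plain Java. It is not the Commons implementation: the class name HtmlEntityDecoder and its tiny table of named entities are assumptions made up for this example (HTML 4 defines roughly 250 named entities), and entities it does not recognize are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Apache Commons.
final class HtmlEntityDecoder {
    // Small, deliberately incomplete subset of the HTML 4 named entities.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", "\u00A0"); // non-breaking space, not a plain ' '
    }

    // Matches &name;  &#123;  and  &#x1F;  (the trailing semicolon is required).
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            String decoded = resolve(m.group(1));
            // Unknown or invalid entities are copied through unchanged.
            out.append(decoded != null ? decoded : m.group());
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }
}
```

Calling HtmlEntityDecoder.decode on "&amp;lt;" yields "&lt;" rather than "<", because the input is scanned in a single pass and the decoded output is never decoded again. For real documents a maintained library is the safer choice, since it covers the full entity table.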

Kevin Hakanson 2009-06-15 02:43:39

Sadly, I just realized today that it does not decode HTML special characters very well :(

Siddharth Iyer 2010-10-13 20:04:56

Apache Commons StringEscapeUtils.unescapeHtml() also unescapes markup that was already escaped in the HTML. For example: &lt;br/&gt; becomes <br/>. Is there any way to prevent this?

Peter 2010-01-26 09:05:15