Basically, I would like to decode a given HTML document and replace all special characters, such as "&nbsp;" -> " " and "&gt;" -> ">".

In .NET we can make use of HttpUtility.HtmlDecode.

What's the equivalent function in Java?

A: 

I've never used this but found Entity Strip/Insert.

eed3si9n
+16  A: 

I have used the Apache Commons StringEscapeUtils.unescapeHtml() for this (it is named unescapeHtml4() in Commons Lang 3):

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
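
To make the behaviour concrete, here is a rough, dependency-free sketch of what this kind of unescaping does. It is not the Commons implementation: the class name HtmlEntityDecoder and its decode method are made up for illustration, and it only knows a handful of named entities plus numeric references, whereas the library covers the full HTML 4.0 set. With Commons Lang on the classpath you would just call StringEscapeUtils.unescapeHtml(input) instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Apache Commons.
class HtmlEntityDecoder {
    // Assumption: a tiny subset of named entities. HTML 4.0 defines many more.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        // &nbsp; is a non-breaking space (U+00A0), not a plain ASCII space.
        NAMED.put("nbsp", "\u00A0");
    }

    // Matches &name;, decimal &#123; and hexadecimal &#x1F; references.
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = resolve(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // unknown entity: leave it untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text, or null if the reference is not recognised.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
            int codePoint = hex
                ? Integer.parseInt(body.substring(2), 16)
                : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("1 &lt; 2 &amp;&amp; 3 &gt; 2"));
    }
}
```

Note that the input is scanned once, so "&amp;lt;" comes out as "&lt;" rather than "<". If you want a regular space for &nbsp; as in the question, you would have to replace U+00A0 yourself afterwards.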

Kevin Hakanson
Sadly, I just realized today that it does not decode HTML special characters very well :(
Siddharth Iyer
A: 

I have used the Apache Commons StringEscapeUtils.unescapeHtml(), but it also escapes HTML that is already present. For example, <br/> becomes &lt;br&gt;. Is there any way to prevent this?

Peter