I need to decode, in Java, HTML strings that contain the following entities: "&amp;#39;" and "&apos;". I'm using Apache Commons Lang, but it doesn't decode those two entities, so I'm currently doing the following. I'm looking for the fastest way to do this.

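import org.apache.commons.lang.StringEscapeUtils;

public class StringUtil {

    // Replace the two problem entities by hand first,
    // then let Commons Lang decode the remaining entities.
    public static String decodeHTMLString(String s) {
        return StringEscapeUtils.unescapeHtml(
                s.replace("&amp;#39;", "'").replace("&apos;", "'"));
    }

}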

I searched older questions, but none seems to answer mine.

+2  A: 

Well, I would imagine that part of the problem is that one of your entities is double-encoded: "&amp;#39;". No decoder will turn that into an apostrophe; decoding it once yields "&#39;", which still needs a second pass.

As for "&apos;", apparently that one is not technically part of the HTML entity set.
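
To illustrate the two points above, here is a minimal sketch of a single-pass pre-decoder that handles only the apostrophe spellings discussed in this thread. The class name ApostropheEntities and the method decode are hypothetical, and the sketch also accepts the singly-encoded "&#39;" as an assumption. Whether it is actually faster than chained replace calls would need to be measured.

```java
public final class ApostropheEntities {

    // The apostrophe spellings handled by this sketch (an assumption, not a full entity table).
    private static final String[] ENTITIES = { "&amp;#39;", "&apos;", "&#39;" };

    private ApostropheEntities() {}

    // Scans the input once and replaces each listed entity with a plain apostrophe.
    public static String decode(String s) {
        if (s == null || s.indexOf('&') < 0) {
            return s; // nothing that could be an entity
        }
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                String match = matchAt(s, i);
                if (match != null) {
                    out.append('\'');
                    i += match.length();
                    continue;
                }
            }
            out.append(c); // any other character, including an unrelated '&', is copied as-is
            i++;
        }
        return out.toString();
    }

    // Returns the listed entity that starts at position pos, or null if none does.
    private static String matchAt(String s, int pos) {
        for (String e : ENTITIES) {
            if (s.startsWith(e, pos)) {
                return e;
            }
        }
        return null;
    }
}
```

As in the question's code, the result would then be passed to StringEscapeUtils.unescapeHtml so that the remaining entities are decoded by Commons Lang.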

james
Mmm... about the first point, you're right, it's double-encoded. As for the other one, OK, "&apos;" is not part of standard HTML, but I need to translate it, and I'd like to know whether there is a faster way to do that than mine.
cdarwin