ansaurus

Question

How to convert unsupported characters to HTML entities in Java

Answer 1

A:

Try using StringEscapeUtils from Apache Commons Lang.

Tom 2009-11-19 03:54:43

StringEscapeUtils escapes everything non-ASCII (not only what cannot be encoded).

Thilo 2009-11-19 04:00:13

Answer 2

+3 A:

I'm not positive I understand the question, but something like this might help:

import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

...

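  // c is the input String to be escaped.
  StringBuilder buf = new StringBuilder(c.length());
  // Charset.forName() returns a Charset; the encoder comes from newEncoder().
  CharsetEncoder enc = Charset.forName("gb2312").newEncoder();
  for (int idx = 0; idx < c.length(); ++idx) {
    char ch = c.charAt(idx);
    if (enc.canEncode(ch)) {
      // The target charset can represent this char, so keep it as-is.
      buf.append(ch);
    } else {
      // Otherwise emit a decimal numeric character reference, e.g. &#8364;
      buf.append("&#");
      buf.append((int) ch);
      buf.append(';');
    }
  }
  String result = buf.toString();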

This code is not robust, because it works one char at a time and so doesn't handle characters beyond the Basic Multilingual Plane, which are stored as surrogate pairs. By iterating over code points in the String and using the canEncode(CharSequence) method of the CharsetEncoder, you should be able to handle any character.
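The code-point approach described above could be sketched roughly as follows. This is only an illustration, not tested against GB2312 specifically: the class name EntityEscaper and the method escapeUnencodable are hypothetical names made up for this example, and the charset is passed in as a parameter rather than hard-coded.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EntityEscaper {

    // Hypothetical helper: replaces every code point that the given charset
    // cannot encode with a decimal numeric character reference (&#NNNN;).
    static String escapeUnencodable(String input, Charset charset) {
        CharsetEncoder enc = charset.newEncoder();
        StringBuilder buf = new StringBuilder(input.length());
        int idx = 0;
        while (idx < input.length()) {
            int cp = input.codePointAt(idx);
            int len = Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
            // Test the whole code point at once so surrogate pairs are never split.
            if (enc.canEncode(input.subSequence(idx, idx + len))) {
                buf.appendCodePoint(cp);
            } else {
                buf.append("&#").append(cp).append(';');
            }
            idx += len;
        }
        return buf.toString();
    }
}
```

For example, EntityEscaper.escapeUnencodable("a\u20ACb", Charset.forName("US-ASCII")) returns "a&#8364;b", and a supplementary character such as U+1F600 comes out as the single reference &#128512; rather than two surrogate values.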

erickson 2009-11-19 04:37:44

Thank you. I believe this canEncode() from CharsetEncoder is what I am looking for.

Kevin Yu 2009-12-06 10:19:45

Answer 3

A:

Just use UTF-8; that way there is no reason to use entities. If the argument is that some clients need GB2312 because they don't understand Unicode, then entities are not much use either, because numeric entities represent Unicode code points.

Mihai Nita 2009-11-19 07:20:11

Answer 4

A:

Thanks for the answers above. I could not find a place to choose the right answer. I agree most with sylvarking. Thanks.

Kevin Yu 2009-12-06 10:18:48
