Hi all,

We go through quite a process to display some e-mail communications in our application. I'll try to keep this as general as possible...

- We make a request to a service via XML
- Get the XML reply string, send the string to a method to encode any invalid characters as follows:

 public static String convertUTF8(String value) {
     char[] chars = value.toCharArray();
     StringBuffer retVal = new StringBuffer(chars.length);
     for (int i = 0; i < chars.length; i++) {
         char c = chars[i];
         int chVal = (int) c;
         if (chVal > Byte.MAX_VALUE) {
             retVal.append("&#x").append(Integer.toHexString(chVal)).append(";");
         } else {
             retVal.append(c);
         }
     }
     return retVal.toString();
 }

We then pass the resulting string to another method that removes any other invalid characters:

 public static String removeInvalidCharacters(String inString) 
 { 
     if (inString == null){ 
      return null;
     } 
     StringBuffer newString = new StringBuffer(); 
     char ch; 
     char c[] = inString.toCharArray();
     for (int i = 0; i < c.length; i++) 
     { 
         ch = c[i]; 
         // remove any characters outside the valid UTF-8 range as well as all control characters 
         // except tabs and new lines 
         if ((ch < 0x00FD && ch > 0x001F) || ch == '\t' || ch == '\n' || ch == '\r') 
         { 
             newString.append(ch); 
         } 
     } 
     return newString.toString(); 
 }
  • This string is then unmarshalled via the SaxParser.
  • The resulting object is then sent back to our Display action, which generates the response that the calling JSP/JavaScript uses to create the page.

The issue is some text can contain characters which can't be processed correctly. The following is eventually rendered on the JSP just fine:

<PrvwCommTxt>This is a new test.  Have a*&amp;#xc7;&amp;#xb4;)&amp;#xa1;.&amp;#xf1;&amp;#xc7;&amp;#xa1;.&amp;#xf1;*&amp;#xc7;&amp;#xb4;)...</PrvwCommTxt>

Which shows up as "This is a new test. Have a*Ç´)¡.ñÇ¡." in the browser.

-The following shows up in a tooltip while hovering over the above text:

<CommDetails>This is a new test.  Have a*Ç´)¡.ñÇ¡.ñ*Ç´)¡.ñ*´)(¡.ñÇ(¡.ñÇ* Wonderful Day!</CommDetails>

In the tooltip, which is built by JavaScript, this shows up incorrectly: the hex values are displayed instead of the characters they stand for.

Any suggestions on how to make these characters display correctly in JavaScript?

+1  A: 

Get the XML reply string, send the string to a method to encode any invalid characters as follows:

You should be using Apache Commons Lang StringEscapeUtils#escapeXml() for this.
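
To make concrete what such an escaping step does, here is a minimal stdlib-only sketch. It is not the Commons Lang implementation, and the library's exact output may differ; the class name `XmlEscapeSketch` and its method are hypothetical. Unlike the `char`-by-`char` loop in the question, it walks the string by code point, so a surrogate pair becomes a single character reference.

```java
// Hypothetical helper for illustration only; not the Commons Lang code.
final class XmlEscapeSketch {

    // Escapes the five XML special characters and writes every
    // non-ASCII code point as a hexadecimal character reference.
    static String escapeXml(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);   // a surrogate pair is read as one code point
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7F) {
                        out.append("&#x").append(Integer.toHexString(cp)).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample text modelled on the question's data
        System.out.println(escapeXml("Have a*\u00C7\u00B4 & <b>"));
    }
}
```

Escaping `&` as well matters here: if a string that already contains `&#xc7;` is escaped a second time, it turns into `&amp;#xc7;`, and the reference is then shown literally instead of as the character.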

// remove any characters outside the valid UTF-8 range

This makes no sense. Nothing is outside the UTF-8 range: UTF-8 can encode every Unicode character. The problem lies somewhere else. Get rid of this method.
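
If the reply really can contain characters that XML itself forbids (most control characters), the test to apply is the `Char` production of the XML 1.0 specification, not a cut-off at `0x00FD`. The following is only a sketch under that assumption; the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only.
final class XmlCharFilterSketch {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops code points that may not appear in an XML 1.0 document
    // and keeps everything else, including non-Latin text.
    static String stripNonXmlChars(String in) {
        if (in == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            i += Character.charCount(cp);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The control character U+0001 is removed, U+00FF is kept
        System.out.println(stripNonXmlChars("a\u0001b\u00FFc"));
    }
}
```

By contrast, the check in the question silently discards every character from `0x00FD` upwards, which includes all text outside Latin-1.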

The issue is some text can contain characters which can't be processed correctly. The following is eventually rendered on the JSP just fine:

You need to set the response encoding to UTF-8 and instruct the web browser to use UTF-8. This can be done by putting the following line at the top of the JSP:

<%@page pageEncoding="UTF-8" %>
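
To see why the two sides have to agree, here is a small stdlib-only sketch. The class name `EncodingMismatchSketch` is hypothetical, and the mismatch it simulates is an assumption about what may be happening, not a diagnosis of your application. It writes text as UTF-8 and reads it back once with the same charset and once as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class EncodingMismatchSketch {

    // Encodes text with one charset and decodes the bytes with another,
    // which is what happens when server and browser disagree.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "\u00C7\u00F1"; // "Çñ", taken from the question's sample text

        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mixed = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(same));   // matching charsets preserve the text
        System.out.println(original.equals(mixed));  // mismatched charsets do not
    }
}
```

With matching charsets the text survives unchanged; with the mismatch, each non-ASCII character comes back as two unrelated characters.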

BalusC