views:

1777

answers:

5

I have a string that contains a character � I haven't been able to replace it correctly.

String.replace("�", "");

doesn't work, does anyone know how to remove/replace the � in the string??

+4  A: 

You are asking to replace the character "�" but for me that is coming through as three characters 'ï', '¿' and '½'. This might be your problem... If you are using Java prior to Java 1.5 then you only get the UCS-2 characters, that is only the first 65K UTF-8 characters. Based on other comments, it is most likely that the character that you are looking for is '�', that is the Unicode replacement character. This is the character that is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode".

Actually, looking at the comment from Kathy, the other issue that you might be having is that javac is not interpreting your .java file as UTF-8, assuming that you are writing it in UTF-8. Try using:

javac -encoding UTF-8 xx.java

Or, modify your source code to do:

String.replaceAll("\uFFFD", "");
Paul Wagland
� is being seen as 1 char
MrThys
For you it might be seen as one character, the rest of us are not so lucky ;-)Please tell us the code point of the character that you are trying to replace.
Paul Wagland
+4  A: 

As others have said, you posted 3 characters instead of one. I suggest you run this little snippet of code to see what's actually in your string:

public static void dumpString(String text)
{
    for (int i=0; i < text.length(); i++)
    {
        System.out.println("U+" + Integer.toString(text.charAt(i), 16) 
                           + " " + text.charAt(i));
    }
}

If you post the results of that, it'll be easier to work out what's going on. (I haven't bothered padding the string - we can do that by inspection...)

Jon Skeet
A: 

Use the unicode escape sequence. First you'll have to find the codepoint for the character you seek to replace (let's just say it is ABCD in hex):

str = str.replaceAll("\uABCD", "");
matt b
+4  A: 

Character issues like this are difficult to diagnose because information is easily lost through misinterpretation of characters via application bugs, misconfiguration, cut'n'paste, etc.

As I (and apparently others) see it, you've pasted three characters:

codepoint   glyph   escaped    windows-1252    info
=======================================================================
U+00ef      ï       \u00ef     ef,             LATIN_1_SUPPLEMENT, LOWERCASE_LETTER
U+00bf      ¿       \u00bf     bf,             LATIN_1_SUPPLEMENT, OTHER_PUNCTUATION
U+00bd      ½       \u00bd     bd,             LATIN_1_SUPPLEMENT, OTHER_NUMBER

To identify the character, download and run the program from this page. Paste your character into the text field and select the glyph mode; paste the report into your question. It'll help people identify the problematic character.

McDowell
+5  A: 

That's the Unicode Replacement Character, \uFFFD. (info)

Something like this should work:

String strImport = "For some reason my �double quotes� were lost.";
strImport = strImport.replaceAll("\uFFFD", "\"");
Gunslinger47