
I want to know how to replace special characters in a string in Java.

E.g.

String a = "adf�sdf";

How can I replace and avoid special characters?

A: 

It is hard to answer the question without knowing more of the context.

In general, you might have an encoding problem. See The Absolute Minimum Every Software Developer (...) Must Know About Unicode and Character Sets for an overview of character encodings.
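To make the "encoding problem" idea concrete, here is a minimal sketch of one common way a `�` (U+FFFD, the replacement character) can appear. It assumes the text was written as ISO-8859-1 bytes and read back as UTF-8. That is only one possible cause, since the question doesn't say where the string comes from. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Encodes text the way a Latin-1 (ISO-8859-1) system would store it,
    // then decodes those bytes as UTF-8: a typical charset mismatch.
    static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.UTF_8);
    }

    // Same bytes, decoded with the charset they were actually written in.
    static String decodeLatin1BytesAsLatin1(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "adf\u00e9sdf"; // U+00E9, e with acute accent
        // Byte 0xE9 followed by 's' is not valid UTF-8, so the decoder emits U+FFFD.
        System.out.println(decodeLatin1BytesAsUtf8(original).equals("adf\uFFFDsdf")); // true
        System.out.println(decodeLatin1BytesAsLatin1(original).equals(original));   // true
    }
}
```

If this is what is happening, the real fix is to decode with the right charset at the point where the bytes are read, not to strip the `�` afterwards.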

DR
A: 

You can use Unicode escape sequences (such as \u201c [an opening curly quote]) to "avoid" characters that can't be used directly in your source file encoding (which defaults to the default encoding for your platform, but you can change it with the -encoding parameter to javac).
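A small sketch of the escape approach. The source below uses only ASCII characters, so it should compile the same way whatever `-encoding` javac is given (for example `javac -encoding UTF-8`). The class and method names are just for illustration.

```java
public class UnicodeEscapeDemo {
    // Built only from ASCII source characters: the compiler turns each
    // backslash-u escape into the corresponding UTF-16 code unit.
    static String curlyQuoted(String text) {
        return "\u201c" + text + "\u201d"; // U+201C and U+201D, curly double quotes
    }

    public static void main(String[] args) {
        String s = curlyQuoted("adf");
        System.out.println(s.length());        // 5
        System.out.println((int) s.charAt(0)); // 8220, i.e. 0x201C
    }
}
```

Keep in mind that javac translates these escapes before it parses anything else, so they also take effect inside comments.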

T.J. Crowder
source file encoding defaults to the platform default encoding, i.e. usually not UTF-8.
Michael Borgwardt
@Michael: Thanks, fixed. I wasn't just inventing that, I wonder what language/environment it actually related to? ;-) Or was it true in 1996 or something...
T.J. Crowder
I doubt that, since UTF-8 wasn't specified until 1993, and Java instead used to have the recommendation to use native2ascii before distributing source code. I'd expect UTF-8 to be the default in some newer systems, though.
Michael Borgwardt
@Michael: 1993 is earlier than 1996, and I remember it being all nifty and cool that Java supported these weird Unicode things, so it's *possible*, though not likely. ;-) (`native2ascii`, crikey, that's a blast from the past) Thanks, though, the info pre-edit was clearly wrong in 2010 regardless!
T.J. Crowder
A:

You can get rid of all characters outside the printable ASCII range using String#replaceAll() by replacing the pattern [^\\x20-\\x7e] with an empty string:

a = a.replaceAll("[^\\x20-\\x7e]", "");
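Here is a self-contained version of that one-liner, run on the string from the question. The method name is hypothetical. Note that the character class also removes legitimate non-ASCII letters and whitespace such as tabs, because both fall outside 0x20 to 0x7E.

```java
public class StripNonAscii {
    // Removes every char outside printable ASCII (U+0020 space through U+007E tilde).
    static String stripNonPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7e]", "");
    }

    public static void main(String[] args) {
        String a = "adf\uFFFDsdf"; // U+FFFD, the character shown as a question mark in the question
        System.out.println(stripNonPrintableAscii(a));                 // adfsdf
        System.out.println(stripNonPrintableAscii("caf\u00e9\tbar")); // cafbar (accent and tab removed too)
    }
}
```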

But this doesn't solve your actual problem; it's more of a workaround. With the given information it's hard to nail down the root cause, but reading either of those articles should help a lot:

BalusC
Hmm, there seems to be a markdown bug (link 2 isn't correctly parsed), but I can't seem to locate/fix it?
BalusC
@BalusC: Happens to me all the time (since I link to the Java6 docs a lot), you want to replace the space near the end with `%20`.
T.J. Crowder
@T.J. yes, that was it, thanks :) BTW: Firefox normally escapes them before pasting, but it didn't happen correctly for some odd reason. I re-created the link and the problem went away.
BalusC
@BalusC: I find it very ironic that you point to a Joel article... His first article on Unicode was full of errors and misunderstandings: I remember him posting it and thinking "WTF!?". It was a memorable "aha, I got it" moment from Joel that was *full* of errors. Ever since he posted that first article on Unicode, I've taken *everything* he has said and keeps saying with a huge grain of salt ;)
Webinator
@Wiz: That was also one of the reasons I wrote one myself, to clarify things further in simple terms with practical examples and solutions. But there really aren't as *many* errors in Joel's article as you seem to insinuate, are there?
BalusC
The only significant errors I see are (1) he says UTF-8 uses up to six bytes per character (which was true when he wrote the article, but was changed a month later), and (2) he implies that UTF-16 and UCS-2 are equivalent (which was never true).
Alan Moore
A: 

Assuming that you want to remove all special characters, you can use the character class \p{Cntrl}. Then you only need the following code:

stringWithSpecialCharacters = stringWithSpecialCharacters.replaceAll("\\p{Cntrl}", replacement);
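A runnable sketch of that call, with a hypothetical wrapper method. By default, `\p{Cntrl}` in `java.util.regex.Pattern` is the POSIX class of ASCII control characters (0x00 to 0x1F and 0x7F). It therefore removes tabs and similar characters, but it leaves the U+FFFD from the question untouched, which bears on the comment below about what "special characters" means.

```java
public class StripControlChars {
    // Replaces ASCII control characters (0x00-0x1F, 0x7F) with the given replacement.
    // Strings are immutable, so the caller must use the returned value.
    static String replaceControlChars(String s, String replacement) {
        return s.replaceAll("\\p{Cntrl}", replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceControlChars("adf\tsdf\u0007!", "")); // adfsdf!
        String fromQuestion = "adf\uFFFDsdf";
        // U+FFFD is not an ASCII control character, so nothing is removed.
        System.out.println(replaceControlChars(fromQuestion, "").equals(fromQuestion)); // true
    }
}
```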
ablaeul
That works if you assume "special characters" means ASCII control characters. In my experience it usually means punctuation, but in this case it's anyone's guess.
Alan Moore