Testing out someone else's code (of course it was ...),

I noticed a few JSP pages printing funky non-ASCII characters. Taking a dip into the source, I found this tidbit.

// remove any periods from first name e.g. Mr. John --> Mr John
firstName = firstName.trim().replace('.','\0');

Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a c-string. Would this be the culprit to the funky characters?

Thanks PR

+5  A: 

Does replacing a character in a String with a null character even work in Java?

No.

Would this be the culprit to the funky characters?

Quite likely.

Michael Borgwardt
+2  A: 

I think that is indeed the cause. To erase the character, you should use replace(".", "") instead.

Valentin Rocher
That's a syntax error.
Michael Borgwardt
Oops, I didn't test it. I'm going to correct it right now.
Valentin Rocher
+4  A: 

It should probably be changed to

firstName = firstName.trim().replaceAll("\\.", "");
Roman
I actually was going to use this to fix it.
praspa
The `replaceAll` is like a sledgehammer here. You just want to replace a char by an empty string. You don't want to replace patterns at all. Just use `replace(".", "")`.
BalusC
+1  A: 

This does cause "funky characters":

System.out.println( "Mr. Foo".trim().replace('.','\0'));

produces:

Mr[] Foo

in my Eclipse console, where the [] is shown as a square box. As others have posted, use the String.replace(CharSequence, CharSequence) overload instead, i.e. replace(".", "").

Jim Ferrans
+11  A: 

Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a c-string.

That depends on how you define what is working. Does it replace all occurrences of the target character with '\0'? Absolutely!

String s = "food".replace('o', '\0');
System.out.println(s.indexOf('\0')); // "1"
System.out.println(s.indexOf('d')); // "3"
System.out.println(s.length()); // "4"
System.out.println(s.hashCode() == 'f'*31*31*31 + 'd'); // "true"

Everything seems to work fine to me! indexOf can find it, it counts as part of the length, and its value for hash code calculation is 0; everything is as specified by the JLS/API.

It DOESN'T work if you expect that replacing a character with the null character will somehow remove that character from the string. Of course it doesn't work like that. A null character is still a character!

String s = Character.toString('\0');
System.out.println(s.length()); // "1"
assert s.charAt(0) == 0;

It also DOESN'T work if you expect the null character to terminate a string. That is evident from the snippets above, but it's also clearly specified in the JLS (10.9. An Array of Characters is Not a String):

In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character).
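
As a minimal sketch of what that JLS statement means in practice (the class and method names below are made up for illustration), a String built from a char array with a NUL in the middle keeps every character after the NUL:

```java
class NulNotTerminatorDemo {
    // Hypothetical helper: builds a String from a char array with a NUL in the middle.
    static String fromCharsWithNul() {
        char[] chars = {'a', '\0', 'b'};
        return new String(chars);
    }

    public static void main(String[] args) {
        String s = fromCharsWithNul();
        System.out.println(s.length());      // 3: the NUL counts as a character
        System.out.println(s.charAt(2));     // b: nothing was cut off at the NUL
        System.out.println(s.endsWith("b")); // true
    }
}
```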


Would this be the culprit to the funky characters?

Now we're talking about an entirely different thing, i.e. how the string is rendered on screen. Truth is, even "Hello world!" will look funky if you use a dingbats font. A Unicode string may look funky in one locale but not in another. Even a properly rendered Unicode string containing, say, Chinese characters may still look funky to someone from, say, Greenland.

That said, the null character will probably look funky regardless; it's usually not a character that you want to display. Still, since the null character is not the string terminator, Java is more than capable of handling it one way or another.
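
If you want to confirm that a stray control character is what the page is rendering, one option is to make such characters visible before printing. This is only a sketch: showControlChars is a hypothetical helper, and the <U+XXXX> notation is an arbitrary choice for the demo.

```java
class VisibleControlChars {
    // Hypothetical helper: rewrites each ISO control character as <U+XXXX>
    // so it shows up in logs and consoles instead of rendering as a box.
    static String showControlChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isISOControl(c)) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same call as in the question: the period becomes a NUL, it is not removed.
        String name = "Mr. John".trim().replace('.', '\0');
        System.out.println(showControlChars(name)); // Mr<U+0000> John
    }
}
```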


Now to address what we assume is the intended effect, i.e. removing all periods from a string: the simplest solution is to use the replace(CharSequence, CharSequence) overload.

System.out.println("A.E.I.O.U".replace(".", "")); // AEIOU

The replaceAll solution is mentioned here too, but it works with a regular expression, which is why you need to escape the dot metacharacter, and it is likely to be slower.
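
To illustrate why the escaping matters, here is a small side-by-side sketch. The class and method names are invented for the comparison; Pattern.quote is shown as one alternative to escaping the dot by hand.

```java
import java.util.regex.Pattern;

class RemovePeriods {
    // Literal replacement: no regular expression involved.
    static String literal(String s) {
        return s.replace(".", "");
    }

    // Regex replacement with the dot escaped by hand.
    static String regexEscaped(String s) {
        return s.replaceAll("\\.", "");
    }

    // Regex replacement with the dot quoted by the library.
    static String regexQuoted(String s) {
        return s.replaceAll(Pattern.quote("."), "");
    }

    // Pitfall: an unescaped dot matches every character, so everything is removed.
    static String regexUnescaped(String s) {
        return s.replaceAll(".", "");
    }

    public static void main(String[] args) {
        String input = "A.E.I.O.U";
        System.out.println(literal(input));                  // AEIOU
        System.out.println(regexEscaped(input));             // AEIOU
        System.out.println(regexQuoted(input));              // AEIOU
        System.out.println(regexUnescaped(input).isEmpty()); // true
    }
}
```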

polygenelubricants
Now, that's a nice explanation. And you're using the right approach to replace the stuff as well :)
BalusC
+1: Very nice and very thorough!
Jim Ferrans