ansaurus

Question

Problem trimming Japanese string in Java.

Answer 1

+2 A:

The Java docs explain why this doesn't work:

If this String object represents an empty character sequence, or the first and last characters of character sequence represented by this String object both have codes greater than '\u0020' (the space character), then a reference to this String object is returned.

You could roll your own version easily enough. Perhaps the codePointAt() method could be used for this purpose.
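Here is a minimal sketch of such a hand-rolled trim. It assumes that "whitespace" means whatever Character.isWhitespace(int) accepts. That covers the ideographic space U+3000, but it deliberately excludes no-break spaces such as U+00A0. The sketch walks the string by code point using codePointAt() and codePointBefore(). The class and method names are hypothetical.

```java
public class UnicodeTrim {
    // Hypothetical helper: removes leading and trailing code points for which
    // Character.isWhitespace(int) is true. It iterates by code point, not by char,
    // so supplementary characters are treated as single units.
    public static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        // Skip whitespace code points at the start
        while (start < end) {
            int cp = s.codePointAt(start);
            if (!Character.isWhitespace(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        // Skip whitespace code points at the end
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (!Character.isWhitespace(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String input = "\u3000ユーザー名\u3000";
        System.out.println("[" + input.trim() + "]");       // U+3000 is still present at both ends
        System.out.println("[" + unicodeTrim(input) + "]"); // prints [ユーザー名]
    }
}
```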

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html

Paul Whelan 2009-01-26 13:47:41

Answer 2

+2 A:

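You'll have to write your own trim() method based on Character.isWhitespace(). Unfortunately, trim() does not do what the summary line of its API doc suggests. It strips only characters with codes up to '\u0020' (the ASCII space and ASCII control characters). It does not strip other kinds of whitespace, such as the ideographic space U+3000.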
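The difference between trim() and Character.isWhitespace() can be checked directly with a small comparison. The class and method names are hypothetical. For U+3000, trim() leaves the character in place, but isWhitespace() returns true. Also note that isWhitespace() returns false for the no-break space U+00A0.

```java
public class TrimVsIsWhitespace {
    // True if String.trim() removes c when it surrounds a non-space character
    public static boolean trimStrips(char c) {
        String s = c + "x" + c;
        return s.trim().equals("x");
    }

    public static void main(String[] args) {
        // TAB, SPACE, NO-BREAK SPACE, IDEOGRAPHIC SPACE
        char[] samples = {'\t', ' ', '\u00A0', '\u3000'};
        for (char c : samples) {
            System.out.printf("U+%04X  trim() strips: %b  isWhitespace: %b%n",
                    (int) c, trimStrips(c), Character.isWhitespace(c));
        }
    }
}
```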

Michael Borgwardt 2009-01-26 13:47:48

Answer 3

+3 A:

Try the StringUtils class from Apache Commons Lang. StringUtils.strip() removes leading and trailing whitespace as defined by Character.isWhitespace(), so it should work for you.

Mike Sickler 2009-01-26 13:48:32

Answer 4

+4 A:

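Have a look at Unicode normalization and the java.text.Normalizer class. The class is new in Java 6, but you'll find an equivalent in the ICU4J library if you're on an earlier JRE. NFKC normalization maps the ideographic space (U+3000, decimal 12288) to an ordinary space (U+0020), and trim() can then remove that space: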

 import java.text.Normalizer;

 public class NormalizeTrim {
     public static void main(String[] args) {
         int character = 12288; // U+3000 IDEOGRAPHIC SPACE
         String input = new String(Character.toChars(character));
         // NFKC maps U+3000 to an ordinary U+0020 space
         String normalized = Normalizer.normalize(input, Normalizer.Form.NFKC);

         System.out.println("Hex value:\t" + Integer.toHexString(character));
         System.out.println("Trimmed length           :\t"
             + input.trim().length());       // prints 1
         System.out.println("Normalized trimmed length:\t"
             + normalized.trim().length());  // prints 0
     }
 }
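Before adopting this approach, check one caveat. NFKC is a compatibility normalization, so it rewrites more than spaces. The sketch below uses hypothetical class and method names. It shows that fullwidth Latin letters (U+FF21 to U+FF23, "ＡＢＣ") become ASCII "ABC", and that the halfwidth katakana U+FF76 becomes the fullwidth U+30AB. If the rest of the Japanese text must stay exactly as entered, normalizing the whole string may not be appropriate.

```java
import java.text.Normalizer;

public class NfkcSideEffects {
    // Normalizes to NFKC, then applies the standard trim()
    public static String nfkcTrim(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).trim();
    }

    public static void main(String[] args) {
        // Ideographic spaces around fullwidth Latin letters
        String input = "\u3000\uFF21\uFF22\uFF23\u3000";
        System.out.println(nfkcTrim(input)); // prints ABC: the letters changed too
        // Halfwidth katakana KA (U+FF76) is mapped to fullwidth KA (U+30AB)
        System.out.println(nfkcTrim("\u3000\uFF76\u3000").equals("\u30AB")); // prints true
    }
}
```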

McDowell 2009-01-26 14:13:05

Answer 5

+3 A:

As an alternative to the StringUtils class mentioned by Mike, you can also use a Unicode-aware regular expression, using only Java's own libraries:

"　ユーザー名".replaceAll("\\p{Z}", "")

Or, to really only trim, and not remove whitespace inside the string:

"　ユーザ ー名 ".replaceAll("(^\\p{Z}+|\\p{Z}+$)", "")

Fabian Steeg 2009-01-26 14:25:50
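The two regular expressions can be compared side by side in a small program. The class and method names are hypothetical. Note that \p{Z} matches Unicode separator characters, which include the ASCII space and U+3000. It does not match control characters such as tab or newline, so both expressions leave those in place.

```java
public class RegexTrim {
    // Removes every Unicode separator (\p{Z}), including ones inside the string
    public static String removeAllSeparators(String s) {
        return s.replaceAll("\\p{Z}", "");
    }

    // Removes \p{Z} characters only at the start and end of the string
    public static String trimSeparators(String s) {
        return s.replaceAll("(^\\p{Z}+|\\p{Z}+$)", "");
    }

    public static void main(String[] args) {
        String input = "\u3000ユーザ ー名 ";
        System.out.println("[" + removeAllSeparators(input) + "]"); // prints [ユーザー名]
        System.out.println("[" + trimSeparators(input) + "]");      // prints [ユーザ ー名]
    }
}
```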

This will also replace the whitespace character in the middle of the string.

pablito 2009-01-26 14:32:54

I just fixed it.

Fabian Steeg 2009-01-26 14:33:53

The second choice is good, thanks.

pablito 2009-01-26 14:34:00

Be warned that regular expressions are *much* slower than the standard trim() method. If performance is a problem (or becomes one), it would probably be worth your time to write your own trim() without the regex. At the very least, use a compiled Pattern to do the replaceAll().
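Following that suggestion, here is a sketch that compiles the trim-only pattern once and reuses it. String.replaceAll() compiles its regex on every call. The class name is hypothetical. Pattern instances are immutable and safe to share between threads, so a static final field is a reasonable place to keep one.

```java
import java.util.regex.Pattern;

public class CompiledRegexTrim {
    // Compiled once; matches \p{Z} runs at the start or end of the input
    private static final Pattern EDGE_SEPARATORS = Pattern.compile("(^\\p{Z}+|\\p{Z}+$)");

    public static String trim(String s) {
        return EDGE_SEPARATORS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("\u3000ユーザ ー名 ") + "]"); // prints [ユーザ ー名]
    }
}
```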

Michael Myers 2009-01-26 15:40:18

Hmm, that's not generalizable to other languages. I much prefer the ICU4J method.

Phil 2009-01-26 15:53:49

Phil, I'm not sure I understand your comment, why is it not generalizable to other languages? It should work for all Unicode whitespace.

Fabian Steeg 2009-01-26 16:42:34
