views:

278

answers:

4

Greetings,

I am developing a GWT application where the user can enter their details in Japanese. But the 'userid' and 'password' should only contain English characters (Latin alphabet). How can I validate strings for this?

+6  A: 

You can use String#matches() with a bit of regex for this. Latin characters are covered by \w.

So this should do:

boolean valid = input.matches("\\w+");

By the way, this also covers digits and the underscore _. Not sure if that's a problem; if it is, you can just use [A-Za-z]+ instead.
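To make the difference between the two patterns concrete, here is a minimal sketch. The class and method names are made up for illustration, and the Japanese sample is written with Unicode escapes so the source stays ASCII.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class AsciiNameCheck {
    // In Java's default mode, \w is exactly [a-zA-Z_0-9].
    private static final Pattern WORD = Pattern.compile("\\w+");
    // Letters only: no digits, no underscore.
    private static final Pattern LETTERS = Pattern.compile("[A-Za-z]+");

    static boolean isWord(String input) {
        return WORD.matcher(input).matches();
    }

    static boolean isLettersOnly(String input) {
        return LETTERS.matcher(input).matches();
    }

    public static void main(String[] args) {
        String katakana = "\u30e6\u30fc\u30b6\u30fc"; // "user" in katakana
        System.out.println(isWord("user_01"));        // true
        System.out.println(isLettersOnly("user_01")); // false
        System.out.println(isWord(katakana));         // false
    }
}
```

Compiling the patterns once, rather than calling String#matches() each time, is just a small optimisation; the result is the same.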

If you want to cover diacritical characters as well (ä, é, ò and so on, which are by definition also Latin characters), then you need to normalize the input first and strip the diacritical marks before matching, simply because there's no (documented) regex which covers diacriticals.

import java.text.Normalizer;
import java.text.Normalizer.Form;
String clean = Normalizer.normalize(input, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
boolean valid = clean.matches("\\w+");

Update: there's an undocumented regex class in Java which covers diacriticals as well: \p{L}.

boolean valid = input.matches("\\p{L}+");

The above works on Java 1.6. Note that \p{L} matches letters from any script, so it accepts Japanese text as well.
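Because \p{L} is the Unicode "any letter" category, it also accepts kana and kanji. If accented Latin letters should pass while other scripts are rejected, one option might be the script class \p{IsLatin}. This is only a sketch: it assumes Java 7 or later (script classes are not available on 1.6), and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class LatinScriptCheck {
    // Any Unicode letter, regardless of script.
    private static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");
    // Characters belonging to the Latin script (requires Java 7+).
    private static final Pattern LATIN_SCRIPT = Pattern.compile("\\p{IsLatin}+");

    static boolean isAnyLetters(String input) {
        return ANY_LETTER.matcher(input).matches();
    }

    static boolean isLatinScript(String input) {
        return LATIN_SCRIPT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // "Japanese" written in kanji
        String accented = "caf\u00e9";          // precomposed e with acute accent
        System.out.println(isAnyLetters(japanese));  // true
        System.out.println(isLatinScript(japanese)); // false
        System.out.println(isLatinScript(accented)); // true
    }
}
```

Digits and the underscore are not part of the Latin script, so they would have to be added to the character class explicitly if they are wanted.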

BalusC
`\p{L}` is documented: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html#ubc
Joachim Sauer
Drat, you're right. I would swear that I've never seen it before in the API docs for years. Are you the maintainer of the Sun API docs?
BalusC
No, but I've read through that particular JavaDoc page more often than I'd like to admit ;-)
Joachim Sauer
+2  A: 

There might be a better approach, but you could load a collection with whatever you deem to be acceptable characters, and then check each character in the username/password field against that collection.

Pseudo:


foreach (character in username)
{
    if !allowedCharacters.contains(character)
    {
        throw exception
    }
}
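
In Java, the pseudocode above could look roughly like the sketch below. The allowed set (ASCII letters and digits) and all names are assumptions chosen for illustration, and it returns false instead of throwing, which is a design choice rather than part of the original idea.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class AllowedCharsCheck {
    // Assumed whitelist: ASCII letters and digits. Adjust as needed.
    private static final Set<Character> ALLOWED = new HashSet<Character>();
    static {
        for (char c = 'a'; c <= 'z'; c++) ALLOWED.add(c);
        for (char c = 'A'; c <= 'Z'; c++) ALLOWED.add(c);
        for (char c = '0'; c <= '9'; c++) ALLOWED.add(c);
    }

    static boolean isValid(String input) {
        // Treat null and empty input as invalid.
        if (input == null || input.isEmpty()) {
            return false;
        }
        // Reject as soon as one character is outside the whitelist.
        for (int i = 0; i < input.length(); i++) {
            if (!ALLOWED.contains(input.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```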
Superstringcheese
+2  A: 

For something this simple, I'd use a regular expression.

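import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \p{Alpha} is the POSIX class for ASCII letters: [a-zA-Z]
private static final Pattern p = Pattern.compile("\\p{Alpha}+");

static boolean isValid(String input) {
  Matcher m = p.matcher(input);
  return m.matches();
}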

There are other pre-defined classes like \w that might work better.

erickson
+3  A: 
static CharsetEncoder asciiEncoder = Charset.forName("US-ASCII").newEncoder(); // or "ISO-8859-1" for ISO Latin 1

boolean isValid(String input) {
    return asciiEncoder.canEncode(input);
}

For reference: http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html

Jon
I wouldn't use this, as it allows the input to contain whitespace and control characters (including U+0000), which are almost certainly not welcome in a username.
Joachim Sauer
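
To illustrate the concern in the comment above, here is a small sketch that compares the encoder check with a stricter test for printable ASCII (U+0021 to U+007E). The stricter range and all names are assumptions for illustration, not something proposed in the thread.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class PrintableAsciiCheck {
    static boolean asciiEncodable(String input) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        return Charset.forName("US-ASCII").newEncoder().canEncode(input);
    }

    static boolean isPrintableAscii(String input) {
        if (input.isEmpty()) {
            return false;
        }
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Excludes control characters, space and DEL.
            if (c < 0x21 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(asciiEncodable("a\u0000b"));   // true
        System.out.println(isPrintableAscii("a\u0000b")); // false
        System.out.println(isPrintableAscii("a b"));      // false
    }
}
```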