Given a java.lang.String instance, I want to verify that it doesn't contain any Unicode characters that are not ASCII alphanumerics, e.g. the string should be limited to [A-Za-z0-9.]. What I'm doing now is very inefficient:

import org.apache.commons.lang.CharUtils;

String s = ...;
char[] ch = s.toCharArray();
for (int i = 0; i < ch.length; i++)
{
    // Reject the first character that is not an ASCII letter or digit.
    if (!CharUtils.isAsciiAlphanumeric(ch[i]))
        throw new InvalidInput(ch[i] + " is invalid");
}

Is there a better way to solve this?

+3  A: 

You can use

input.matches("[A-Za-z0-9.]+")
Bozho
No, that should be `!input.matches("[^A-Za-z0-9.]")`.
Christoffer Hammarström
Depends on your if/else structure.
Bozho
`input.matches("[A-Za-z0-9.]")` means the string is exactly one character long. The regex should be `[A-Za-z0-9.]+`, or `[A-Za-z0-9.]*` if a zero-length string is allowed.
Alan Moore
thanks, correct.
Bozho
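The regex approach from this answer and its comments could be sketched as follows. This is only a minimal sketch: the class name AsciiInputCheck and the method isValid are hypothetical, and treating null and the empty string as invalid is an assumption (switch `+` to `*` if a zero-length string should pass). The pattern is compiled once because String.matches compiles its regex again on every call.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names and the null/empty handling are assumptions.
final class AsciiInputCheck {
    // Compiled once and reused across calls.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9.]+");

    private AsciiInputCheck() {}

    // True only if the whole string is one or more allowed characters.
    static boolean isValid(String input) {
        return input != null && ALLOWED.matcher(input).matches();
    }
}
```

For example, AsciiInputCheck.isValid("abc.123") accepts the input, while a string containing a space or an accented letter is rejected.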
+1  A: 

Yes, there's a better way to solve that. You have already written the pattern, so why not use a regular expression to validate the input? Instead of throwing an exception that includes the invalid character, you could just as well use a generic error message along the lines of "input contains invalid characters (valid characters are a-z and 0-9)".

klausbyskov
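If you still want to say where the bad character is, one option is to search for the negated character class instead of matching the whole string. A rough sketch, assuming the allowed set from the question; InvalidCharFinder, firstInvalidIndex, and validate are hypothetical names, and IllegalArgumentException stands in for the asker's own InvalidInput type:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class InvalidCharFinder {
    // Matches any single character outside the allowed set.
    private static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9.]");

    private InvalidCharFinder() {}

    // Index of the first disallowed character, or -1 if there is none.
    static int firstInvalidIndex(String input) {
        Matcher m = DISALLOWED.matcher(input);
        return m.find() ? m.start() : -1;
    }

    // Throws with a generic message plus the offending position.
    static void validate(String input) {
        int i = firstInvalidIndex(input);
        if (i >= 0) {
            throw new IllegalArgumentException(
                "input contains invalid characters (first at index " + i + ")");
        }
    }
}
```

Note that an empty string passes this check, since it contains no disallowed character.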
A: 

Try this:

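import java.lang.Character.UnicodeBlock;

// Returns true if every char is in the Basic Latin block (U+0000 to U+007F).
private boolean isBasicLatin(String input)
{
    for (char c : input.toCharArray())
    {
        if (!UnicodeBlock.BASIC_LATIN.equals(UnicodeBlock.of(c)))
        {
            return false;
        }
    }

    return true;
}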
Samuel Yung
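One caveat: the Basic Latin block is all of ASCII, so it also lets through spaces, punctuation, and control characters, which is wider than the [A-Za-z0-9.] set asked for. A dependency-free loop restricted to that set might look like the sketch below; the class AsciiAlnum and its method names are hypothetical.

```java
// Hypothetical helper; checks the exact set [A-Za-z0-9.] without regex or libraries.
final class AsciiAlnum {
    private AsciiAlnum() {}

    // True for ASCII letters, ASCII digits, and the dot.
    static boolean isAllowed(char c) {
        return (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || c == '.';
    }

    // True if every char of the input is allowed (an empty string passes).
    static boolean isValid(String input) {
        for (int i = 0; i < input.length(); i++) {
            if (!isAllowed(input.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Using charAt avoids the array copy that toCharArray makes, though for short inputs the difference is unlikely to matter.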