
Hi,

I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z. The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)

Examples:
1. "abcdef" = true;
2. "a2bdef" = false;
3. "333" = false;
4. "j" = true;
5. "aaaaaaaaaaaaaaaaaaaaaaaaaa" = false; //26 letters

Here is what I have so far, but I can't figure out what's wrong with it:

Regex alphaPattern = new Regex("[^a-z]|[^A-Z]");

I thought that would mean the string could contain only uppercase or lowercase letters a-z, but when I match it against a string of all letters it returns false...

Also, any suggestions regarding the efficiency of using a regex vs. other validation methods would be greatly appreciated.

Thanks,
Matt

+23  A: 
Regex lettersOnly = new Regex("^[a-zA-Z]{1,25}$");
  • ^ means "begin matching at start of string"
  • [a-zA-Z] means "match lower case and upper case letters a-z"
  • {1,25} means "match the previous item (the character class, see above) 1 to 25 times"
  • $ means "only match if cursor is at end of string"
Blixt
You will also need to set `RegexOptions.Multiline` to clarify the meaning of the ^ and $ characters. Otherwise, the provided expression appears perfect. +1
Cerebrus
Can't say I agree on that. If it included the Multiline option, it would validate a string consisting of 20 letters, a newline, and then a bunch of random non-alphabetic characters. Instead, I would advise one to Trim() the string before using it.
Blixt
Also, depending on what 'up to 25' *really* means, the 1 might need to be a 0; the OP did not specify whether string.Empty is valid.
AakashM
You do NOT want the Multiline option here; without it, ^ and $ mean beginning and end (respectively) of the whole string*, which is exactly what's wanted. (*Except $ also matches before a newline at the end of the string, yadda yadda... By Grabthar, I wish we could have a do-over on that one!)
Alan Moore
Just use \A and \z and all this multiline debate is meaningless anyway.
Peter Boughton
+1 for working answer and good explanation.
Lucas Jones
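To make the anchor debate above concrete, here is a small sketch (the class and member names are hypothetical) that runs the question's examples against both `^…$` and `\A…\z` in .NET. The only input where they disagree is one ending in a newline, because without `RegexOptions.Multiline` the `$` anchor also matches just before a final `\n`, as Alan Moore points out.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorDemo
{
    // ^ and $ without RegexOptions.Multiline: $ also matches before a trailing "\n".
    public static readonly Regex Caret = new Regex("^[a-zA-Z]{1,25}$");

    // \A and \z anchor to the absolute start and end of the input.
    public static readonly Regex Absolute = new Regex(@"\A[a-zA-Z]{1,25}\z");

    public static void PrintComparison()
    {
        // The question's examples, plus one input with a trailing newline.
        string[] inputs = { "abcdef", "a2bdef", "333", "j", new string('a', 26), "abc\n" };
        foreach (string s in inputs)
        {
            // Show "\n" literally so the last row is readable.
            Console.WriteLine($"{s.Replace("\n", "\\n"),-28} ^$: {Caret.IsMatch(s),-5}  \\A\\z: {Absolute.IsMatch(s)}");
        }
    }
}
```

If trailing newlines should be rejected, the `\A…\z` form avoids the question entirely; if they should be tolerated, trimming first (as Blixt suggests) and then using either form works.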
+1  A: 

Do I understand correctly that it can only contain either uppercase or lowercase letters?

new Regex("^([a-z]{1,25}|[A-Z]{1,25})$")

A regular expression seems to be the right thing to use for this case.

By the way, the caret ("^") at the first place inside a character class means "not", so your "[^a-z]|[^A-Z]" would mean "not any lowercase letter, or not any uppercase letter" (disregarding that a-z are not all letters).

Svante
And, since the set described by [^a-z] ("any character except a lowercase ASCII letter") includes uppercase letters, and [^A-Z] includes lowercase letters, [^a-z]|[^A-Z] will match ANY character.
Alan Moore
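A quick way to see Alan Moore's point is to run the question's pattern next to Svante's. In this sketch (class name hypothetical), `Regex.IsMatch` succeeds if the pattern matches anywhere, so the original pattern is true for every non-empty string, letters or not, while the anchored single-case pattern rejects mixed case.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing the question's pattern with Svante's suggestion.
public static class NegatedClassDemo
{
    // The pattern from the question: one char that is "not a-z", OR one char that is "not A-Z".
    // No character is in both ranges, so any single character satisfies one side.
    public static readonly Regex Original = new Regex("[^a-z]|[^A-Z]");

    // Svante's variant: 1 to 25 letters, either all lowercase or all uppercase.
    public static readonly Regex SingleCase = new Regex("^([a-z]{1,25}|[A-Z]{1,25})$");

    public static void Print()
    {
        foreach (string s in new[] { "abcdef", "ABCDEF", "aBcDeF", "333" })
        {
            // IsMatch returns true if the pattern matches anywhere in the input.
            Console.WriteLine($"{s,-8} original: {Original.IsMatch(s),-5}  single-case: {SingleCase.IsMatch(s)}");
        }
    }
}
```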
+3  A: 

The regular expression you are using is an alternation of [^a-z] and [^A-Z]. And the expressions [^…] mean to match any character other than those described in the character set.

So overall your expression means to match either any single character other than a-z or other than A-Z.

But you rather need a regular expression that matches a-zA-Z only:

[a-zA-Z]

And to specify the length of that, anchor the expression with the start (^) and end ($) of the string and describe the length with the {n,m} quantifier, meaning at least n but not more than m repetitions:

^[a-zA-Z]{0,25}$
Gumbo
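Since the `{n,m}` bounds are just numbers in the pattern, they can be filled in from configuration. The sketch below uses a hypothetical `LengthPattern.Letters(min, max)` helper to build `^[a-zA-Z]{min,max}$`; with `min = 0`, as in Gumbo's pattern, the empty string is accepted, while `min = 1` rejects it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LengthPattern
{
    // Builds ^[a-zA-Z]{min,max}$ for the given bounds.
    public static Regex Letters(int min, int max)
    {
        if (min < 0 || max < min)
            throw new ArgumentOutOfRangeException(nameof(max), "Require 0 <= min <= max.");
        // Doubled braces in an interpolated string produce literal { and }.
        return new Regex($"^[a-zA-Z]{{{min},{max}}}$");
    }
}
```

For example, `LengthPattern.Letters(0, 25)` yields the pattern `^[a-zA-Z]{0,25}$`.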
+1  A: 

There are excellent interactive tools for developing and testing regex expressions:

They're a great help because they tell you right away if your expression works as expected and even allow you to step through and debug.

Jonathan Webb
+4  A: 

The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)

Regexes certainly can check the length of a string, as the answers posted by others show.

However, when you are validating a user input (say, a username), I would advise doing that check separately.

The problem is that a regex can only tell you whether a string matched or not. It won't tell you why it didn't match. Was the text too long, or did it contain disallowed characters? You can't tell. It's far from friendly when a program says: "The supplied username contained invalid characters or was too long". Instead, you should provide separate error messages for different situations.

Rene Saarsoo
Agreed. So, for me that would not yield 2, but 3 possible messages: "The supplied username contained invalid characters (only .. are allowed)", "The supplied username was too long (maximum of .. is allowed)", or "The supplied username contained invalid characters (only .. are allowed) and was too long (maximum of .. is allowed)". I strongly dislike input validation that gives no clues about what the acceptable input would be, and just gets you running from one error into another...
Arjan
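Here is one way the separate checks might look, sketched without a regex at all. The class name, messages, and the decision to treat an empty string as invalid are assumptions for illustration. Returning a list of messages covers Arjan's combined case: input that is both too long and contains invalid characters gets both messages. Skipping the regex also touches the efficiency question; whether it is actually faster for your inputs is something to measure rather than assume.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical validator; names, limits and message wording are illustrative.
public static class UsernameValidator
{
    public const int MaxLength = 25;

    // Returns an empty list when the input is valid, otherwise one message per problem found.
    public static List<string> Validate(string value)
    {
        var errors = new List<string>();
        if (string.IsNullOrEmpty(value))
        {
            // Assumption: empty input is invalid (the question does not say).
            errors.Add("The supplied username is empty.");
            return errors;
        }

        bool badChars = false;
        foreach (char c in value)
        {
            // Plain range checks: ASCII letters only, the same set as [a-zA-Z].
            if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')))
            {
                badChars = true;
                break;
            }
        }

        if (badChars)
            errors.Add("The supplied username contained invalid characters (only a-z and A-Z are allowed).");
        if (value.Length > MaxLength)
            errors.Add($"The supplied username was too long (maximum of {MaxLength} is allowed).");
        return errors;
    }
}
```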
+3  A: 

I'm trying to create a regex to verify that a given string only has alpha characters a-z or A-Z.

Easily done, as many of the others have indicated, using what are known as "character classes". Essentially, these allow us to specify a range of values to use for matching (NOTE: for simplification, I am assuming implicit ^ and $ anchors, which are explained later in this post):

[a-z] Match any single lower-case letter.
ex: a matches, 8 doesn't match

[A-Z] Match any single upper-case letter.
ex: A matches, a doesn't match

[0-9] Match any single digit zero to nine
ex: 8 matches, a doesn't match

[aeiou] Match only on a or e or i or o or u. ex: o matches, z doesn't match

[a-zA-Z] Match any single lower-case OR upper-case letter. ex: A matches, a matches, 3 doesn't match

These can, naturally, be negated as well: [^a-z] Match anything that is NOT a lower-case letter ex: 5 matches, A matches, a doesn't match

[^A-Z] Match anything that is NOT an upper-case letter ex: 5 matches, A doesn't match, a matches

[^0-9] Match anything that is NOT a number ex: 5 doesn't match, A matches, a matches

[^Aa69] Match anything as long as it is not A or a or 6 or 9 ex: 5 matches, A doesn't match, a doesn't match, 3 matches

To see some common character classes, go to: http://www.regular-expressions.info/reference.html
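The examples above assume implicit anchors, so to try them in .NET you have to add `^` and `$` yourself. This sketch (the helper name is hypothetical) wraps a single character class that way and checks a few of the listed cases.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for trying the character classes listed above.
public static class CharClassDemo
{
    // Adds ^ and $ so the whole input must be exactly one character from the class,
    // mirroring the "implicit anchors" assumption used in the list.
    public static bool MatchesWhole(string charClass, string input) =>
        Regex.IsMatch(input, "^" + charClass + "$");

    public static void Print()
    {
        var cases = new (string Class, string Input)[]
        {
            ("[a-z]", "a"), ("[a-z]", "8"), ("[aeiou]", "o"), ("[aeiou]", "z"),
            ("[^a-z]", "A"), ("[^A-Z]", "A"), ("[^Aa69]", "5"), ("[^Aa69]", "a"),
        };
        foreach (var (cls, input) in cases)
            Console.WriteLine($"{cls,-8} {input}: {MatchesWhole(cls, input)}");
    }
}
```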

The string can be up to 25 letters long. (I'm not sure if regex can check length of strings)

You can absolutely check "length", but not in the way you might imagine. Strictly speaking, we measure repetition, NOT length, using {}:

a{2} Match two a's together.
ex: a doesn't match, aa matches, aca doesn't match

4{3} Match three 4's together. ex: 4 doesn't match, 44 doesn't match, 444 matches, 4434 doesn't match

Repetition has values we can set to have lower and upper limits:

a{2,} Match on two or more a's together. ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa matches

a{2,5} Match on two to five a's together. ex: a doesn't match, aa matches, aaa matches, aba doesn't match, aaaaaaaaa doesn't match

Repetition extends to character classes, so: [a-z]{5} Match any five lower-case characters together. ex: bubba matches, Bubba doesn't match, BUBBA doesn't match, asdjo matches

[A-Z]{2,5} Match two to five upper-case characters together. ex: bubba doesn't match, Bubba doesn't match, BUBBA matches, BUBBETTE doesn't match

[0-9]{4,8} Match four to eight numbers together. ex: bubba doesn't match, 15835 matches, 44 doesn't match, 3456876353456 doesn't match

[a3g]{2} Match any two characters from the set a, 3, g; they do not have to be the same character. ex: aa matches, ba doesn't match, 33 matches, 38 doesn't match, a3 ALSO matches
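The repetition examples can be checked the same way, again adding the anchors explicitly (helper name hypothetical). Note that each repetition of a character class chooses its character independently, so `[a3g]{2}` accepts "a3" as well as "aa" and "33". Requiring the same character twice takes a backreference, `([a3g])\1`, which is included for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for the repetition examples; anchors are added explicitly.
public static class RepetitionDemo
{
    // (?:...) groups the pattern so ^ and $ apply to all of it.
    public static bool Whole(string pattern, string input) =>
        Regex.IsMatch(input, "^(?:" + pattern + ")$");

    public static void Print()
    {
        var cases = new (string Pattern, string Input)[]
        {
            ("a{2}", "aa"), ("4{3}", "4434"), ("a{2,}", "aaaaaaaaa"), ("a{2,5}", "aaaaaaaaa"),
            ("[a3g]{2}", "a3"), (@"([a3g])\1", "a3"), (@"([a3g])\1", "33"),
        };
        foreach (var (p, s) in cases)
            Console.WriteLine($"{p,-12} {s,-10} {Whole(p, s)}");
    }
}
```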

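Now let's look at your regex: [^a-z]|[^A-Z] Translation: Match any single character that is NOT a lowercase letter, OR any single character that is NOT an upper-case letter. Since no character is both a lowercase and an uppercase letter, every character satisfies at least one side, so this matches any character at all.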

To fix it so it meets your needs, we would rewrite it like this: Step 1: Remove the negation [a-z]|[A-Z] Translation: Find any lowercase letter OR uppercase letter.

Step 2: While not strictly needed, let's clean up the OR logic a bit [a-zA-Z] Translation: Find any lowercase letter OR uppercase letter. Same as above but now using only a single set of [].

Step 3: Now let's indicate "length" [a-zA-Z]{1,25} Translation: Find any lowercase letter OR uppercase letter repeated one to twenty-five times.

This is where things get funky. You might think you were done here, and you may well be, depending on the technology you are using.

Strictly speaking the regex [a-zA-Z]{1,25} will match one to twenty-five upper or lower-case letters ANYWHERE on a line:

[a-zA-Z]{1,25} a matches, aZgD matches, BUBBA matches, 243242hello242552 MATCHES

In fact, every example I have given so far will do the same. If that is what you want then you are in good shape but based on your question, I'm guessing you ONLY want one to twenty-five upper or lower-case letters on the entire line. For that we turn to anchors. Anchors allow us to specify those pesky details:

^ beginning of a line
(I know, we just used this for negation earlier, don't get me started)

$ end of a line

We can use them like this:

^a{3} From the beginning of the line match a three times together ex: aaa matches, 123aaa doesn't match, aaa123 matches

a{3}$ Match a three times together at the end of a line ex: aaa matches, 123aaa matches, aaa123 doesn't match

^a{3}$ Match a three times together for the ENTIRE line ex: aaa matches, 123aaa doesn't match, aaa123 doesn't match

Notice that aaa matches in all cases because it has three a's at the beginning and end of the line technically speaking.

So the final, technically correct solution for finding a "word" that is one to twenty-five characters long on a line would be:

^[a-zA-Z]{1,25}$

The funky part is that some technologies implicitly put anchors in the regex for you and some don't. You just have to test your regex or read the docs to see if you have implicit anchors.

zainnab
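The progression in this answer can be replayed on the "243242hello242552" example. In the sketch below (class name hypothetical), every unanchored step still reports a match because `Regex.IsMatch` searches anywhere in the input; only the anchored pattern rejects it. In .NET, remember that `$` without `RegexOptions.Multiline` also accepts a single trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo walking through the steps above on one input.
public static class AnchorSteps
{
    public static readonly string[] Patterns =
    {
        "[^a-z]|[^A-Z]",    // original: any single character
        "[a-z]|[A-Z]",      // step 1: one letter, anywhere
        "[a-zA-Z]",         // step 2: same, single class
        "[a-zA-Z]{1,25}",   // step 3: 1-25 letters, still anywhere
        "^[a-zA-Z]{1,25}$", // anchored: input must be 1-25 letters (in .NET, $ also accepts one trailing \n)
    };

    public static void Print(string input)
    {
        foreach (string p in Patterns)
            Console.WriteLine($"{p,-18} {Regex.IsMatch(input, p)}");
    }
}
```

Calling `AnchorSteps.Print("243242hello242552")` should print True for the first four patterns and False for the anchored one.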
+1  A: 
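using System.Text.RegularExpressions;

public static class StringValidator // class name added for illustration
{
    /// <summary>
    /// Checks that the string contains only the letters a-z and A-Z and is 1 to 25 characters long
    /// </summary>
    /// <param name="value">String to be matched</param>
    /// <returns>True if it matches, false otherwise (including for null)</returns>
    public static bool IsValidString(string value)
    {
        // Regex.IsMatch throws ArgumentNullException for null input, so guard first.
        if (value == null) return false;
        string pattern = @"^[a-zA-Z]{1,25}$";
        return Regex.IsMatch(value, pattern);
    }
}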
Rashmi Pandit
A: 

Hi everyone.

I need to allow only uppercase alphanumeric characters (A-Z and 0-9), 0-25 characters long, but reject values that consist entirely of a single repeated digit.

I've got the first part: Regex.IsMatch(tmpResult, "^[0-9A-Z]{0,25}$"); (that's easy)

111112 - match
AABD333434 - match
55555555 - no match
555 - no match

Could anyone please help me with this?

Oisin C. Vera
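One possible answer to this follow-up, sketched in .NET under a stated assumption: "all repetition" is read as "the whole string is one digit repeated", and a single digit such as "5" is treated as such a run and rejected. The idea is a negative lookahead at the start of the pattern that fails the match if one digit followed by copies of itself (via the backreference `\1`) runs to the end. The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validator for the follow-up question.
public static class NoSingleDigitRun
{
    // ^(?!([0-9])\1*$)  negative lookahead: reject if the entire input is one digit repeated
    // [0-9A-Z]{0,25}$   then require 0-25 uppercase letters or digits
    public static readonly Regex Pattern = new Regex(@"^(?!([0-9])\1*$)[0-9A-Z]{0,25}$");

    public static bool IsValid(string s) => s != null && Pattern.IsMatch(s);
}
```

With this pattern, "111112" and "AABD333434" pass while "55555555" and "555" fail. `[0-9]` is used instead of `\d` because in .NET `\d` also matches non-ASCII Unicode digits.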