I am trying to write a regular expression string match in VB.NET. The condition I am trying to implement is that the string should contain only letters and must contain at least one lowercase and one uppercase letter, e.g. AAA fails, aaa fails, aAaA passes.

The regular expression that I have come up with is "^(([a-z]+[A-Z]+)+|([A-Z]+[a-z]+)+)$"

Can someone suggest a better or simpler regular expression for this?

+3  A: 

This RegEx will work for you:

^[a-zA-Z]*([A-Z][a-z]|[a-z][A-Z])[a-zA-Z]*$

Explanation: if a string of letters has at least one lowercase and one uppercase letter, there must be a point where an uppercase and a lowercase letter sit next to each other. That point is matched by ([A-Z][a-z]|[a-z][A-Z]), which covers both orders: uppercase first or lowercase first. Once that criterion is met, any number of lowercase or uppercase letters may appear on either side of it and the string will still match.

RaYell
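
The adjacency argument in the answer above can be checked with a small console program. This is only a sketch: the module name and the last two sample strings are made up for illustration, while the pattern is copied from the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MixedCaseDemo
    ' Pattern copied from the answer above.
    Private Const Pattern As String = "^[a-zA-Z]*([A-Z][a-z]|[a-z][A-Z])[a-zA-Z]*$"

    Sub Main()
        ' "AAA", "aaa" and "aAaA" come from the question;
        ' "aAb" and "aA1" are extra samples added for illustration.
        For Each sample As String In New String() {"AAA", "aaa", "aAaA", "aAb", "aA1"}
            Console.WriteLine("{0}: {1}", sample, Regex.IsMatch(sample, Pattern))
        Next
    End Sub
End Module
```

AAA, aaa and aA1 should print False, while aAaA and aAb should print True.
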
+2  A: 

The regex you created will fail under some conditions, such as "aAb". I think the following will work better for you:

^(?:[a-z]+[A-Z]+|[A-Z]+[a-z]+)(?:[a-zA-Z])*$
Templar
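
To see the "aAb" case concretely, here is a minimal sketch that runs the question's regex and the revised one side by side. Both patterns are copied from the thread; the module and variable names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OriginalVersusRevised
    Sub Main()
        ' Both patterns are copied verbatim from the thread.
        Dim original As String = "^(([a-z]+[A-Z]+)+|([A-Z]+[a-z]+)+)$"
        Dim revised As String = "^(?:[a-z]+[A-Z]+|[A-Z]+[a-z]+)(?:[a-zA-Z])*$"

        ' The original needs every lowercase run to be followed by an
        ' uppercase run (or the reverse), so the trailing "b" breaks it.
        Console.WriteLine(Regex.IsMatch("aAb", original)) ' False
        Console.WriteLine(Regex.IsMatch("aAb", revised))  ' True
    End Sub
End Module
```
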
No, it will not fail on 'aAb'. I suggest you try running it.
RaYell
My comment was regarding Shail's original regex, not yours. Your solution is correct.
Templar
Oh, sorry then. I noticed it was added a few minutes after mine, so I thought you were referring to it.
RaYell
Yes, you are right; I did not notice it earlier. It fails for 'aAb'. One more thing I was wondering: what if I want to generalise it a little further, say to a string that may contain only a-z, A-Z and 0-9 and must contain all three? Do I need to type every permutation into the regex?
Shail
Trying to create a single regex to check all conditions starts to get pretty ugly at that point. I would suggest doing three separate checks for the following and if all three succeed, then you know all three character types exist: "^\w*[a-z]\w*$" and "^\w*[A-Z]\w*$" and "^\w*[0-9]\w*$".
Templar
I just had one more question, @Templar: what does "(?:" mean? That is, what's the difference between the following two regexes: ^(?:[a-z]+[A-Z]+|[A-Z]+[a-z]+)(?:[a-zA-Z])*$ and ^([a-z]+[A-Z]+|[A-Z]+[a-z]+)([a-zA-Z])*$
Shail
The (?: indicates a non-capturing group. By default, brackets capture the contents into a backreference which you can use to retrieve different parts of the string. Since you aren't doing that, I used the non-capturing group convention. But for the purposes of determining if the string matches the regex, there's no real difference.
Templar
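
The difference between the two forms shows up in the Groups collection of the match, not in whether the string matches. A rough sketch, using the two patterns from the comment above (the module and variable names are illustrative only):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        Dim capturing As Match = Regex.Match("aAb", "^([a-z]+[A-Z]+|[A-Z]+[a-z]+)([a-zA-Z])*$")
        Dim nonCapturing As Match = Regex.Match("aAb", "^(?:[a-z]+[A-Z]+|[A-Z]+[a-z]+)(?:[a-zA-Z])*$")

        ' Both forms agree on whether the string matches.
        Console.WriteLine(capturing.Success = nonCapturing.Success) ' True

        ' Group 0 is always the whole match; each capturing group adds one more.
        Console.WriteLine(capturing.Groups.Count)    ' 3
        Console.WriteLine(nonCapturing.Groups.Count) ' 1

        ' The first capturing group holds the part matched by the alternation.
        Console.WriteLine(capturing.Groups(1).Value) ' aA
    End Sub
End Module
```
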
@Templar your three regexes will also match strings like `0aA_` (because `\w` matches the underscore as well). Personally, I don't see the point of using three regexes instead of one that does the same check as all three together, just because the single regex is a bit longer.
RaYell
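
For reference, one way to write the three-class check as a single regex without listing permutations is to use lookaheads. Nobody in this thread proposed this pattern, so treat it as an assumption-laden sketch rather than anyone's recommended answer; the module name and sample strings are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ThreeClassDemo
    ' Assumption: a lookahead-based pattern, not taken from the thread.
    ' Each (?=...) asserts that one character class occurs somewhere ahead;
    ' [a-zA-Z0-9]+ then restricts the whole string to those three classes,
    ' which also rejects the underscore that \w would let through.
    Private Const Pattern As String = "^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$"

    Sub Main()
        For Each sample As String In New String() {"aA1", "1Aa", "aAb", "AAA1", "0aA_"}
            Console.WriteLine("{0}: {1}", sample, Regex.IsMatch(sample, Pattern))
        Next
    End Sub
End Module
```

aA1 and 1Aa should print True; aAb (no digit), AAA1 (no lowercase letter) and 0aA_ (underscore) should print False.
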
A: 

Just for fun, I tried to tackle your problem without using regular expressions.

I have the following method, which checks whether the characters of a string correspond exactly to the specified Unicode categories (uppercase, lowercase, digit, ...):

' Requires: Imports System.Collections.Generic, System.Globalization, System.Linq
Private Function IsValid(ByVal value As String, _
                         ByVal ParamArray categories As UnicodeCategory()) _
                         As Boolean

    ' Create a hash set of the required Unicode categories
    Dim validSet = New HashSet(Of UnicodeCategory)(categories)

    ' Group the string's characters by Unicode category
    Dim groupedCharacters = value.GroupBy(Function(c) Char.GetUnicodeCategory(c))

    ' Get an enumerable of the categories that actually occur in the string
    Dim actualCategories = groupedCharacters.Select(Function(group) group.Key)

    ' Return True only if the string uses exactly the required categories:
    ' every required category is present and no other category appears
    Return validSet.SetEquals(actualCategories)
End Function

The method can be used this way:

Dim myString As String = "aAbbC"
Dim validString As Boolean = IsValid(myString, _
                                     UnicodeCategory.LowercaseLetter, _
                                     UnicodeCategory.UppercaseLetter)

Using this method, you can require uppercase, lowercase AND digit characters without changing anything. Just add a third argument to IsValid: UnicodeCategory.DecimalDigitNumber
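
As a rough, self-contained sketch of that three-category call: the function body below is a condensed version of IsValid that skips the GroupBy step (SetEquals ignores duplicates, so the result should be the same), and the module name and sample strings are made up for illustration. Note that Unicode categories are broader than [a-z] and [A-Z], so accented letters also count as letters here.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq

Module CategoryCheckDemo
    ' Condensed version of the IsValid method above, repeated here
    ' only so that the sketch compiles on its own.
    Private Function IsValid(ByVal value As String, _
                             ByVal ParamArray categories As UnicodeCategory()) As Boolean
        Dim validSet As New HashSet(Of UnicodeCategory)(categories)
        Dim actual As IEnumerable(Of UnicodeCategory) = _
            value.Select(Function(c) Char.GetUnicodeCategory(c))
        Return validSet.SetEquals(actual)
    End Function

    Sub Main()
        Dim lower As UnicodeCategory = UnicodeCategory.LowercaseLetter
        Dim upper As UnicodeCategory = UnicodeCategory.UppercaseLetter
        Dim digit As UnicodeCategory = UnicodeCategory.DecimalDigitNumber

        Console.WriteLine(IsValid("aA1", lower, upper, digit))  ' True
        Console.WriteLine(IsValid("aAb", lower, upper, digit))  ' False: no digit
        Console.WriteLine(IsValid("0aA_", lower, upper, digit)) ' False: "_" is punctuation

        ' U+00E9 (e with acute accent) is a lowercase letter in Unicode,
        ' so this passes even though [a-z] would reject it.
        Dim accented As String = Convert.ToChar(233).ToString() & "A1"
        Console.WriteLine(IsValid(accented, lower, upper, digit)) ' True
    End Sub
End Module
```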

Meta-Knight