I have the following in C#:

public static bool IsAlphaAndNumeric(string s)
{
    return Regex.IsMatch(s, @"[a-zA-Z]+") 
        && Regex.IsMatch(s, @"\d+");
}

I want to check if parameter s contains at least one alphabetical character and one digit and I wrote the above method to do so.

But is there a way to combine the two regular expressions ("[a-zA-Z]+" and "\d+") into one?

+2  A: 
private static readonly Regex _regex = new Regex(
    @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+$", RegexOptions.Compiled);

public static bool IsAlphaAndNumeric(string s)
{
    return _regex.IsMatch(s);
}

If you want to ignore case you could use RegexOptions.Compiled | RegexOptions.IgnoreCase.
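
A minimal sketch of how the options mentioned here and in the comments below might be combined. It assumes a case-insensitive check is wanted, so a single letter lookahead replaces the separate `[a-z]` and `[A-Z]` ones, and it adds `RegexOptions.Singleline` so that `.` also matches newlines. The class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphaNumericCheck
{
    // Assumption: one case-insensitive letter lookahead is enough;
    // Singleline lets '.' cross newlines inside the lookaheads.
    private static readonly Regex LetterAndDigit = new Regex(
        @"^(?=.*[a-z])(?=.*\d)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static bool IsAlphaAndNumeric(string s)
    {
        return LetterAndDigit.IsMatch(s);
    }
}
```

With these options, a call such as `AlphaNumericCheck.IsAlphaAndNumeric("ABC\n1")` should succeed even though the digit follows a newline.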

Darin Dimitrov
+1, but it seems the OP wants a case-insensitive match.
Amarghosh
For OP, lookup positive lookahead on this page: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx
Moron
This regex only matches strings that contain a lowercase AND an uppercase letter...
dionadar
Also, it will require either RegexOptions.Singleline or not match strings that contain a newline before one of the three required characters (uppercase letter, lowercase letter and number)
dionadar
If you pass `RegexOptions.IgnoreCase` there's no need to have the `(?=.*[A-Z])` lookahead.
KennyTM
+2  A: 

You could use [a-zA-Z].*[0-9]|[0-9].*[a-zA-Z], but I'd only recommend it if the system you were using only accepted a single regex. I can't imagine this would be more efficient than two simple patterns without alternation.
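
For illustration, a small sketch that wraps the alternation above in a method. The class and field names are hypothetical. Since no options are passed, `.` does not match a newline here, so a letter and a digit separated by a line break would not be found.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationCheck
{
    // Either a letter followed (anywhere later) by a digit, or the reverse.
    private static readonly Regex LetterThenDigitOrReverse = new Regex(
        @"[a-zA-Z].*[0-9]|[0-9].*[a-zA-Z]");

    public static bool IsAlphaAndNumeric(string s)
    {
        return LetterThenDigitOrReverse.IsMatch(s);
    }
}
```

Passing `RegexOptions.Singleline` would be one way to lift the newline restriction, if that matters for the input.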

Anonymous
+4  A: 
@"^(?=.*[a-zA-Z])(?=.*\d)"

 ^  # From the beginning of the string
 (?=.*[a-zA-Z]) # look forward for any number of chars followed by a letter, don't advance pointer
 (?=.*\d) # look forward for any number of chars followed by a digit

Uses two positive lookaheads to ensure it finds one letter and one number before succeeding. You add the ^ to only try looking forward once, from the start of the string. Otherwise, the regexp engine would try to match at every point in the string.
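
As a sketch, the annotated form above can be kept almost verbatim in C# by passing `RegexOptions.IgnorePatternWhitespace`, which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. The class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedPatternCheck
{
    // IgnorePatternWhitespace lets the pattern carry '#' comments and
    // layout whitespace, so the explanation lives next to each piece.
    private static readonly Regex LetterAndDigit = new Regex(@"
        ^                # from the beginning of the string
        (?=.*[a-zA-Z])   # look ahead: any characters, then a letter
        (?=.*\d)         # look ahead: any characters, then a digit
        ", RegexOptions.IgnorePatternWhitespace);

    public static bool IsAlphaAndNumeric(string s)
    {
        return LetterAndDigit.IsMatch(s);
    }
}
```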

gnarf
+2  A: 

It's not exactly what you want, but let's say I have more time. The following should work faster than a regex.

    static bool IsAlphaAndNumeric(string str) {
        bool hasDigits = false;
        bool hasLetters = false;

        foreach (char c in str) {
            bool isDigit = char.IsDigit(c);
            bool isLetter = char.IsLetter(c);
            if (!(isDigit | isLetter))
                return false;
            hasDigits |= isDigit;
            hasLetters |= isLetter;
        }
        return hasDigits && hasLetters;
    }

Why is it fast? Let's check it out. The following is the test string generator. It generates a set in which 1/3 of the strings are completely correct and 2/3 are incorrect. Of that 2/3, half are all letters and the other half are all digits.

    static IEnumerable<string> GenerateTest(int minChars, int maxChars, int setSize) {
        string letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        string numbers = "0123456789";            
        Random rnd = new Random();
        int maxStrLength = maxChars-minChars;
        float probablityOfLetter = 0.0f;
        float probablityInc = 1.0f / setSize;
        for (int i = 0; i < setSize; i++) {
            probablityOfLetter = probablityOfLetter + probablityInc;
            int length = minChars + rnd.Next() % maxStrLength;
            char[] str = new char[length];
            for (int w = 0; w < length; w++) {
                if (probablityOfLetter < rnd.NextDouble())
                    str[w] = letters[rnd.Next() % letters.Length];
                else 
                    str[w] = numbers[rnd.Next() % numbers.Length];                    
            }
            yield return new string(str);
        }
    }

The following are Darin's two solutions: one is the compiled version and the other is the non-compiled version.

class DarinDimitrovSolution
{
    const string regExpression = @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+$";
    private static readonly Regex _regex = new Regex(
        regExpression, RegexOptions.Compiled);

    public static bool IsAlphaAndNumeric_1(string s) {
        return _regex.IsMatch(s);
    }
    public static bool IsAlphaAndNumeric_0(string s) {
        return Regex.IsMatch(s, regExpression);
    }
}

The following is the Main method with the test loop:

    static void Main(string[] args) {

        int minChars = 3;
        int maxChars = 13;
        int testSetSize = 5000;
        DateTime start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            IsAlphaNumeric(testStr);
        }
        Console.WriteLine("My solution : {0}", (DateTime.Now - start).ToString());

        start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            DarinDimitrovSolution.IsAlphaAndNumeric_0(testStr);
        }
        Console.WriteLine("DarinDimitrov  1 : {0}", (DateTime.Now - start).ToString());

        start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            DarinDimitrovSolution.IsAlphaAndNumeric_1(testStr);
        }
        Console.WriteLine("DarinDimitrov(compiled) 2 : {0}", (DateTime.Now - start).ToString());

        Console.ReadKey();
    }

The results follow:

My solution : 00:00:00.0170017    (Gold)
DarinDimitrov  1 : 00:00:00.0320032  (Silver medal) 
DarinDimitrov(compiled) 2 : 00:00:00.0440044   (Bronze)

So the first solution was the best. Some more results, in release mode and with the following settings:

   int minChars = 20;
   int maxChars = 50;
   int testSetSize = 100000;

My solution : 00:00:00.4060406
DarinDimitrov  1 : 00:00:00.7400740
DarinDimitrov(compiled) 2 : 00:00:00.3410341 (now that is very fast)

I checked again with the RegexOptions.IgnoreCase flag; the rest of the parameters are the same as above.

My solution : 00:00:00.4290429 (almost the same as before)
DarinDimitrov  1 : 00:00:00.9700970 (it has slowed down)
DarinDimitrov(compiled) 2 : 00:00:00.8440844 (this one as well; still fast, but compare with the .3 in the last result)

After gnarf mentioned a problem with my algorithm (it was checking whether the string consists only of letters and digits), I changed it; it now checks that the string has at least one letter and one digit.

    static bool IsAlphaNumeric(string str) {
        bool hasDigits = false;
        bool hasLetters = false;

        foreach (char c in str) {
            hasDigits |= char.IsDigit(c);
            hasLetters |= char.IsLetter(c);
            if (hasDigits && hasLetters)
                return true;
        }
        return false;
    }

Results

My solution : 00:00:00.3900390 (Goody Gold Medal)
DarinDimitrov  1 : 00:00:00.9740974 (Bronze Medal)
DarinDimitrov(compiled) 2 : 00:00:00.8230823 (Silver)

Mine is faster by a large factor.
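
One caveat about the loop above: `GenerateTest` is an iterator, so the strings are generated inside each timed loop, and every candidate sees a different random set. A possible way to tighten the measurement is sketched below. It assumes the inputs are materialized once (for example with `ToList()`) and uses `System.Diagnostics.Stopwatch` instead of `DateTime.Now`. The `TimingSketch` class and its `Time` helper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: times one predicate over a pre-generated list,
    // so generation cost is excluded and every candidate gets the same inputs.
    public static TimeSpan Time(Func<string, bool> check, IList<string> inputs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        foreach (string input in inputs)
        {
            check(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A call might look like `TimingSketch.Time(IsAlphaNumeric, inputs)`, repeated for each candidate with the same `inputs` list.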

affan
Any justification why this would be faster than regex?
Amarghosh
And if it *is* faster, the difference will be trivial. You'd have to be testing millions of strings in a tight loop to make this worth the effort.
Alan Moore
I have published performance results in my answer. Told you I had time.
affan
That's wonderful. Except that Darin's solution isn't even correct - it is searching for uppercase AND lowercase.
Kobi
Hmm, I hadn't noticed that. But the conclusion is that the compiled regex is faster, although it requires an initial compilation of an assembly by the .NET framework, which should be done in some static constructor. Can anyone improve on mine?
affan
Can you please try the regexp I provided as a compiled version in your perf testing? `@"^(?=.*[a-zA-Z])(?=.*\d)"`
gnarf
Also, your version checks that all the characters are digits or letters. The OP is only testing that it has an alpha and a numeric character. Maybe try adjusting your loop to return true once it finds one of each, and it might catch up to the compiled regexp versions (which don't check the whole string; they scan through for a letter, then scan through for a number). The worst test case for this is a string that contains one letter at the very end and no numbers; it will take the most time in the regexp engine.
gnarf
@gnarf you're right; mine now outperforms the others. Check the results, I have updated them.
affan
What's the time performance in regards to maintaining that monstrosity?
Andrew Dyster
@affan - Why keep the inaccurate version(s) around? None of the functions mentioned in the top half of your answer do what the OP wanted, I'd just rewrite/test it using methods that return the correct answers.
gnarf
+5  A: 

For C# with LINQ:

return s.Any(Char.IsDigit) && s.Any(Char.IsLetter);
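
A self-contained sketch of the one-liner above, with the `using System.Linq;` directive it needs. The class name is hypothetical. Note that `char.IsLetter` and `char.IsDigit` follow Unicode categories, so they accept more than `[a-zA-Z]` and `[0-9]`.

```csharp
using System;
using System.Linq;

public static class LinqCheck
{
    // True when the string has at least one digit and at least one letter.
    public static bool IsAlphaAndNumeric(string s)
    {
        return s.Any(char.IsDigit) && s.Any(char.IsLetter);
    }
}
```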
Kobi
Please, somebody mark this as the answer!
Benjol
This will require two full iterations over the string's chars in the worst case.
affan
@affan - in the worst case you have to check every character twice; this is true for every possible solution. Whether it happens in one loop or two makes no difference, aside from creating another char iterator - for an in-memory string, this is a tiny overhead at most.
Kobi
@affan - please read the instructions before you downvote, and check what the original function does. It says "at least one alphabetical character and one digit". You are the one with the wrong code, as @gnarf explained to you.
Kobi
Sorry, I tried to upvote it again but it says the vote is too old.
affan
You need to edit your answer so that I can give a +2.
affan
If the OP isn't committed to using a regex, this is probably the best suggestion.
Alan Moore
+1 Although this is a very concise and clean way of doing it, I cannot accept it because I was asking for a regular expression. I still gave you a +1 because of showing the LINQ alternative.
Andreas Grech
A: 

The following is not only faster than the other lookahead constructs, it is also (in my eyes) closer to the requirements:

[a-zA-Z\d]((?<=\d)[^a-zA-Z]*[a-zA-Z]|[^\d]*\d)

In my (admittedly crude) test it runs in about half the time required by the other regex solutions, and it has the advantage that it does not care about newlines in the input string. (And if for some reason it should, it is obvious how to add that.)

Here is how (and why) it works:

Step 1: It matches a single character (let us call it c) that is a number or a letter.
Step 2: It does a lookbehind to check if c is a number. If so:
Step 2.1: It allows an unlimited number of characters that are not a letter, followed by a single letter. If this matches, we have a number (c) followed by a letter.
Step 2.2: If c is not a number, it must be a letter (otherwise it would not have been matched). In this case we allow an unlimited number of non-digits, followed by a single digit. This would mean we have a letter (c) followed by a number.
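
A small sketch for trying the pattern against sample inputs. One thing worth checking: as posted, the second branch is not tied to c being a letter, so when c is a digit and the first branch fails, the second branch can still succeed on a following digit, and an all-digit input such as "123" matches. The `Guarded` variant is an assumption added here for illustration and is not part of the answer. It adds a lookbehind so that the second branch only applies when c is a letter. All names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindCheck
{
    // The pattern exactly as posted in the answer.
    public static readonly Regex Posted = new Regex(
        @"[a-zA-Z\d]((?<=\d)[^a-zA-Z]*[a-zA-Z]|[^\d]*\d)");

    // Hypothetical variant (not from the answer): the second branch also
    // looks behind, so it only applies when c is a letter.
    public static readonly Regex Guarded = new Regex(
        @"[a-zA-Z\d](?:(?<=\d)[^a-zA-Z]*[a-zA-Z]|(?<=[a-zA-Z])[^\d]*\d)");
}
```

Both patterns use negated character classes rather than `.`, so a newline between the letter and the digit does not prevent a match.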

dionadar
Logically this is similar to Anonymous' answer, but more complex. Are you sure this is quick? In case of a failure, wouldn't it test for each and every matching letter? (for example, 600 'X's)
Kobi
As with @affan's answer, it's extremely unlikely that this would be worth the effort anyway. People worry way too much about regex performance.
Alan Moore
@Anonymous' answer will match any character before the first letter twice if the first branch fails, since the second branch backtracks to the very beginning. If you can be reasonably sure that the input string has a letter close to the beginning, it will result in the same performance (and, after replacing the dots, even the same meaning). Also, thanks for putting in the missing caret; no idea how I killed that during the posting ;)
dionadar
Oh, and for worrying about regex performance: I am here for fun, not for bucks ;)
dionadar
You can avoid the backtracking problem entirely by prepending `^(?>[^A-Za-z0-9]*)` to the regex. With that done, I think the lookbehind wouldn't really be pulling its weight any more. For maximum performance, I'd go with `^(?>[^A-Za-z0-9]*)(?:[a-zA-Z](?>[^0-9]*)[0-9]|[0-9](?>[^A-Za-z]*)[a-zA-Z])`. If I were worried about performance, that is... ;)
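
For reference, a sketch that wraps the atomic-group pattern from this comment in a method so it can be tried against sample inputs. The class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupCheck
{
    // Skip leading non-alphanumerics atomically, then require either
    // letter ... digit or digit ... letter, with no backtracking into the skips.
    private static readonly Regex LetterAndDigit = new Regex(
        @"^(?>[^A-Za-z0-9]*)(?:[a-zA-Z](?>[^0-9]*)[0-9]|[0-9](?>[^A-Za-z]*)[a-zA-Z])");

    public static bool IsAlphaAndNumeric(string s)
    {
        return LetterAndDigit.IsMatch(s);
    }
}
```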
Alan Moore