ansaurus

Question

Combining these two Regular Expressions into one

Answer 1

+2 A:

    using System.Text.RegularExpressions;

    private static readonly Regex _regex = new Regex(
        @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+$", RegexOptions.Compiled);

    public static bool IsAlphaAndNumeric(string s)
    {
        return _regex.IsMatch(s);
    }

If you want to ignore case you could use RegexOptions.Compiled | RegexOptions.IgnoreCase.
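A minimal sketch of the case-insensitive variant, assuming the goal is "at least one letter and at least one digit". With `RegexOptions.IgnoreCase` a separate `[A-Z]` lookahead is redundant, so only one letter lookahead is kept; `RegexOptions.CultureInvariant` and `RegexOptions.Singleline` (so `.` also crosses newlines) are additions here, and the class name `CaseInsensitiveCheck` is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration.
static class CaseInsensitiveCheck
{
    // One lookahead for a letter (either case, because of IgnoreCase) and one for a digit.
    // IsMatch only needs the lookaheads to succeed, so no trailing ".+$" is required.
    private static readonly Regex _regex = new Regex(
        @"^(?=.*[a-z])(?=.*\d)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase
            | RegexOptions.CultureInvariant | RegexOptions.Singleline);

    public static bool IsAlphaAndNumeric(string s)
    {
        return _regex.IsMatch(s);
    }
}
```

For example, `IsAlphaAndNumeric("ABC1")` is true, while `"abc"` and `"123"` are both rejected.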

Darin Dimitrov 2010-01-27 09:21:40

+1 But OP wants case insensitive it seems.

Amarghosh 2010-01-27 09:24:33

For OP, lookup positive lookahead on this page: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx

Moron 2010-01-27 09:31:42

This regex only matches strings that contain a lowercase AND an uppercase letter...

dionadar 2010-01-27 11:06:03

Also, without RegexOptions.Singleline it will not match strings that contain a newline before one of the three required characters (uppercase letter, lowercase letter and digit).

dionadar 2010-01-27 11:08:21

If you pass `RegexOptions.IgnoreCase` there's no need to have the `(?=.*[A-Z])` lookahead.

KennyTM 2010-01-27 11:11:00

Answer 2

+2 A:

You could use [a-zA-Z].*[0-9]|[0-9].*[a-zA-Z], but I'd only recommend it if the system you were using only accepted a single regex. I can't imagine this would be more efficient than two simple patterns without alternation.
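A sketch comparing the single alternation pattern with two simple patterns tested one after the other; the class and method names are hypothetical, `RegexOptions.Singleline` is added so a newline between the letter and the digit does not cause a miss, and no performance claim is implied.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration.
static class AlternationCheck
{
    // Single regex: a letter followed later by a digit, or a digit followed later by a letter.
    private static readonly Regex Combined = new Regex(
        @"[a-zA-Z].*[0-9]|[0-9].*[a-zA-Z]", RegexOptions.Singleline);

    // Two simple patterns without alternation.
    private static readonly Regex Letter = new Regex(@"[a-zA-Z]");
    private static readonly Regex Digit = new Regex(@"[0-9]");

    public static bool SingleRegex(string s) => Combined.IsMatch(s);

    public static bool TwoRegexes(string s) => Letter.IsMatch(s) && Digit.IsMatch(s);
}
```

Both methods should agree on every input; the two-pattern form is easier to read and avoids the alternation's retry at each starting position.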

Anonymous 2010-01-27 09:37:57

Answer 3

+4 A:

    @"^(?=.*[a-zA-Z])(?=.*\d)"

    ^               # from the beginning of the string
    (?=.*[a-zA-Z])  # look ahead for any number of chars followed by a letter; don't advance the pointer
    (?=.*\d)        # look ahead for any number of chars followed by a digit

It uses two positive lookaheads to ensure it finds at least one letter and one digit before succeeding. The ^ anchor makes the engine try the lookaheads only once, from the start of the string; otherwise the regex engine would retry them at every position in the string.
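The commented breakdown can be turned into a working .NET pattern with `RegexOptions.IgnorePatternWhitespace`, which ignores unescaped whitespace and treats `#` as the start of a comment. A minimal sketch; the class and method names are hypothetical, and `RegexOptions.Singleline` is an addition so that `.` can also step over newlines.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration.
static class LookaheadCheck
{
    // Whitespace and '#' comments inside the pattern are ignored by the engine.
    private static readonly Regex _regex = new Regex(@"
        ^               # anchor: run the lookaheads once, at the start
        (?=.*[a-zA-Z])  # a letter appears somewhere ahead (zero-width)
        (?=.*\d)        # a digit appears somewhere ahead (zero-width)
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    public static bool HasLetterAndDigit(string s) => _regex.IsMatch(s);
}
```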

gnarf 2010-01-27 09:47:25

Answer 4

+2 A:

It's not exactly what you asked for, but since I have some time: the following should run faster than a regex.

    static bool IsAlphaAndNumeric(string str) {
        bool hasDigits = false;
        bool hasLetters = false;

        foreach (char c in str) {
            bool isDigit = char.IsDigit(c);
            bool isLetter = char.IsLetter(c);
            // Reject the string as soon as a character is neither a letter nor a digit.
            if (!(isDigit | isLetter))
                return false;
            hasDigits |= isDigit;
            hasLetters |= isLetter;
        }
        return hasDigits && hasLetters;
    }

To check whether it really is faster, here is a test-string generator. The intent is that about 1/3 of the set are valid strings and 2/3 are invalid, with half of the invalid ones all letters and the other half all digits.

    using System;
    using System.Collections.Generic;

    static IEnumerable<string> GenerateTest(int minChars, int maxChars, int setSize) {
        string letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        string numbers = "0123456789";
        Random rnd = new Random();
        int maxStrLength = maxChars - minChars;
        // Despite its name, this is the probability of picking a digit:
        // it rises from about 0 (mostly letters) to about 1 (all digits) across the set.
        float probablityOfLetter = 0.0f;
        float probablityInc = 1.0f / setSize;
        for (int i = 0; i < setSize; i++) {
            probablityOfLetter = probablityOfLetter + probablityInc;
            // Length is in [minChars, maxChars - 1].
            int length = minChars + rnd.Next() % maxStrLength;
            char[] str = new char[length];
            for (int w = 0; w < length; w++) {
                if (probablityOfLetter < rnd.NextDouble())
                    str[w] = letters[rnd.Next() % letters.Length];
                else
                    str[w] = numbers[rnd.Next() % numbers.Length];
            }
            yield return new string(str);
        }
    }

Below are two versions of Darin's solution: one uses a compiled regex, the other does not.

    using System.Text.RegularExpressions;

    class DarinDimitrovSolution
    {
        const string regExpression = @"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+$";
        private static readonly Regex _regex = new Regex(
            regExpression, RegexOptions.Compiled);

        // Uses the cached, compiled Regex instance.
        public static bool IsAlphaAndNumeric_1(string s) {
            return _regex.IsMatch(s);
        }

        // Uses the static Regex.IsMatch overload (interpreted, not compiled).
        public static bool IsAlphaAndNumeric_0(string s) {
            return Regex.IsMatch(s, regExpression);
        }
    }

Here is the Main method with the test loop:

    static void Main(string[] args) {

        int minChars = 3;
        int maxChars = 13;
        int testSetSize = 5000;
        DateTime start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            IsAlphaNumeric(testStr);
        }
        Console.WriteLine("My solution : {0}", (DateTime.Now - start).ToString());

        start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            DarinDimitrovSolution.IsAlphaAndNumeric_0(testStr);
        }
        Console.WriteLine("DarinDimitrov  1 : {0}", (DateTime.Now - start).ToString());

        start = DateTime.Now;
        foreach (string testStr in
            GenerateTest(minChars, maxChars, testSetSize)) {
            DarinDimitrovSolution.IsAlphaAndNumeric_1(testStr);
        }
        Console.WriteLine("DarinDimitrov(compiled) 2 : {0}", (DateTime.Now - start).ToString());

        Console.ReadKey();
    }

Results:

    My solution : 00:00:00.0170017    (Gold)
    DarinDimitrov  1 : 00:00:00.0320032  (Silver medal)
    DarinDimitrov(compiled) 2 : 00:00:00.0440044   (Bronze)

So my solution was the fastest. Here are some more results in Release mode with the following parameters:

    int minChars = 20;
    int maxChars = 50;
    int testSetSize = 100000;

    My solution : 00:00:00.4060406
    DarinDimitrov  1 : 00:00:00.7400740
    DarinDimitrov(compiled) 2 : 00:00:00.3410341 (now that's very fast)

I checked again with the RegexOptions.IgnoreCase flag; the rest of the parameters are the same as above:

    My solution : 00:00:00.4290429 (almost the same as before)
    DarinDimitrov  1 : 00:00:00.9700970 (it has slowed down)
    DarinDimitrov(compiled) 2 : 00:00:00.8440844 (also still fast, but compare with 0.34 in the previous run)

After gnarf pointed out that my algorithm was checking whether the string consisted only of letters and digits, I changed it; it now checks that the string has at least one letter and one digit.

    static bool IsAlphaNumeric(string str) {
        bool hasDigits = false;
        bool hasLetters = false;

        foreach (char c in str) {
            hasDigits |= char.IsDigit(c);
            hasLetters |= char.IsLetter(c);
            if (hasDigits && hasLetters)
                return true;
        }
        return false;
    }

Results

    My solution : 00:00:00.3900390 (Goody Gold Medal)
    DarinDimitrov  1 : 00:00:00.9740974 (Bronze Medal)
    DarinDimitrov(compiled) 2 : 00:00:00.8230823 (Silver)

Mine is faster by a large factor.
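The timings above are taken with `DateTime.Now`, whose resolution is coarse compared with `System.Diagnostics.Stopwatch`, and because `GenerateTest` is lazy and seeds a new `Random` each time, every loop times a different random set together with the string generation. A minimal sketch of a Stopwatch-based harness under those assumptions; the `Bench` class and its `Time` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for illustration.
static class Bench
{
    // Times a predicate over a pre-built list of inputs using Stopwatch.
    // Returns the elapsed time and the number of matches, so the calls are not dead code.
    public static (TimeSpan Elapsed, int Matches) Time(IReadOnlyList<string> inputs, Func<string, bool> check)
    {
        // One untimed warm-up call so JIT compilation is not measured.
        check(inputs.Count > 0 ? inputs[0] : string.Empty);

        int matches = 0;
        var sw = Stopwatch.StartNew();
        foreach (var s in inputs)
        {
            if (check(s)) matches++;
        }
        sw.Stop();
        return (sw.Elapsed, matches);
    }
}
```

For example, materialise the inputs once with `GenerateTest(20, 50, 100000).ToList()` and pass that same list, together with `IsAlphaNumeric` or `DarinDimitrovSolution.IsAlphaAndNumeric_1`, to `Bench.Time`, so every candidate is measured on identical strings.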

affan 2010-01-27 10:01:30

Any justification why this would be faster than regex?

Amarghosh 2010-01-27 10:26:37

And if it *is* faster, the difference will be trivial. You'd have to be testing millions of strings in a tight loop to make this worth the effort.

Alan Moore 2010-01-27 11:01:02

I have published performance results in my answer. Told you I had time.

affan 2010-01-27 12:18:07

That's wonderful. Except that Darin's solution isn't even correct - it is searching for uppercase AND lowercase.

Kobi 2010-01-27 12:28:12

Hmm, I hadn't noticed that. But the conclusion is that the compiled regex is faster, although it requires an initial compilation step by the .NET Framework and should be created once, for example in a static constructor. Can anyone improve on mine?

affan 2010-01-27 12:44:17

Can you please try the regexp I provided as a compiled version in your perf testing? `@"^(?=.*[a-zA-Z])(?=.*\d)"`

gnarf 2010-01-27 18:56:37

Also, your version checks that all the characters are letters or digits, but the OP is only testing that the string contains a letter and a digit. Try adjusting your loop to return true once it finds one of each; it might then catch up to the compiled regex versions, which don't check the whole string: they scan through for a letter, then scan through for a digit. The worst test case for them is a string that contains a single letter at the very end and no digits; it takes the most time in the regex engine.

gnarf 2010-01-27 19:00:59

@gnarf you're right; mine now outperforms them. Check the results, I have updated the answer.

affan 2010-01-28 06:24:12

Whats the time performance in regards to maintaining that monstrosity?

Andrew Dyster 2010-01-28 10:33:05

@affan - Why keep the inaccurate version(s) around? None of the functions mentioned in the top half of your answer do what the OP wanted, I'd just rewrite/test it using methods that return the correct answers.

gnarf 2010-01-29 02:35:54

Answer 5

+5 A:

For C# with LINQ:

    return s.Any(Char.IsDigit) && s.Any(Char.IsLetter);
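A self-contained version of the one-liner, wrapped in a hypothetical `LinqCheck` class. Note that `char.IsLetter` and `char.IsDigit` accept any Unicode letter or decimal digit, so this is broader than the `[a-zA-Z]` and `[0-9]` classes used in the regex answers.

```csharp
using System.Linq;

// Hypothetical wrapper class for illustration.
static class LinqCheck
{
    // True if s contains at least one digit and at least one letter.
    // Each Any() call stops at the first character that satisfies its predicate.
    public static bool IsAlphaAndNumeric(string s)
    {
        return s.Any(char.IsDigit) && s.Any(char.IsLetter);
    }
}
```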

Kobi 2010-01-27 11:07:42

Please, somebody mark this as the answer!

Benjol 2010-01-27 13:07:14

This will require two full iterations over the string's characters in the worst case.

affan 2010-01-28 04:31:42

@affan - in the worst case you have to check every character twice; this is true for every possible solution. Whether it happens in one loop or two makes no difference, aside from creating another char iterator, and for an in-memory string that is a tiny overhead at most.

Kobi 2010-01-28 05:18:28

@affan - please read the instructions before you downvote, and check what the original function does. It says "at least one alphabetical character and one digit". You are the one with the wrong code, as @gnarf explained to you.

Kobi 2010-01-28 06:10:36

Sorry, I tried to upvote it again but it says the vote is too old.

affan 2010-01-28 06:27:22

You need to edit your answer so that I can give it a +2.

affan 2010-01-28 06:30:20

If the OP isn't committed to using a regex, this is probably the best suggestion.

Alan Moore 2010-01-28 10:13:59

+1 Although this is a very concise and clean way of doing it, I cannot accept it because I was asking for a regular expression. I still gave you a +1 because of showing the LINQ alternative.

Andreas Grech 2010-01-29 20:13:09

Answer 6

A:

The following is not only faster than the other lookahead constructs, it is also (in my eyes) closer to the requirements:

    [a-zA-Z\d]((?<=\d)[^a-zA-Z]*[a-zA-Z]|(?<=[a-zA-Z])[^\d]*\d)

In my (admittedly crude) test it runs in about half the time required by the other regex solutions, and it has the advantage of not caring about newlines in the input string. (And if for some reason it should, it is obvious how to add that.)

Here is how (and why) it works:

Step 1: It matches a single character (let us call it c) that is a digit or a letter.
Step 2: It does a lookbehind to check if c is a digit. If so:
Step 2.1: It allows any number of characters that are not letters, followed by a single letter. If this matches, we have a digit (c) followed by a letter.
Step 2.2: If c is not a digit, it must be a letter (otherwise it would not have been matched); a second lookbehind confirms this, so the branch is never taken for a digit. In this case we allow any number of non-digits, followed by a single digit. This means we have a letter (c) followed by a digit.
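A sketch that runs the pattern from .NET, wrapped in a hypothetical `LookbehindCheck` class. Note the `(?<=[a-zA-Z])` lookbehind on the second branch: alternation tries both branches regardless of which lookbehind failed, so that lookbehind is what restricts the branch to the case where c is a letter, as step 2.2 describes. Without it, an all-digit string such as "12" would also match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration.
static class LookbehindCheck
{
    // [a-zA-Z\d]                  step 1: one letter or digit, called c
    // (?<=\d)[^a-zA-Z]*[a-zA-Z]   c is a digit: skip non-letters, then require a letter
    // (?<=[a-zA-Z])[^\d]*\d       c is a letter: skip non-digits, then require a digit
    private static readonly Regex _regex = new Regex(
        @"[a-zA-Z\d]((?<=\d)[^a-zA-Z]*[a-zA-Z]|(?<=[a-zA-Z])[^\d]*\d)");

    public static bool HasLetterAndDigit(string s) => _regex.IsMatch(s);
}
```

Because the negated classes `[^a-zA-Z]` and `[^\d]` also match newlines, no `Singleline` option is needed.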

dionadar 2010-01-27 11:26:58

Logically this is similar to Anonymous' answer, but more complex. Are you sure this is quick? In case of a failure, wouldn't it test each and every matching letter (for example, 600 'X's)?

Kobi 2010-01-27 11:40:47

As with @affan's answer, it's extremely unlikely that this would be worth the effort anyway. People worry way too much about regex performance.

Alan Moore 2010-01-27 11:48:38

Anonymous' answer will examine every character before the first letter twice if the first branch fails, since the second branch backtracks to the very beginning. If you can be reasonably sure that the input string has a letter close to the beginning, it will give the same performance (and, after replacing the dots, even the same meaning). Also, thanks for putting in the missing caret; no idea how I lost that while posting ;)

dionadar 2010-01-27 13:24:07

Oh, and for worrying about regex performance: I am here for fun, not for bucks ;)

dionadar 2010-01-27 13:26:43

You can avoid the backtracking problem entirely by prepending `^(?>[^A-Za-z0-9]*)` to the regex. With that done, I think the lookbehind wouldn't really be pulling its weight any more. For maximum performance, I'd go with `^(?>[^A-Za-z0-9]*)(?:[a-zA-Z](?>[^0-9]*)[0-9]|[0-9](?>[^A-Za-z]*)[a-zA-Z])`. If I were worried about performance, that is... ;)
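A sketch wiring the atomic-group pattern into .NET, wrapped in a hypothetical `AtomicCheck` class. The leading `^(?>[^A-Za-z0-9]*)` skips everything before the first ASCII letter or digit without giving any of it back, and the atomic runs after the first character are likewise never re-entered, so a failing input is not rescanned from later starting positions.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class for illustration.
static class AtomicCheck
{
    // ^(?>[^A-Za-z0-9]*)              skip leading non-alphanumerics, no backtracking
    // [a-zA-Z](?>[^0-9]*)[0-9]        first is a letter: skip non-digits, need a digit
    // [0-9](?>[^A-Za-z]*)[a-zA-Z]     first is a digit: skip non-letters, need a letter
    private static readonly Regex _regex = new Regex(
        @"^(?>[^A-Za-z0-9]*)(?:[a-zA-Z](?>[^0-9]*)[0-9]|[0-9](?>[^A-Za-z]*)[a-zA-Z])",
        RegexOptions.Compiled);

    public static bool HasLetterAndDigit(string s) => _regex.IsMatch(s);
}
```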

Alan Moore 2010-01-27 13:55:45
