tags:

views:

236

answers:

5
+4  Q: 

Pandigital Regex?

What's a regular expression that validates if a string is pandigital (containing all digits from 1 to 9 exactly once)?

For example:

123456789
891364572

But not:

11234556789
25896471

I know how to do this without regex but I was unable to form a regex for it.

Thanks.

This is not homework.

+1  A: 

This would be much easier to do in procedural code, looping through each character and marking them off in an array, ... or is this some homework?

nickf
+4  A: 

Regex is not exactly the best tool for the job here, but here you go:

^(?=[^1]*1[^1]*$)(?=[^2]*2[^2]*$)(?=[^3]*3[^3]*$)(?=[^4]*4[^4]*$)(?=[^5]*5[^5]*$)(?=[^6]*6[^6]*$)(?=[^7]*7[^7]*$)(?=[^8]*8[^8]*$)(?=[^9]*9[^9]*$)[1-9]+$

(?= ) is a look-ahead. It does not actually fit the description of regular expressions, as it does not describe a regular language.

MizardX
That looks much more complicated than my existing solution.. perhaps regex isn't the tool I thought it was..
Jason Gin
+1 for the good advice, -1 for making my eyes bleed :-) [No, seriously, just +1]
paxdiablo
+13  A: 

Short and sweet, using a negative lookahead:

/^(?!.*([1-9]).*\1)[1-9]{9}$/
  • [1-9] is the character class for nonzero digits - equivalent to [123456789]
  • .* matches any string of any length.
  • .*([1-9]).*\1.* matches any string with that contains at least two occurrences of the same nonzero digit
    • a nonzero digit is matched and captured by ([1-9])
    • a repeat of that nonzero digit is matched by \1, a back-reference to the first captured match.
    • the .* matches the arbitrary padding before, and between the nonzero digit and its repeat.
  • (?!<pattern>) matches any position where the contained pattern doesn't match. This is a negative lookahead, as it only matches a position in the string, and doesn't consume any of it - just looks ahead to compare it with the contained pattern.
  • [1-9]{9} matches nine nonzeo digits.
    • <pattern>{9} means match the preceding pattern 9 times.
  • ^<pattern>$ matches any string that exactly matches the contained pattern (rather than contains a substring that matches the pattern)
    • ^ matches the position at the beginning of a string OR the beginning of a line
    • $ matches the position at the end of a string OR the end of a line

So combined, we check to make sure that it's not repeating any digits, then we check that it's only digits. Since it's 9 digits long, and none repeat, all must show up exactly once. That's the pigeonhole principle at work!

The syntax for your specific regular expression engine may vary. The above is a PCRE (supported in Perl, Ruby, and a bunch of different other languages). Posix regular expressions have slightly different syntax. Not all engines support negative lookaheads, but most support back-references. Neither are part of the definition of formal theoretic regular expressions, but are very convenient.

rampion
nice +1 for the explanation
ojblass
+1 for a succinct answer with great explanation, but I'd still stick with the non-regex approach for readability.
nickf
+2  A: 

If it's not homework, you shouldn't be using REs. The following C code should be a good start.

#include <stdio.h>
int isPandigital (char *inputStr) {
    /* Initial used states of false. */
    char used[] = {0,0,0,0,0,0,0,0,0,0};

    int count = 0;
    char ch;

    /* Process each character in string. */

    while ((ch = *inputStr++) != '\0') {
        /* Non-numeric, bad. */
        if ((ch < '0') || (ch > '9')) {
            return 0;
        }

        /* Too many, bad. */
        if (++count > 9) {
            return 0;
        }

        /* Multiples, bad. */
        if (used[ch - '0']) {
            return 0;
        }

        /* Store fact that this one's been used. */
        used[ch - '0'] = 1;
    }

    /* Good or bad depending on how many we've had. */
    return (count == 9);
}

 

int main (int argCount, char *argVar[]) {
    int i;
    for (i = 1; i < argCount; i++) {
        if (isPandigital (argVar[i])) {
            printf ("'%s' is pandigital\n", argVar[i]);
        } else {
            printf ("'%s' is NOT pandigital\n", argVar[i]);
        }
    }
    return 0;
}

Using your test data:

$ pandigital 123456789 891364572 11234556789 25896471

we get the following results:

'123456789' is pandigital
'891364572' is pandigital
'11234556789' is NOT pandigital
'25896471' is NOT pandigital
paxdiablo
what's wrong with REs?
rampion
There's nothing wrong with REs, they're just not suitable to *every* task.
paxdiablo
I don't know... rampion's solution looks pretty elegant if it actually works. I'm not familiar with regexps, so I couldn't say. But his breakdown of the expression seems to make sense.
Calvin
Hmm much longer than my existing solution, and I don't know C so I can't test it out.
Jason Gin
A: