views:

85

answers:

4

I've been struggling to figure out how best to write this regular expression.

Here are my requirements:

  • Up to 8 characters
  • Can only be alphanumeric
  • Can only contain up to three alpha characters [a-z] (zero alpha characters are valid too)

Any ideas would be appreciated.

This is what I've got so far, but it only looks for contiguous letter characters:

^(\d|([A-Za-z])(?!([A-Za-z]{3,}))){0,8}$
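
To make the problem concrete, here is a small check, assuming Python's re module (the helper name attempt_accepts is made up for illustration). The lookahead only rejects a letter that is directly followed by three more letters, so four letters separated by digits slip through:

```python
import re

# The attempted pattern from above, unchanged.
ATTEMPT = re.compile(r"^(\d|([A-Za-z])(?!([A-Za-z]{3,}))){0,8}$")

def attempt_accepts(text):
    """Return True if the attempted pattern matches the string."""
    return ATTEMPT.match(text) is not None

print(attempt_accepts("abcd1234"))  # False: four contiguous letters are caught
print(attempt_accepts("a1b2c3d4"))  # True: four scattered letters are not
```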
+1  A: 
/^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$/i

Explanation:

  • [a-z0-9]{0,8} matches up to 8 alphanumerics.
  • The negative lookahead (?!...) is placed right after ^, before the part that consumes characters, so it inspects the string from the start.
  • The (?:\d*[a-z]) matches one letter, optionally preceded by digits. The {4} repeats that four times, so the lookahead makes the regex fail whenever a fourth letter can be found (i.e. it limits the count to ≤3).
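
As a quick sanity check, the pattern can be tried in Python. This is only a sketch: re.IGNORECASE stands in for the /i flag, re.ASCII is my own addition to keep \d and the case folding limited to ASCII, and is_valid is a made-up helper name.

```python
import re

# Pattern from this answer. re.IGNORECASE replaces the trailing /i;
# re.ASCII is an assumption added here so \d only matches 0-9.
PATTERN = re.compile(r"^(?!(?:\d*[a-z]){4})[a-z0-9]{0,8}$",
                     re.IGNORECASE | re.ASCII)

def is_valid(text):
    """True if text is at most 8 alphanumerics with at most 3 letters."""
    return PATTERN.match(text) is not None

for sample in ["", "12345678", "12a34b5c", "abcd1234", "123456789"]:
    print(repr(sample), is_valid(sample))
```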

It's better not to abuse regex like this. If you use this solution, are you sure you will still know what the code is doing when you revisit it a year later? A clearer way is to just check rule by rule, e.g.

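# Check each rule in turn: length, allowed characters, then letter count.
# Note: str.isalnum() is False for an empty string and True for non-ASCII letters/digits.
if len(theText) <= 8 and theText.isalnum():
   if sum(1 for c in theText if c.isalpha()) <= 3:
      pass  # valid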
KennyTM
That will only match exactly 8 characters - don't you want the `{8}` to be `{0,8}` (question says "up to 8 characters")?
psmears
@psm: Yes. Fixed.
KennyTM
+2  A: 

Do you have to do this in exactly one regular expression? It is possible to do that with standard regular expressions, but the regular expression will be rather long and complicated. You can do better with some of the Perl extensions, but depending on what language you're using, they may or may not be supported. The cleanest solution is probably to check whether the string matches:

^[A-Za-z0-9]{0,8}$

but doesn't match:

([A-Za-z].*){4}

i.e. it's an alphanumeric string of up to 8 characters (first regular expression), but doesn't contain 4 or more alpha characters, possibly separated by other characters (second regular expression).
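
In Python, for example, the two checks could be combined roughly like this (a sketch only; the names ALLOWED, TOO_MANY_LETTERS and is_valid are invented for illustration):

```python
import re

# First pattern: up to 8 alphanumeric characters.
ALLOWED = re.compile(r"^[A-Za-z0-9]{0,8}$")
# Second pattern: 4 or more letters, possibly separated by other characters.
TOO_MANY_LETTERS = re.compile(r"([A-Za-z].*){4}")

def is_valid(text):
    """Valid if the first pattern matches and the second does not."""
    return (ALLOWED.match(text) is not None
            and TOO_MANY_LETTERS.search(text) is None)

print(is_valid("12a34b5c"), is_valid("12ab34cd"))
```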

psmears
+1, it's two (simpler) steps instead of one, but easier to read and understand.
FrustratedWithFormsDesigner
I agree that this would be easier, but given my constraints, I'm stuck with having to use one regular expression. Thanks!
beardedd
+2  A: 

I'd write it like this:

^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$

It has two parts:

  • (?=[a-z0-9]{0,8}$)
    • Looks ahead and matches up to 8 alphanumerics to the end of the string
  • (?:\d*[a-z]){0,3}\d*$
    • Essentially allows up to 3 [a-z] to be interspersed among runs of \d*

Rubular

On rubular.com

12345678    // matches
123456789
@(#*@$
12345       // matches
abc12345    // matches
abcd1234
12a34b5c    // matches
12ab34cd
123a456     // matches
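
The same samples can be rerun locally. Here is a minimal sketch, assuming Python's re module rather than Rubular (the names SAMPLES and matching are made up for illustration):

```python
import re

# Pattern from this answer, unchanged (lowercase letters only).
PATTERN = re.compile(r"^(?=[a-z0-9]{0,8}$)(?:\d*[a-z]){0,3}\d*$")

SAMPLES = ["12345678", "123456789", "@(#*@$", "12345", "abc12345",
           "abcd1234", "12a34b5c", "12ab34cd", "123a456"]

def matching(samples):
    """Return the samples that the pattern accepts, in order."""
    return [s for s in samples if PATTERN.match(s)]

print(matching(SAMPLES))
```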

Alternatives

I do think regex is the best solution for this, but since the string is short, it would be a lot more readable to do this in two steps as follows:

  • It must match [a-z0-9]{0,8}
  • Then, delete all \d
    • The length must now be <= 3
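
A rough Python sketch of those two steps (the function name is_valid is made up for illustration):

```python
import re

def is_valid(text):
    """Two-step check: allowed characters and length, then letter count."""
    # Step 1: it must match [a-z0-9]{0,8} in full.
    if re.fullmatch(r"[a-z0-9]{0,8}", text) is None:
        return False
    # Step 2: delete all digits; what remains are the letters.
    letters_only = re.sub(r"\d", "", text)
    return len(letters_only) <= 3

print(is_valid("12a34b5c"), is_valid("12ab34cd"))
```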
polygenelubricants
Thanks, that regular expression did the trick.
beardedd
A: 

The easiest way to do this would be in multiple steps:

  1. Test the string against /^[a-z0-9]{0,8}$/i -- the string is up to 8 characters and only alphanumeric
  2. Make a copy of the string, delete all non-alphabetic characters
  3. See if the resulting string has a length of 3 or less.

If you want to do it in one regular expression, you can use something like:

/^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$/i

This looks for an alphanumeric string of length 0 to 8 (^[a-z0-9]{0,8}$), but first uses a lookahead ((?=\d*(?:[a-z]?\d*){0,3}$)) to make sure that the string has at most 3 alphabetic characters.
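
For illustration, here is how that single expression might be used from Python. It is a sketch under assumptions: re.IGNORECASE stands in for the /i flag, and is_valid is a hypothetical helper name.

```python
import re

# Single-regex version from above; re.IGNORECASE replaces the trailing /i.
PATTERN = re.compile(r"^(?=\d*(?:[a-z]?\d*){0,3}$)[a-z0-9]{0,8}$",
                     re.IGNORECASE)

def is_valid(text):
    """True if the lookahead (at most 3 letters) and the main match both succeed."""
    return PATTERN.match(text) is not None

for sample in ["A1b2C3", "AbCd", "123456789"]:
    print(sample, is_valid(sample))
```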

Daniel Vandersluis