Possible Duplicate:
help with regex needed

I need a regular expression for the following:

The string is alphanumeric, with exactly 6 characters in the first half, followed by an optional hyphen, followed by up to 4 optional characters (the second half cannot have more than 4 characters).

So any of the following is valid:

11111A
111111-1
111111-yy
yyyyy-989
yyyyyy-9090

I have ^[a-zA-Z0-9]{5}(-[a-zA-Z0-9]{1,3})?$ as the regular expression.

What if I want to add another condition stating that the first half cannot be all zeros, and that the whole expression cannot be all zeros, so that 00000 or 00000-000 is invalid?

+2  A: 

Not sure what you're using for regex, but here's how I did it in Bash.

The "-v" option reverses the meaning of your search, so it functions as a not:

egrep -v "^[0]{5}" filename.txt | egrep "^[a-zA-Z0-9]{5}-[a-zA-Z0-9]{1,4}$"

So essentially, the first half weeds out all the lines with too many zeros, and the second half applies the regex you already had going on what's left over.

Once you exclude lines that start with 00000, lines like 00000-00 and its variants are skipped as well. But if 12345-000 is also invalid, you could just change things to:

egrep -v "^[0]{5}|-[0]{1,4}$" filename.txt | egrep "^[a-zA-Z0-9]{5}-[a-zA-Z0-9]{1,4}$"

Finally, if, as you commented to harpo, you only want to weed out the all-zero lines, and 00000-1 and 12345-0 are both acceptable, then:

egrep -v "^[0]{5}-[0]{1,4}$" filename.txt | egrep "^[a-zA-Z0-9]{5}-[a-zA-Z0-9]{1,4}$"

Not sure from your post if the number of characters is really 5 and then 1 to 4, but those are easy enough to change.
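For readers who are not working in a shell, the same two-pass idea (first drop the all-zero lines, then apply the shape check) can be sketched in Python. This is only an illustration of the last egrep command above; the names `ALL_ZEROS`, `SHAPE`, and `filter_lines` are made up for the example, and, like the commands above, it assumes the hyphen is required.

```python
import re

# Pass 1 (the role of `egrep -v`): lines made up only of zeros, e.g. 00000-00.
ALL_ZEROS = re.compile(r"^0{5}-0{1,4}$")
# Pass 2: the shape check from the commands above
# (5 characters, a hyphen, then 1 to 4 characters).
SHAPE = re.compile(r"^[a-zA-Z0-9]{5}-[a-zA-Z0-9]{1,4}$")


def filter_lines(lines):
    """Keep lines that are not all zeros and that match the shape."""
    return [s for s in lines if not ALL_ZEROS.search(s) and SHAPE.search(s)]


# Hypothetical sample input, one candidate string per entry.
sample = ["00000-00", "00000-1", "12345-0", "abcde-9090", "12345-00000"]
print(filter_lines(sample))  # ['00000-1', '12345-0', 'abcde-9090']
```

Keeping the exclusion and the shape check as two separate patterns mirrors the pipe between the two egrep calls, so either half can be changed without touching the other.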

Thanks for the clarification on regex flavors Alan.

Peter Ajtai
That article you linked to is wrong. The condition in a conditional has to be either a reference to a capturing group (e.g., `(1)`) or a zero-width assertion, such as a lookahead. Your regex works if you use `(?=00000|00000-0000)`, but why go to all that trouble when you can just use a negative lookahead like @harpo did?
Alan Moore
I'll have to get back to this tomorrow. Currently neither mine nor harpo's is working with extended grep. Every time I include "!" I get an event not found error, since I think it's doing the history function..... but basically it's because ?! doesn't work for me.
Peter Ajtai
egrep doesn't support conditionals *or* lookarounds. That article is oriented toward the .NET regex flavor, which supports almost everything.
Alan Moore
@Alan, thanks. I switched things around a little, so that it works without conditionals or lookarounds.
Peter Ajtai
+3  A: 

You can use negative lookahead if your implementation does not support conditionals.

^(?!00000|00000-0000)([a-zA-Z0-9]{5}(-[a-zA-Z0-9]{1,3})?)$

Per your comment, it sounds like you can use positive lookahead instead

^(?=[0-]*[a-zA-Z1-9])

to ensure that at least one nonzero character is somewhere in the input before proceeding.
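As a rough sketch of how that positive lookahead might be combined with the pattern from the question, here is a small Python check. Putting the lookahead in front of the question's 5 and 1-to-3 pattern is an assumption about the intended combination, and `PATTERN` and `is_valid` are names invented for the example.

```python
import re

# The lookahead skips any run of zeros and hyphens, then requires one
# character that is a letter or a digit from 1 to 9. The rest is the
# pattern from the question, unchanged.
PATTERN = re.compile(r"^(?=[0-]*[a-zA-Z1-9])[a-zA-Z0-9]{5}(-[a-zA-Z0-9]{1,3})?$")


def is_valid(s):
    """Return True if s has the expected shape and is not made only of zeros."""
    return PATTERN.search(s) is not None


for candidate in ["00000", "00000-000", "00000-1", "12345", "0000a-000"]:
    print(candidate, is_valid(candidate))
```

With this combination, 00000 and 00000-000 are rejected, while 00000-1 is accepted because the lookahead finds the 1 after the hyphen.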

harpo
Remove that last right paren. Then, this regexp will do the trick.
joealba
Right, thanks joe.
harpo
You might as well shorten the lookahead to `(?!00000)` (or `(?!000000)` if it's really supposed to be six characters); if the first alternative won't match, the second won't either.
Alan Moore
I also want to make sure that only zeros is invalid, so instead of writing ^(?!00000|00000-0|00000-00|00000-000|00000-0000)([a-zA-Z0-9]{5}(-[a-zA-Z0-9]{1,3})?)$ is there any other way to do it?
Also, 00000-1/00000-12/00000-123 are valid; only all zeros is invalid.
@userNUMBERS - If 00000-1 is correct, you should edit your initial question, since in it you state, "the first half cannot have all zeros"
Peter Ajtai
So the lookahead should be `(?!00000-000)`.
Alan Moore
A: 

Here's an idea: Keep your regex, and then use it to extract the digits, and then (in something else, that is not a regex) check to make sure they're not all zero.

  1. You don't write a huge regex that future maintainers will have to look up to figure out what it means.
  2. You don't have to ask a question on Stack Overflow and wait for someone to help you.
  3. You get the job done just as easily, probably roughly equally efficiently, and quicker.
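A minimal sketch of this idea in Python, assuming the regex from the question is kept as the shape check and that "all zero" means every character other than the hyphen is a 0. The names `SHAPE` and `is_valid` are hypothetical.

```python
import re

# Shape check only: the regex from the question, unchanged.
SHAPE = re.compile(r"^[a-zA-Z0-9]{5}(-[a-zA-Z0-9]{1,3})?$")


def is_valid(s):
    """Check the shape with the regex, then reject all-zero input in plain code."""
    if SHAPE.search(s) is None:
        return False
    # Drop the hyphen and strip zeros from both ends; if nothing is left,
    # the input consisted only of zeros.
    return s.replace("-", "").strip("0") != ""


for candidate in ["00000", "00000-000", "00000-1", "12345-000", "1234"]:
    print(candidate, is_valid(candidate))
```

The zero rule lives in one ordinary line of code, so changing it later (for example, to reject an all-zero first half only) does not require touching the regex.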
Chris Lutz
+1  A: 
^(?=[^-]*[^0])[a-zA-Z0-9]{6}(-(?=.*[^0])[a-zA-Z0-9]{1,4})?$

Regex explained:

(?=[^-]*[^0]) Make sure there is a non-zero character before the hyphen or the end of the string.

[a-zA-Z0-9]{6} Six alphanumeric characters followed by

The remaining part is optional as it is inside ()?

- a hyphen followed by (make it -? if you want to allow a trailing hyphen as in 123456-)

(?=.*[^0]) Make sure there is a non-zero character in the remaining part

[a-zA-Z0-9]{1,4} one to four alphanumeric characters
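To see how the pieces behave together, the regex can be run against a few inputs in Python. This is just a quick check, not part of the regex itself; `PATTERN` and `is_valid` are names made up for the example. One thing it shows: because `[^0]` in the first lookahead can also match the hyphen, an input such as 000000-1 is accepted, while an all-zero second half such as 123456-0000 is rejected.

```python
import re

# The regex from this answer, copied as-is.
PATTERN = re.compile(
    r"^(?=[^-]*[^0])[a-zA-Z0-9]{6}(-(?=.*[^0])[a-zA-Z0-9]{1,4})?$"
)


def is_valid(s):
    """Return True if the whole string matches the pattern."""
    return PATTERN.search(s) is not None


for candidate in [
    "123456",       # six characters, no second half
    "abcdef-9090",  # six characters plus four
    "000000",       # all zeros, no hyphen
    "000000-0000",  # all zeros on both sides
    "000000-1",     # zero first half, nonzero second half
    "123456-0000",  # nonzero first half, zero second half
]:
    print(candidate, is_valid(candidate))
```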

Amarghosh