Update:

This question was an epic failure, but here's the working solution. It's based on Gumbo's answer (Gumbo's was close to working so I chose it as the accepted answer):

Solution:

r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'
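As a quick sanity check, the pattern can be run against the allowed and disallowed values listed in the original question below. This is only a minimal sketch; the helper name `is_valid` is made up for illustration.

```python
import re

# Pattern copied verbatim from the solution above.
PATTERN = re.compile(r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$')

def is_valid(value):
    """Hypothetical helper: True if value fits the pattern."""
    return PATTERN.match(value) is not None

# Example values taken from the question.
allowed = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam", "1234", "eggs123"]
not_allowed = ["eggs1-", "-spam123", "spam--spam"]

for value in allowed + not_allowed:
    print(value, is_valid(value))
```

Every value in `allowed` should print `True` and every value in `not_allowed` should print `False`.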

Original Question (albeit, after 3 edits)

I'm using Python and I'm not trying to extract the value, but rather test to make sure it fits the pattern.

allowed values:

spam123-spam-eggs-eggs1
spam123-eggs123
spam
1234
eggs123

Not allowed values:

eggs1-
-spam123
spam--spam

I just can't have a dash at the start or the end. There is a question on here that works in the opposite direction, extracting the string value after the fact, but I simply need to test the value so that I can disallow it. Also, it must be at least 4 and at most 25 chars long, and no 2 dashes can touch each other.

Here's what I've come up with after some experimentation with lookbehind, etc:

# Nothing here
+4  A: 

Try this regular expression:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

This regular expression only allows hyphens as separators between sequences of one or more characters from [a-zA-Z0-9].


Edit    Following up on your comment: The expression (…)* allows the part inside the group to be repeated zero or more times. That means

a(bc)*

is the same as

a|abc|abcbc|abcbcbc|abcbcbcbc|…
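The equivalence can be checked in Python with a small sketch. The anchors `^` and `$` are added here (an assumption for the demonstration) so that only whole-string matches count.

```python
import re

# "a" followed by zero or more repetitions of the group "bc".
pattern = re.compile(r'^a(bc)*$')

candidates = ["a", "abc", "abcbc", "ab", "abcb", ""]
matches = [s for s in candidates if pattern.match(s)]
print(matches)
```

Only the strings made of `a` plus complete `bc` groups are kept; `ab`, `abcb` and the empty string are rejected.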

Edit    Now that you have changed the requirements: Since you probably don’t want to restrict the length of each hyphen-separated part individually, you will need a look-ahead assertion to enforce the overall length:

(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
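
A rough illustration of how the look-ahead enforces the overall length, with hyphens counting toward the total. The helper name `check` and the sample strings are assumptions chosen for the demonstration.

```python
import re

# The look-ahead checks the total length; the rest checks hyphen placement.
pattern = re.compile(r'(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$')

def check(value):
    """Hypothetical helper: True if value matches the pattern."""
    return bool(pattern.match(value))

too_short = check("abc")        # 3 characters
with_hyphen = check("a-bc")     # 4 characters including the hyphen
at_maximum = check("a" * 25)    # exactly 25 characters
too_long = check("a" * 26)      # 26 characters

print(too_short, with_hyphen, at_maximum, too_long)
```

Because the look-ahead consumes no input, the second half of the pattern still starts matching at the beginning of the string.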
Gumbo
24 seconds faster than me! Aside: you disallow sequential dashes, and ignore the {4,25} length restrictions requested by OP. (Which I also missed upon first reading of the question...)
ephemient
@orokusaki: The `*` quantifier allows the part inside the group `(…)` to be repeated zero or more times. That means no repetition is also possible.
Gumbo
@ephemient: You didn't miss them, the OP added them later. And has kept adding stuff (no consecutive dashes).
Seth Johnson
@orokusaki: you started out with "anything made with letters or dashes, except the start or end can't be dashes". Then you added the `{4,25}` requirement. Then you added "no two consecutive dashes". None of your initial examples showed your additions.
Seth Johnson
@Gumbo Thanks for taking the time to edit after I changed everything. The only issues with your solution are 1) that it doesn't mention the hyphen in your lookahead and 2) that, in the pattern, you didn't escape the hyphen (which is a special char). I've posted a solution in my question based on your answer.
orokusaki
@orokusaki: Ah you’re right, thanks! But the hyphen does not need to be escaped if it is used at the start or the end of a character class, and outside of character classes it never needs escaping.
Gumbo
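The point about hyphens in character classes can be tried out with a short sketch. The patterns and the sample string below are illustrative assumptions, not taken from the thread.

```python
import re

escaped = re.compile(r'^[a-z\-]+$')   # hyphen escaped
trailing = re.compile(r'^[a-z-]+$')   # unescaped hyphen at the end of the class
leading = re.compile(r'^[-a-z]+$')    # unescaped hyphen at the start of the class
ranged = re.compile(r'^[a-c]+$')      # hyphen between two characters forms a range

sample = "a-b"
literal_hyphen_ok = all(p.match(sample) for p in (escaped, trailing, leading))
range_rejects_hyphen = ranged.match(sample) is None

print(literal_hyphen_ok, range_rejects_hyphen)
```

The first three classes treat the hyphen as a literal character, while `[a-c]` means "a through c" and so does not accept a hyphen.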
@Gumbo Thanks. I didn't know that bit about hyphens.
orokusaki
@Gumbo One more thing: Is it OK to still escape the hyphen, or is it bad practice? (For me it felt more conventional, but I don't know if there are implications.)
orokusaki
@orokusaki: It’s semantically irrelevant. But it’s making the regular expression a little less readable. And since you’re using Python … ;-)
Gumbo
@Gumbo true lol, thanks again.
orokusaki
+1  A: 

It should be something like this:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

You are telling it to look for only one char, either a-z, A-Z, 0-9 or -; that is what [] does.

So if you do [abc] you will match only "a", "b" or "c", not "abc".
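
A small sketch of the difference between a bare character class and one with a quantifier. The anchors and variable names are assumptions added for the demonstration.

```python
import re

single = re.compile(r'^[abc]$')     # exactly one character from the class
repeated = re.compile(r'^[abc]+$')  # one or more characters from the class

one_char = single.match("a") is not None
whole_word_single = single.match("abc") is not None
whole_word_repeated = repeated.match("abc") is not None

print(one_char, whole_word_single, whole_word_repeated)
```

Without the `+`, the class accounts for a single character only, so `abc` as a whole is not matched.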

Have fun.

jpabluz
@jpabluz I only put the regex in the title to show the allowed chars. I'm going to use + or * of course, but I wanted to demonstrate which chars are allowed.
orokusaki
A: 

If you simply don't want a dash at the end and beginning, try ^[^-].*?[^-]$

Edit: Bah, you keep changing it.

synic
@synic, just to clarify but it's always been to allow only letters, numbers and dashes in the middle.
orokusaki
It still doesn't say that anywhere in your description.
synic
@synic Never mind brother.
orokusaki
@synic, to be fair, it has always said that in the *title*. Admittedly not the best place to put requirements, but there you have it...
Peter Hansen
Ah, there it is. Sorry.
synic
+3  A: 

The current regex is simple and fairly readable. Rather than making it long and complicated, have you considered applying the other constraints with normal Python string processing tools?

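import re

def fits_pattern(string):
    # Length, doubled-dash and leading/trailing-dash checks use plain string methods.
    if (4 <= len(string) <= 25 and
        "--" not in string and
        not string.startswith("-") and
        not string.endswith("-")):

        # The "+" and "\Z" make the class cover the whole string; without
        # them only the first character would be tested.
        return re.match(r"[a-zA-Z0-9\-]+\Z", string)
    else:
        return None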
Mike Graham
That might have gone a bit overboard with the not-putting-it-in-the-regex, but the general idea is worth considering. As the old adage goes: *Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.*
Mike Graham
@Mike Thanks for your contribution.
orokusaki