By reading the documentation here it seems to me that

re.compile(r'^[-\w]+$')

would just check whether there is any character that is alphanumeric, an underscore, or a hyphen. But really this returns a match only if all the characters fit that description (i.e., it fails if there is a space or a dollar sign or asterisk, etc.).

I don't really understand how this is working to check all of the characters when it says:

"The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible.".

Doesn't that mean that if there's a space at the 6th character, it'll match as much as possible and then stop and return the match it found in the first 5 characters (rather than essentially saying "sorry, I found nothing" when it reaches a non-match)?

Thanks in advance (I'm such a noob at regex and each time I learn it again I just get confused).

+3  A: 

The ^ and $ anchor the regex at the beginning and ending of the string, therefore all characters would have to match the pattern in between.
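A minimal sketch of the difference the anchors make; the sample strings and variable names are made up for illustration. Without the anchors, the greedy `+` does exactly what the question expects and stops at the first non-matching character. With them, stopping early is not enough, because `$` still has to match right after the last consumed character.

```python
import re

unanchored = re.compile(r'[-\w]+')
anchored = re.compile(r'^[-\w]+$')

text = 'hello world'  # hypothetical input with a space at the 6th character

# Greedy + takes as many characters as it can, then stops at the space.
partial = unanchored.search(text)
print(partial.group())  # hello

# Same character class, but $ must match where + stopped.
# After 'hello' comes a space, not the end of the string, so there is no match.
whole = anchored.search(text)
print(whole)  # None

# A string made only of word characters and hyphens satisfies both anchors.
ok = anchored.search('hello-world_1')
print(ok.group())  # hello-world_1
```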

Ignacio Vazquez-Abrams
Actually, only `\A` and `\Z` are guaranteed to mean “match at the start and end of the string”; the `re.MULTILINE` / `(?m)` flag changes the meaning of `^` and `$` to start and end of line, respectively.
ΤΖΩΤΖΙΟΥ
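A small sketch of the point made in the comment above, using made-up sample strings: under `re.MULTILINE`, `^` and `$` apply per line, so a single clean line is enough for a match, while `\A` and `\Z` keep referring to the whole string. It also shows that, even without the flag, `$` tolerates one trailing newline and `\Z` does not.

```python
import re

text = 'good-line\nbad line'  # hypothetical two-line input

# Default mode: ^ and $ refer to the whole string, so the space rules it out.
default_match = re.search(r'^[-\w]+$', text)
print(default_match)  # None

# MULTILINE: ^ and $ match at line boundaries, so the first line alone matches.
multiline_match = re.search(r'^[-\w]+$', text, re.MULTILINE)
print(multiline_match.group())  # good-line

# \A and \Z ignore the flag and still require the whole string to fit.
strict_match = re.search(r'\A[-\w]+\Z', text, re.MULTILINE)
print(strict_match)  # None

# Without any flag, $ also matches just before a trailing newline; \Z does not.
dollar_newline = re.search(r'^[-\w]+$', 'abc\n')
z_newline = re.search(r'\A[-\w]+\Z', 'abc\n')
print(dollar_newline.group(), z_newline)  # abc None
```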
+4  A: 

The two characters ^ and $ mark the start and the end of the string respectively. So ^[-\w]+$ will only match if there is nothing but one or more word characters or hyphens ([-\w]+) between the start (^) and the end of the string ($).

Gumbo
So, the ^ start, and $ end characters specify that everything has to match VS [-\w]+ which would match as much as possible, but then stop at a non-match. I think I understand now.
orokusaki
@orokusaki: To have a match, the following conditions must be fulfilled: 1) **`^`** `[-\w]+$`: there must be the start of the string 2) `^` **`[-\w]+`** `$`: after that, there must be one or more characters, each of which is either a hyphen or a word character 3) `^[-\w]+` **`$`**: after that, there must be the end of the string.
Gumbo
+2  A: 

Just as in the answers above: ^ and $ enclose all the characters in between, and they represent the start and end of the string respectively (of each line under `re.MULTILINE`). If in doubt about any expression, try debug mode; it usually explains a lot:

>>> import re
>>> p = re.compile(r"^[-\w]+$", re.DEBUG)
at at_beginning
max_repeat 1 65535
  in
    literal 45
    category category_word
at at_end
>>>
pulegium