I need to be able to tell the difference between a string that can contain letters and numbers, and a string that can contain numbers, colons and hyphens.

>>> import re
>>> def checkString(s):
...   pattern = r'[-:0-9]'
...   if re.search(pattern,s):
...     print "Matches pattern."
...   else:
...     print "Does not match pattern."

# 3 numbers separated by colons: 12, 24 and minus 14
>>> s1 = "12:24:-14"
# String containing letters and string containing letters/numbers.
>>> s2 = "hello"
>>> s3 = "hello2"

When I run the checkString method on each of the above strings:

>>> checkString(s1)
Matches pattern.
>>> checkString(s2)
Does not match pattern.
>>> checkString(s3)
Matches pattern.

s3 is the only one that doesn't do what I want. I'd like to be able to create a regex that allows numbers, colons and hyphens, but excludes EVERYTHING else (or just alphabetical characters). Can anyone point me in the right direction?

EDIT:

Therefore, I need a regex that would accept:

229            // number
187:657        //two numbers
187:678:-765   // two pos and 1 neg numbers

and decline:

Car          //characters
Car2         //characters and numbers
A: 
r'^[0-9:-]+$'

This should match only strings made up entirely of digits, colons and hyphens. If you also want the regex to match an empty string, just change the + sign to a *.

Explanation of the regex:

^                           # Match start of string
  [0-9:-]+                  # match 1+ of the following - digits, colons, or hyphens
$                           # Match end of string
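
A minimal sketch of how this pattern behaves, assuming Python 3. The helper name `is_numeric_like` and its `allow_empty` flag are made up for illustration; the flag switches between the `+` and `*` variants described above.

```python
import re

# Pattern from the answer: one or more digits, colons, or hyphens, and nothing else.
STRICT = re.compile(r'^[0-9:-]+$')
# Variant with * instead of +, which also accepts the empty string.
LENIENT = re.compile(r'^[0-9:-]*$')


def is_numeric_like(s, allow_empty=False):
    """Return True if s consists only of digits, colons, and hyphens."""
    pattern = LENIENT if allow_empty else STRICT
    return pattern.search(s) is not None
```

With this, `is_numeric_like("12:24:-14")` is true, `is_numeric_like("hello2")` is false, and the empty string is accepted only when `allow_empty=True`.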
Amber
And how exactly would the OP be able to tell the **difference** between the two types of strings with your regex?
SilentGhost
Again, see my comment. The OP's original wording was a bit unclear; to me it seemed as if they wanted to match either an alpha-only string OR a number/hyphen/colon string, but not one that contained a mix.
Amber
A: 
pattern = r'\A([^-:0-9]+|[A-Za-z0-9])\Z'
yu_sha
Oops. I am wrong. Correcting
yu_sha
+5  A: 

You need to match the whole string, not a single character as you do at the moment:

>>> re.search('^[-:0-9]+$', "12:24:-14")
<_sre.SRE_Match object at 0x01013758>
>>> re.search('^[-:0-9]+$', "hello")
>>> re.search('^[-:0-9]+$', "hello2")

To explain regex:

  • within square brackets (character class): match digits 0 to 9, hyphen and colon, only once.
  • + is a quantifier, which indicates that the preceding expression should be matched as many times as possible, but at least once.
  • ^ and $ match start and end of the string. For one-line strings they're equivalent to \A and \Z.

This way you restrict the content of the whole string to be at least one character long and to contain any permutation of characters from the character class. What you were doing beforehand was searching for a single character from the character class anywhere within the subject string. This is why s3, which contains a digit, matched.
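
To make the difference concrete, here is a small sketch (assuming Python 3; `compare` is just a name made up for this illustration) that runs the question's unanchored pattern and the anchored pattern side by side:

```python
import re

# The question's pattern: a single allowed character, found anywhere in the string.
UNANCHORED = re.compile(r'[-:0-9]')
# The anchored pattern: the whole string must consist of allowed characters.
ANCHORED = re.compile(r'^[-:0-9]+$')


def compare(s):
    """Return (unanchored_matched, anchored_matched) as a pair of booleans."""
    return (UNANCHORED.search(s) is not None,
            ANCHORED.search(s) is not None)
```

For "hello2" the two patterns disagree: the unanchored one finds the trailing digit, while the anchored one rejects the string because of the letters.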

SilentGhost
Your regex will fail to match on the second string, which they have indicated is not what they want.
Amber
Let's read the question again: *s3 is the only one that doesn't do what I want.* s2 and s3 both are not supposed to match.
SilentGhost
Mm, a bit cryptically phrased I guess. "s3 is the only one that doesn't do what I want" could be interpreted as "the only one I don't want to match".
Amber
Edited the question. Maybe that's a little bit clearer?
day_trader
@_bravado: to me it was clear from the very beginning :) and my solution evidently works.
SilentGhost
Just tried it and it works. Thank you very much! :)
day_trader
Can I ask what the '^' does? Or could you explain what each of the symbols mean (before/after the square brackets)?
day_trader
Why not use .match instead of .search and drop the ^/$ ?
chrispy
You would be able to drop only the `^`, not the `$`, and I don't think that's worth the confusion.
SilentGhost
Note that this would also match strings like `"---::::"` with no digits at all. I think you're looking for something like my answer...
Peter Di Cecco
A: 

Your regular expression is almost fine; you just need to make it match the whole string. Also, as a commenter pointed out, you don't really need a raw string (the r prefix on the string) in this case. Voila:

import re

def checkString(s):
  if re.match('[-:0-9]+$', s):
    print "Matches pattern."
  else:
    print "Does not match pattern."

The '+' means "match one or more of the previous expression". (This makes checkString report no match for an empty string. If you want an empty string to match, change the '+' to a '*'.) The '$' means "match the end of the string".

re.match means "the string must match the regular expression starting at the first character"; re.search means "the regular expression can match a sequence anywhere inside the string".
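
A short sketch of that difference, assuming Python 3 (the helper names here are made up for illustration). It also shows `re.fullmatch`, available since Python 3.4, which is not part of the original answer but is another way to require that the whole string match:

```python
import re

PATTERN = r'[-:0-9]+$'


def match_vs_search(s):
    """Return (re.match result, re.search result) as booleans for PATTERN."""
    return (re.match(PATTERN, s) is not None,
            re.search(PATTERN, s) is not None)


def matches_entirely(s):
    """Require the entire string to match, with no anchors in the pattern."""
    return re.fullmatch(r'[-:0-9]+', s) is not None
```

For "hello2", `re.match` fails because the first character is a letter, but `re.search` succeeds because it can start at the final "2". One subtlety: `$` also matches just before a trailing newline, so "12\n" passes with `re.match` and PATTERN, whereas `matches_entirely("12\n")` is false.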

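Also, if you like premature optimization--and who doesn't!--note that 're.match' has to look up the regular expression on every call (the re module caches recently compiled patterns, so it is not fully recompiled each time, but the lookup is still extra work). This version compiles the regular expression only once: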

__checkString_re = re.compile('[-:0-9]+$')
def checkString(s):
  global __checkString_re
  if __checkString_re.match(s):
    print "Matches pattern."
  else:
    print "Does not match pattern."
Larry Hastings
try this: `>>> re.match('[-:0-9]+', "2hello2")`
SilentGhost
Why did you declare the compiled string `global`? To save space by keeping the function from closing over it?
steveha
steveha: Just to show intent. The function as written would work fine if that line were removed. Personally I find the rules regarding when Python automatically looks in module scope for you annoying, so I tend to use "global" more often than most.
Larry Hastings
+1  A: 

SilentGhost's answer is pretty good, but take note that it would also match strings like "---::::" with no digits at all.

I think you're looking for something like this:

'^(-?\d+:)*-?\d+$'
  • ^ Matches the beginning of the line.
  • (-?\d+:)* An optional - sign, at least one digit, then a colon. That whole group zero or more times.
  • -?\d+ Then the same pattern once more, without the trailing colon
  • $ The end of the line

This will better match the strings you describe.
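
Here is a minimal sketch of that pattern in use, assuming Python 3; `is_number_list` is a made-up helper name:

```python
import re

# Pattern from this answer: integers with an optional minus sign,
# separated by single colons.
NUMBER_LIST = re.compile(r'^(-?\d+:)*-?\d+$')


def is_number_list(s):
    """Return True if s looks like colon-separated (possibly negative) integers."""
    return NUMBER_LIST.search(s) is not None
```

It accepts the examples from the question ("229", "187:657", "187:678:-765") and rejects "Car", "Car2", and digit-free strings such as "---::::".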

Peter Di Cecco