How do I include an end-of-string and a non-digit character in a Python 2.6 regular expression set for searching?

I want to find 10-digit numbers with a non-digit at the beginning and a non-digit or end-of-string at the end. It is a 10-digit ISBN number and 'X' is valid for the final digit.

The following do not work:

is10 = re.compile(r'\D(\d{9}[\d|X|x])[$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\$|\D]')
is10 = re.compile(r'\D(\d{9}[\d|X|x])[\Z|\D]')

The problem arises with the last set, e.g. [\$|\D], which is meant to match a non-digit or end-of-string.

Test with:

line = "abcd0123456789"
m = is10.search(line)
print m.group(1)

line = "abcd0123456789efg"
m = is10.search(line)
print m.group(1)
A: 

You have to group the alternatives with parentheses, not brackets:

r'\D(\d{9}[\dXx])($|\D)'

| is a different construct from []. It marks an alternative between two patterns, while [] matches one of the contained characters. So | should only be used inside [] if you want to match the actual character |. Grouping parts of a pattern is done with parentheses, so these should be used to restrict the scope of the alternative marked by |.
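As a rough illustration of the difference, the sketch below runs the question's first pattern next to the parenthesized one. The variable names are made up for this example, and the input containing a literal | is an invented test string, not one from the question.

```python
import re

# The question's first attempt: brackets around the intended alternatives.
broken = re.compile(r'\D(\d{9}[\d|X|x])[$|\D]')
# Parentheses make | an alternation between end-of-string and a non-digit.
fixed = re.compile(r'\D(\d{9}[\dXx])($|\D)')

at_end = "abcd0123456789"

# [$|\D] always has to consume one character, so it cannot match
# when the number sits at the very end of the string.
broken_at_end = broken.search(at_end)            # None
fixed_at_end = fixed.search(at_end).group(1)     # '0123456789'

# Inside [], | is a literal character, so the bracketed pattern
# accepts it as the "final digit" (hypothetical input).
pipe_as_digit = broken.search("abcd012345678|e").group(1)  # '012345678|'
```

The bracketed version still works for the question's second test line, because there a real non-digit character follows the number.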

If you don't want this to create an additional match group, you can use the non-capturing form (?: ) instead:

r'\D(\d{9}[\dXx])(?:$|\D)'
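
A minimal sketch of how this pattern behaves on the question's two test lines, plus two extra inputs invented here (an X check digit and an 11-digit run). The helper name find_isbn10 is hypothetical, and it returns None instead of raising when nothing matches.

```python
import re

# (?:$|\D) accepts either end-of-string or one non-digit, without capturing it.
is10 = re.compile(r'\D(\d{9}[\dXx])(?:$|\D)')

def find_isbn10(line):
    """Return the first 10-character ISBN candidate in line, or None."""
    m = is10.search(line)
    return m.group(1) if m else None

results = [
    find_isbn10("abcd0123456789"),      # number at end of string
    find_isbn10("abcd0123456789efg"),   # number followed by a non-digit
    find_isbn10("abcd012345678Xefg"),   # 'X' as the final digit
    find_isbn10("abcd01234567890"),     # 11 digits: no match
]
# results == ['0123456789', '0123456789', '012345678X', None]
```

Checking for None before calling group(1) matters here, since search() returns None whenever a line contains no candidate.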
sth
What do you think `[\d|X|x]` does?
SilentGhost
[\d|X|x] matches one of: digit, X, x or |. re.search('[a|b]', '|') produces a match.
foosion
A: 
\D(\d{10})(?:\Z|\D)

This finds a non-digit followed by 10 digits, and then a single non-digit or the end of the string. It captures only the digits. While I see that you're searching for nine digits followed by a digit or X or x, I don't see the same thing in your requirements.
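
For comparison, here is a small sketch of what the digits-only pattern does with an ISBN ending in X, next to a variant that keeps the question's [\dXx] final character. The names digits_only, with_check_x and first_group, and the X-terminated input, are assumptions made for this illustration.

```python
import re

# The pattern from this answer: exactly ten digits.
digits_only = re.compile(r'\D(\d{10})(?:\Z|\D)')
# Variant that also allows X or x in the final position.
with_check_x = re.compile(r'\D(\d{9}[\dXx])(?:\Z|\D)')

def first_group(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

plain = "abcd0123456789"
x_final = "abcd012345678X efg"

plain_digits = first_group(digits_only, plain)     # '0123456789'
x_digits = first_group(digits_only, x_final)       # None
x_allowed = first_group(with_check_x, x_final)     # '012345678X'
```

Which of the two is appropriate depends on whether ISBNs with an X check digit need to be found.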

SilentGhost