
I have the following regular expression:

[0-9]{8}.*\n.*\n.*\n.*\n.*

I have tested it in Expresso against the file I am working with, and it matches successfully.

I want to match the following:

  • Reference number, 8 digits long
  • Any character, any number of times
  • New Line
  • Any character, any number of times
  • New Line
  • Any character, any number of times
  • New Line
  • Any character, any number of times
  • New Line
  • Any character, any number of times
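For illustration, the same pattern can be written with re.VERBOSE so that each piece lines up with the list above. This is a Python 3 sketch; the names RECORD and sample are hypothetical, and the sample text is made up.

```python
import re

# The pattern from the question, written with re.VERBOSE: whitespace and
# '#' comments inside the pattern are ignored, so each part can be annotated.
RECORD = re.compile(r"""
    [0-9]{8} .*   # reference number (8 digits), then the rest of that line
    \n .*         # newline, then the second line
    \n .*         # newline, then the third line
    \n .*         # newline, then the fourth line
    \n .*         # newline, then the fifth line
""", re.VERBOSE)

# Hypothetical input: one five-line record followed by an unrelated line.
sample = "12345678 header\nline 2\nline 3\nline 4\nline 5\nnext\n"
print(RECORD.findall(sample))
# → ['12345678 header\nline 2\nline 3\nline 4\nline 5']
```

Without re.DOTALL, each `.*` stops at the end of its line, so the match covers exactly five lines.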

My Python code is:

for m in re.findall('[0-9]{8}.*\n.*\n.*\n.*\n.*', l, re.DOTALL):
    print(m)

But no matches are produced, whereas Expresso finds 400+ matches, which is what I would expect.

What am I missing here?

A:

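Don't use re.DOTALL, or the dot will match newlines too, and the greedy .* will run past the end of a record so that everything up to the end of the string becomes one match. Also use raw strings (r"...") for regexes: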

import re

for m in re.findall(r'[0-9]{8}.*\n.*\n.*\n.*\n.*', l):
    print(m)
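A small Python 3 sketch of the difference, on made-up data (the names PATTERN and data are illustrative only): without re.DOTALL each match covers one five-line record; with it, the greedy first .* runs across records and a single match takes everything from the first reference number to the end.

```python
import re

PATTERN = r'[0-9]{8}.*\n.*\n.*\n.*\n.*'

# Hypothetical data: two five-line records, each starting with an 8-digit reference.
data = (
    "11111111 first\na\nb\nc\nd\n"
    "22222222 second\ne\nf\ng\nh\n"
)

# Without DOTALL, '.' stops at newlines, so each match is exactly one record.
per_record = re.findall(PATTERN, data)
print(len(per_record))  # 2

# With DOTALL, '.' also matches '\n'. The first greedy '.*' runs to the end of
# the string and backtracks only far enough to fit the four '\n', and the last
# '.*' then runs to the end again, so the whole string becomes one match.
greedy = re.findall(PATTERN, data, re.DOTALL)
print(len(greedy))  # 1
```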

However, your version should still have produced matches (although inefficiently) if you had read the entire file into memory as one large string.

So the question is, are you reading the file like this:

import re

with open("filename", "r") as myfile:
    mydata = myfile.read()  # the whole file as one string
    for m in re.findall(r'[0-9]{8}.*\n.*\n.*\n.*\n.*', mydata):
        print(m)

Or are you working with single lines (for line in myfile: or myfile.readlines())? In that case the regex can't match, because each line contains at most one newline (at its end), while the pattern needs four.
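To see why line-by-line processing returns nothing, here is a minimal Python 3 sketch using io.StringIO as a stand-in for the real file (the file contents and variable names are hypothetical):

```python
import io
import re

PATTERN = re.compile(r'[0-9]{8}.*\n.*\n.*\n.*\n.*')

# io.StringIO stands in for the real file; the contents are made up.
text = "12345678 ref\nline 2\nline 3\nline 4\nline 5\n"

# Reading everything at once: the pattern sees all four newlines.
whole = PATTERN.findall(io.StringIO(text).read())

# Iterating line by line: each string holds at most one '\n' (at its end),
# but the pattern needs four, so no single line can ever match.
per_line = [m for line in io.StringIO(text) for m in PATTERN.findall(line)]

print(len(whole), len(per_line))  # 1 0
```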

Tim Pietzcker
Hi, yes, I am running Python on Windows, but the file is from a Unix environment.
humira
The origin of the file is unlikely to matter. The question was whether you were reading the whole file at once or using an iterator. Iterating over a Python file object yields one line at a time, split at newline characters.
Tim McNamara
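A quick Python 3 check of both points, assuming the file is opened in text mode with the default newline handling (newline=None, universal newlines). Here io.TextIOWrapper over io.BytesIO stands in for open(); the helper text_lines and the byte strings are hypothetical.

```python
import io


def text_lines(raw):
    # Wrap the bytes the way open(path, "r") would: text mode, newline=None,
    # so '\n', '\r\n' and '\r' line endings are all read back as '\n'.
    return list(io.TextIOWrapper(io.BytesIO(raw), encoding="ascii"))


unix = text_lines(b"12345678 ref\nline 2\n")
windows = text_lines(b"12345678 ref\r\nline 2\r\n")
print(unix == windows, unix)  # True ['12345678 ref\n', 'line 2\n']
```

Either way, each item the iterator yields ends with a single '\n', which is why a pattern spanning five lines cannot match any one of them.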