I'm trying to use a regex in Python to check whether a file path (stored as a string, e.g. "/volumes/footage/foo/bar.mov") appears in a log file I create that contains a list of files. But when I run the script, it raises this error: sre_constants.error: unbalanced parenthesis. The code I'm using is this:

To read the file:

import os

theLogFile = The_Root_Path + ".processedlog"
if os.path.isfile(theLogFile):
    the_file = open(theLogFile, "r")
else:
    # Create an empty log file first, then open it for reading.
    open(theLogFile, 'w').close()
    the_file = open(theLogFile, "r")
the_log = the_file.read()
the_file.close()

Then, inside a for loop, I reassign the the_file variable (I didn't realize I was doing this until I posted this question) to a string from a list of files (obtained by walking a folder and its subfolders and collecting all the filenames), and try to use a regex to see whether that filename is present in the log file:

for the_file in filenamelist:
    p = re.compile(the_file, re.IGNORECASE)
    m = p.search(the_log)

Every time it hits the re.compile() call, it raises that error. If I cut that out and use re.search(the_file, the_log) instead, I still get the same error. I don't understand how I could be getting an unbalanced parenthesis from this.

+3  A: 

Where is the regular expression pattern? Are you trying to use filenames contained in one file as patterns to search the other file? If so, you will want to step through the_file with something like

for the_pattern in the_file:
    p = re.compile(the_pattern, re.IGNORECASE)
    m = p.search(the_log)
    ...

According to the Python re.compile documentation, the first argument to re.compile() should be the regular expression pattern as a string.

But the return value of open() is a file object, which you assign to the_file and then pass to re.compile().

Joe Koberg
I explained poorly. I edited the question to better explain the issue. Sorry.
Gordon Fontenot
Feel free to paste the _actual_ code that you're using. You have still omitted the most important part: what are the contents of `filenamelist` that are being used as patterns? And maybe precede the failing line of code with `print the_pattern` and then post the pattern...
Joe Koberg
+1  A: 

What you're binding to the name the_file in your first snippet is a file object, even though you say it's "saved as a string". The filename (i.e. the string) is actually bound to theLogFile, but what you're trying to turn into an RE object is not theLogFile (the string); it's the_file (the now-closed file object). Given this, the error is somewhat quirky (one would expect a TypeError), but it's clear that you will get an error at re.compile.
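As a rough illustration of the TypeError mentioned above, here is a minimal sketch for Python 3. The io.StringIO object is only a stand-in for an open file, and compile_error_name is a hypothetical helper, not something from the question.

```python
import io
import re

def compile_error_name(pattern):
    """Return 'TypeError' if re.compile rejects the argument's type, else 'ok'."""
    try:
        re.compile(pattern, re.IGNORECASE)
        return "ok"
    except TypeError:
        return "TypeError"

# A file-like object (stand-in for the_file from the first snippet).
file_object_result = compile_error_name(io.StringIO("log contents"))

# A plain string path compiles without complaint.
string_result = compile_error_name("/volumes/footage/foo/bar.mov")
```

Since the question reports a pattern error rather than a TypeError, this suggests the value reaching re.compile really was a string.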

Alex Martelli
I explained poorly. I edited the question to better explain the issue. Sorry.
Gordon Fontenot
+1  A: 

the_file should be a string. In the above code the_file is the return value of open, which is a file object.

Strawberry
I explained poorly. I edited the question to better explain the issue. Sorry.
Gordon Fontenot
+1  A: 

Gordon,

it would seem to me that the issue is in the data. You are compiling uninspected strings from the file list into regular expressions, without accounting for the fact that they might contain metacharacters that are significant to the regex engine.
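A small sketch of how this can happen, assuming Python 3 and a made-up filename containing a stray ")" (the actual filenames in the question are not shown):

```python
import re

# Hypothetical filename with an unmatched closing parenthesis.
the_file = "/volumes/footage/foo/take 1).mov"

try:
    re.compile(the_file, re.IGNORECASE)
    message = None
except re.error as exc:
    # The ")" is read as regex syntax, not as a literal character.
    message = str(exc)

print(message)
```

The printed message should mention an unbalanced parenthesis, which matches the error in the question.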

In your for loop, add a print the_file before the call to re.compile, so you can see which strings are actually coming from the file list. (It is no problem that the loop variable re-uses a name that previously referred to a file object.) Or, better still, run every value of the_file through re.escape before passing it to re.compile. This escapes all metacharacters so that they match as literal characters.
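A minimal sketch of the re.escape approach, assuming Python 3. The function name files_already_logged and the sample data are hypothetical; only re.escape, re.compile and the variable names the_file, the_log and filenamelist come from the thread.

```python
import re

def files_already_logged(filenamelist, the_log):
    """Return the filenames that occur literally (ignoring case) in the_log."""
    found = []
    for the_file in filenamelist:
        # re.escape backslash-escapes regex metacharacters such as "(" , ")" and "."
        p = re.compile(re.escape(the_file), re.IGNORECASE)
        if p.search(the_log):
            found.append(the_file)
    return found

# Hypothetical sample data for illustration only.
the_log = "/volumes/footage/foo/bar.mov\n/volumes/footage/foo/take 1).mov\n"
filenamelist = [
    "/Volumes/Footage/foo/take 1).mov",  # contains ")", differs in case
    "/volumes/footage/foo/baz.mov",      # not in the log
]
logged = files_already_logged(filenamelist, the_log)
```

Note that this is still a substring search, so a short path can match inside a longer logged path.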

ThomasH
That was it. Thanks. Using `re.escape` fixed it.
Gordon Fontenot