I am working on a problem and have hit a wall.

I have a (potentially large) set of text files, and I need to apply a sequence of filters and transformations to them and export the results elsewhere.

So I roughly have:

def apply_filter_transformer(basepath = None, newpath = None, fts= None):
    #because all the raw studies in basepath should not be modified, so I first cp all to newpath
    for i in listdir(basepath):
        file(path.join(newpath, i), "wb").writelines(file(path.join(basepath, i)).readlines())
    for i in listdir(newpath):
        fileobj = open(path.join(newpath, i), "r+")
        for fcn in fts:
            fileobj = fcn(fileobj)
        if fileobj is not None:
            fileobj.writelines(fileobj.readlines())
        try:
            fileobj.close()
        except:
            print i, "at", fcn
            pass
def main():
    apply_filter_transformer(path.join(pardir, pardir, "studies"),
                         path.abspath(path.join(pardir, pardir, "filtered_studies")),
                         [
                        #transformer_addMemo,
                          filter_executable,
                          transformer_identity,
                          filter_identity,
                          ])

Here fts in apply_filter_transformer is a list of functions, each of which takes a Python file object and returns a Python file object. The problem I ran into is that when I try to insert a string into the text of a file, I get an uninformative error, and I have been stuck on it all morning.

def transformer_addMemo(fileobj):
    STYLUSMEMO =r"""hellow world"""
    study = fileobj.read()
    location = re.search(r"</BasicOptions>", study)
    print fileobj.name
    print fileobj.mode
    fileobj.seek(0)
    fileobj.write(study[:location.end()] + STYLUSMEMO + study[location.end():])
    return fileobj

This gives me:

Traceback (most recent call last):
 File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 292, in <module>
  main()
 File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 288, in main
 filter_identity,
 File "E:\mypy\reg_test\src\preprocessor\preprocessor.py", line 276, in     apply_filter_transformer
   fileobj.writelines(fileobj.readlines())
   IOError: [Errno 0] Error

If anyone can give me more information about this error, I would appreciate it very much.

+1  A: 

There is a handy Python module for reading or modifying a group of files: fileinput.

I'm not sure what is causing this error, but you are reading each whole file into memory, which is a bad idea in your case because the files are potentially large. With fileinput you can rewrite the files in place, one line at a time. For example:

import fileinput
import sys

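# list_of_files, keyword, and my_text are placeholders you define yourself.
# With inplace=True, anything written to sys.stdout goes into the current file.
for line in fileinput.input(list_of_files, inplace=True):
    sys.stdout.write(line)
    if keyword in line:
        sys.stdout.write(my_text)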
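
To try this out without touching real data, here is a small self-contained sketch. It is written for Python 3, and the helper name insert_after_keyword, the temporary file, and its contents are all made up for illustration; only the fileinput call itself comes from the snippet above.

```python
import fileinput
import os
import sys
import tempfile


def insert_after_keyword(paths, keyword, extra_text):
    """Rewrite each file in place, adding extra_text after every line
    that contains keyword. (Hypothetical helper, for illustration only.)"""
    # With inplace=True, fileinput redirects sys.stdout into the file
    # currently being processed, so each write below lands in that file.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            sys.stdout.write(line)
            if keyword in line:
                sys.stdout.write(extra_text)


# Build a throwaway file so the example does not depend on real studies.
tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "study.txt")
with open(demo_path, "w") as f:
    f.write("<BasicOptions>\n</BasicOptions>\n<Other/>\n")

insert_after_keyword([demo_path], "</BasicOptions>", "hello world\n")

with open(demo_path) as f:
    result = f.read()
```

Only one line is held in memory at a time, so this should scale to large files as long as the text you insert can be placed relative to a whole line.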
Nadia Alramli
+1  A: 

It's not really possible to tell what's causing the error from the code you posted. The problem may be in the protocol you've adopted for your transformation functions.

I'll simplify the code a bit:

fileobj = open(path, mode)
fileobj = fcn(fileobj)
fileobj.writelines(fileobj.readlines())

What assurance do I have that fcn returns a file that's open in the mode that my original file was? That it returns a file that's open at all? That it returns a file? Well, I don't.

It doesn't seem like there's any reason for you to even be using file objects in your process. Since you're reading the entire file into memory, why not just make your transformation functions take and return strings? So your code would look like this:

with open(filename, "r") as f:
    s = f.read()
for transform_function in transforms:
    s = transform_function(s)
with open(filename, "w") as f:
    f.write(s)

Among other things, this totally decouples the file I/O part of your program from the data-transformation part, so that problems in one don't affect the other.
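
As a rough sketch of what a string-based transformer could look like, here is the memo-inserting function from the question rewritten to take and return a string. It is written for Python 3; the names transformer_add_memo, apply_transforms, MEMO, and the sample text are my own placeholders, and returning the text unchanged when the tag is missing is an assumption about what you would want.

```python
import re

# Placeholder text standing in for the memo in the question.
MEMO = "hello world"


def transformer_add_memo(study):
    """Return study with MEMO inserted right after </BasicOptions>."""
    match = re.search(r"</BasicOptions>", study)
    if match is None:
        # Assumption: a study without the tag is passed through unchanged.
        return study
    return study[:match.end()] + MEMO + study[match.end():]


def transformer_identity(study):
    """Return the text untouched."""
    return study


def apply_transforms(text, transforms):
    """Feed text through each transform in order and return the result."""
    for transform in transforms:
        text = transform(text)
    return text


sample = "<BasicOptions>x</BasicOptions><Rest/>"
result = apply_transforms(sample, [transformer_add_memo, transformer_identity])
untouched = apply_transforms("<NoTag/>", [transformer_add_memo])
```

Functions like these can be tested on plain strings without opening any file, so a failure in one of them cannot leave a file object in an unexpected mode or position.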

Robert Rossney