tags:

views:

1130

answers:

7

I wrote a little python program as a personal utility to help me with some refactoring. It's similar to unix replace, except it supports regular expressions and operates on all files in a directory and (optionally) all subdirectories.

The problem is I'm not replacing in place. I'm opening files, passing the contents into memory, and then overwriting the file, like so:

file = open('path/to/file','r')
in_string = file.read()
file.close()
# ...
#Processing logic
# ...
file = open('path/to/file','w')
file.write(out_string)
file.close()

Besides the obvious performance/memory concerns, which are legitimate, but not so much a problem for my use, there's another drawback to this method. SVN freaks out. I can do some copy and paste workarounds after the fact to fix the checksum error that svn throws on a commit, but it makes the utility pointless.

Is there a better way to do this? I'm guessing that if I were editing the file in place there wouldn't be any sort of problem. How do I go about that?

A: 

Freaks out how? What you're describing, if it's working, is editing the file "in place", at least as much as vi(1) does.

Charlie Martin
A: 

Try 'file = open('path/to/file', 'w+')'. This means you are updating an existing file, not writing a new one.

Daniel Watkins
+6  A: 

I suspect the problem is that you are in fact editing wrong files. Subversion should never raise any errors about check sums when you are just modifying your tracked files -- independently of how you are modifying them.

Maybe you are accidentally editing files in the .svn directory? In .svn/text-base, Subversion stores copies of your files using the same name plus the extension .svn-base, make sure that you are not editing those!

Ferdinand Beyer
this is almost certainly the problem. I've had exactly this experience with using sed recursively in an SVN WC.
rmeador
if the OP decides to accept this answer (or one like it), perhaps the title of the question should be renamed such that future people searching will be likely to find it in connection with this problem?
rmeador
Also note that the OP's tool should respect the read-only flag. Files in .svn are read-only.
Wim Coenen
Yeah, this was the problem.
David Berger
+2  A: 

What do you mean by "SVN freaks out"?

Anyway, the way vi/emacs/etc works is as follows:

f = open("/path/to/.file.tmp", "w")
f.write(out_string)
f.close()
os.rename("/path/to/.file.tmp", "/path/to/file")

(ok, there's actually an "fsync" in there... But I don't know off hand how to do that in Python)

The reason it does that copy is to ensure that, if the system dies half way through writing the new file, the old one is still there... And the 'rename' operation is defined as being atomic, so it will either work (you get 100% of the new file) or not work (you get 100% of the old file) -- you'll never be left with a half-file.

David Wolever
+1  A: 

Perhaps the fileinput module can make your code simpler/shorter:

Here's an example:

import fileinput

for line in fileinput.input("test.txt", inplace=1):
    print "%d: %s" % (fileinput.filelineno(), line),
Eli Bendersky
fileinput has other uses... But this isn't the best way to use it. `for line in file("test.txt"): print "line:", line` is better.
David Wolever
A: 

I suspect Ferdinand's answer, that you are recursing into the .svn dir, explains why you are messing up SVN, but note that there is another flaw in the way you are processing files.

If your program is killed, or your computer crashes at the wrong point (when you are writing out the changed contents), you risk losing both the original and new contents of the file. A more robust approach is to perform the following steps:

  1. Read the file into memory, and perform your translations
  2. Write the new contents to "filename.new", rather than the original filename.
  3. Delete the original file, and rename "filename.new" to "filename"

This way, you won't risk losing data if killed at the wrong point. Note that the fileinput module will handle much of this for you. It can be given a sequence of files to process, and if you specify inplace=True, will redirect stdout to the appropriate file (keeping a backup). You could then structure your code something like:

import os
import fileinput

def allfiles(dir, ignore_dirs=set(['.svn'])):
    """Generator yielding all writable filenames below dir.
    Ignores directories specified 
    """
    for basedir, dirs, files in os.walk(dir):
        if basedir in ignore_dirs:
            dirs[:]=[] # Don't recurse
            continue  # Skip this directory

        for filename in files:
            filename = os.path.join(basedir, filename)
            # Check the file is writable
            if os.access(filename, os.W_OK):
                yield filename


for line in fileinput.input(allfiles(PATH_TO_PROCESS), inplace=True):
    line = perform_some_substitution(line)
    print line.rstrip("\n") # Print adds a newline, but line already has one
Brian
Thanks! I ended up implementing something similar, but with regular expressions for ignore_dirs. os.walk() lists all basedirs relative to the original invocation path, meaning allfiles('.') won't skip './.svn' or './subdir/.svn'. Good to know about fileinput now!
David Berger
A: 

I'd like to know if there is an efficient way to get fileinput to do the below vi global substitution on a file. Thanks.

:%s/foo/bar/g

you need to ask question for get an answer. it's not a discussion forum.
SilentGhost
OK. I'll ask a question. Why does the below syntax leave me with an empty file? I thought I could simply replace "inplace" a string.CURRENT = '175.48.204.168'CONFILE = '/tmp/iiiipf.conf'import socketimport fileinputimport subprocessmyname = socket.getaddrinfo(socket.gethostname(), None)[0][4][0]if myname == CURRENT: print 'Nevermind.' subprocess.sys.exit()else: for line in fileinput.input("/tmp/iiiipf.conf", inplace=1): line = line.replace(CURRENT, myname) #sys.stdout.write(line)
Sorry for the lousy formatting. The code begins with "CURRENT"