views:

141

answers:

4

Beginner here! Forgive me in advance for raising what is probably an incredibly simple problem.

I've been trying to put together a Python script that runs multiple find-and-replace actions and a few similar things on a specified plain-text file. It works, but from a programming perspective I doubt it works well. How would I best go about optimizing the actions made upon the 'outtext' variable? At the moment it's basically doing a very similar thing four times over...

import binascii
import re
import struct
import sys

infile = sys.argv[1]
charenc = sys.argv[2]
outFile=infile+'.tex'

findreplace = [
('TERM1', 'TERM2'),
('TERM3', 'TERM4'),
('TERM5', 'TERM6'),
]

inF = open(infile,'rb')
s=unicode(inF.read(),charenc)
inF.close()

# THIS IS VERY MESSY.
for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext

for couple in findreplace:
    outtext=re.compile('Title: (.*)', re.I).sub(r'\\title'+ r'{\1}', s)
    s=outtext

for couple in findreplace:
    outtext=re.compile('Author: (.*)', re.I).sub(r'\\author'+ r'{\1}', s)
    s=outtext

for couple in findreplace:
    outtext=re.compile('Date: (.*)', re.I).sub(r'\\date'+ r'{\1}', s)
    s=outtext
# END MESSY SECTION.

outF = open(outFile,'wb')
outF.write(outtext.encode('utf-8'))
outF.close()
+2  A: 

Some possibilities:

  • Do the re.compile outside the loop.
  • Use only one loop and do all four things inside it.
  • Use one regex, which matches on either Title, Author or Date.
  • Don't do the last three replaces inside a loop, since they do not use the couple variable.
Sjoerd
I would give you a +1 except for the last "use one regex ..." that is bad advice, regex are complicated enough to maintain, without complicating them even more for no reason.
fuzzy lollipop
But in this case that would be an extremely straightforward regex, so +1 from me.
Tim Pietzcker
A: 

Hi, your code is not so bad. If you want to optimise for speed then I would look into compiling your regex's at the start of the program.

Also, If you want to tidy up your code then chain all your subs together in one loop.

Damian Kennedy
A: 

This might help: http://wiki.python.org/moin/PythonSpeed/PerformanceTips

Daniel
Sorry if i didn't specify much information..
Daniel
+4  A: 

For the last three replacements, the for-loop seems to be unnecessary, since the loop body doesn't use the loop variable. Also it is not necessary to assign the result to a temporary outtext variable, it can be directly assigned to s.

These changes make the code much more concise:

for couple in findreplace:
    s = s.replace(couple[0],couple[1])

s = re.compile('Title: (.*)', re.I).sub(r'\\title {\1}', s)
s = re.compile('Author: (.*)', re.I).sub(r'\\author {\1}', s)
s = re.compile('Date: (.*)', re.I).sub(r'\\date {\1}', s)
sth