tags:

views:

254

answers:

4

I'm writing an application in Python that is going to have a lot of different functions, so logically I thought it would be best to split up my script into different modules. Currently my script reads in a text file that contains code which has been converted into tokens and spellings. The script then reconstructs the code into a string, with blank lines where comments would have been in the original code.

I'm having a problem making the script object-oriented though. Whatever I try I can't seem to get the program running the same way it would as if it was just a single script file. Ideally I'd like to have two script files, one that contains a class and function that cleans and reconstructs the file. The second script would simply call the function from the class in the other file on a file given as an argument from the command line. This is my current script:

import sys

tokenList = open(sys.argv[1], 'r')
cleanedInput = ''
prevLine = 0

for line in tokenList:

    if line.startswith('LINE:'):
        lineNo = int(line.split(':', 1)[1].strip())
        diff = lineNo - prevLine - 1

        if diff == 0:
            cleanedInput += '\n'
        if diff == 1:
            cleanedInput += '\n\n'
        else:
            cleanedInput += '\n' * diff

        prevLine = lineNo
        continue

    cleanedLine = line.split(':', 1)[1].strip()
    cleanedInput += cleanedLine + ' '

print cleanedInput

After following Alex Martelli advice below, I now have the following code which gives me the same output as my original code.

def main():
    tokenList = open(sys.argv[1], 'r')
    cleanedInput = []
    prevLine = 0

    for line in tokenList:

        if line.startswith('LINE:'):
            lineNo = int(line.split(':', 1)[1].strip())
            diff = lineNo - prevLine - 1

            if diff == 0:
                cleanedInput.append('\n')
            if diff == 1:
                cleanedInput.append('\n\n')
            else:
                cleanedInput.append('\n' * diff)

            prevLine = lineNo
            continue

        cleanedLine = line.split(':', 1)[1].strip()
        cleanedInput.append(cleanedLine + ' ')

    print cleanedInput

if __name__ == '__main__':
    main()

I would still like to split my code into multiple modules though. A 'cleaned file' in my program will have other functions performed on it so naturally a cleaned file should be a class in its own right?

+1  A: 

When I do this particular refactoring, I usually start with an initial transformation within the first file. Step 1: move the functionality into a method in a new class. Step 2: add the magic invocation below to get the file to run like a script again:

class LineCleaner:

    def cleanFile(filename):
        cleanInput = ""
        prevLine = 0
        for line in open(filename,'r'):         
           <... as in original script ..>

if __name__ == '__main__':
     cleaner = LineCleaner()
     cleaner.cleanFile(sys.argv[1])
CBFraser
I adjusted your indentation. However, `clean()` and `arg` remain undefined.
Ewan Todd
Sorry, hit the submit button before I was ready. Also, I just figured out the code sample widgit. *sigh*
CBFraser
Ha, two obsessive-compulsives trying to clean up the same post at the same time. It's like a deranged O. Henry story.
Robert Rossney
_deranged O. Henry story_
foosion
"The Gift of the Magi," perhaps. http://www.literaturecollection.com/a/o_henry/25/
Ewan Todd
+5  A: 

To speed up your existing code measurably, add def main(): before the assignment to tokenList, indent everything after that 4 spaces, and at the end put the usual idiom

if __name__ == '__main__':
  main()

(The guard is not actually necessary, but it's a good habit to have nevertheless since, for scripts with reusable functions, it makes them importable from other modules).

This has little to do with "object oriented" anything: it's simply faster, in Python, to keep all your substantial code in functions, not as top-level module code.

Second speedup, change cleanedInput into a list, i.e., its first assignment should be = [], and wherever you now have +=, use .append instead. At the end, ''.join(cleanedInput) to get the final resulting string. This makes your code take linear time as a function of input size (O(N) is the normal way of expressing this) while it currently takes quadratic time (O(N squared)).

Then, correctness: the two statements right after continue never execute. Do you need them or not? Remove them (and the continue) if not needed, remove the continue if those two statements are actually needed. And the tests starting with if diff will fail dramatically unless the previous if was executed, because diff would be undefined then. Does your code as posted perhaps have indentation errors, i.e., is the indentation of what you posted different from that of your actual code?

Considering these important needed enhancements, and the fact that it's hard to see what advantage you are pursuing in making this tiny code OO (and/or modular), I suggest clarifying the indenting / correctness situation, applying the enhancements I've proposed, and leaving it at that;-).

Edit: as the OP has now applied most of my suggestions, let me follow up with one reasonable way to hive off most functionality to a class in a separate module. In a new file, for example foobar.py, in the same directory as the original script (or in site-packages, or elsewhere on sys.path), place this code:

def token_of(line):
  return line.partition(':')[-1].strip()

class FileParser(object):
  def __init__(self, filename):
    self.tokenList = open(filename, 'r')

  def cleaned_input(self):
    cleanedInput = []
    prevLine = 0

    for line in self.tokenList:
        if line.startswith('LINE:'):
            lineNo = int(token_of(line))
            diff = lineNo - prevLine - 1
            cleanedInput.append('\n' * (diff if diff>1 else diff+1))
            prevLine = lineNo
        else:
            cleanedLine = token_of(line)
            cleanedInput.append(cleanedLine + ' ')

    return cleanedInput

Your main script then becomes just:

import sys
import foobar

def main():
    thefile = foobar.FileParser(sys.argv[1])
    print thefile.cleaned_input()

if __name__ == '__main__':
  main()
Alex Martelli
To expand on one of Alex's points a little: in Python, strings are immutable. `s += 'abc'` makes a completely new copy of `s` with `abc` appended. So if you're building a long string with concatenation, you're copying an ever-growing string over and over again. Appending items to a list (or using the `StringIO` module) and then doing a single `join` at the end avoids this.
Robert Rossney
Thanks for those comments Alex. Yes there was indentation errors from when I pasted it in. I've fixed that now to reflect what I'm looking at in my IDE. The reason why I would like this code to become modular is because the program will eventually be a recursive-descent parser, so I would like to split things up for simplicity.
greenie
@greenie, thanks for fixing the indentation, but the problems I pointed out (code not in a function, string +=) remain, and there are others such as the total repetition of the complex split-and-strip expression, that are **far** higher priority, to enhance this code's quality, than relegating a few lines to a method rather than a function. First things first!-)
Alex Martelli
I've edited my original question to reflect the changes you advised Alex. Hopefully it should now be possible to make a class out of this so I can represend a 'cleaned file' as an object that I can perform other functions on later.
greenie
@greenie, sure, let me edit my answer to show one way you could hive off the functionality into a class, if you really want to.
Alex Martelli
Thanks Alex, this has been a really useful answer and I understand a lot more about OO Python now.
greenie
A: 

You can get away with creating a function and putting all your logic in it. For full "object orientedness" though, you can do something like this:

ps - your posted code has a bug on the continue line - it is always executed and the last 2 lines will never execute.

class Cleaner:
  def __init__(...):
    ...init logic...
  def Clean(self):
    for line in open(self.tokenList):
      ...cleaning logic...
    return cleanedInput

def main(argv):
  cleaner = Cleaner(argv[1])
  print cleaner.Clean()
  return 0

if '__main__' == __name__:
  sys.exit(main(sys.argv))
Richard Levasseur
My mistake. Upon pasting I forgot to indent continue and everything above it to keep it inside the first if statement.
greenie
A: 

If presented code is all code Just don't add any class !!

Your code is too simply for that !! OOP approach would add unnecessary complexity.

But if still wont. Put all code into function eg.

def parse_tokenized_input(file):
    tokenList = open(file, 'r')
    cleanedInput = ''
    prevLine = 0
    #rest of code

at end add:

if __name__ == '__main__':
    parse_tokenized_input(sys.argv[1])

If code works correct put def of function to new file (and all needed imports!) eg. mymodyle.py

your script now will be:

from mymodule.py import parse_tokenized_input

if __name__ == '__main__':
        parse_tokenized_input(sys.argv[1])

Oh and think out better name for your function and module (module should have general name).

przemo_li