I have a small Python script which I use every day. It reads a file, and for each line I apply different string functions like strip(), replace(), etc. I'm constantly editing the file and commenting lines in and out to change the functions. Depending on the file I'm dealing with, I use different functions. For example, I got a file where for each line I need to use line.replace(' ', '') and line.strip().

What's the best way to make all of these part of my script, so that I can assign a number to each function and just say "apply functions 1 and 4" to each line?

+2  A: 

It is possible to map string operations to numbers:

>>> import string
>>> ops = {1:string.split, 2:string.replace}
>>> my = "a,b,c"
>>> ops[1](",", my)
[',']
>>> ops[1](my, ",")
['a', 'b', 'c']
>>> ops[2](my, ",", "-")
'a-b-c'
>>>

But string names for the operations may be more readable:

>>> ops2={"split":string.split, "replace":string.replace}
>>> ops2["split"](my, ",")
['a', 'b', 'c']
>>>

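Note: instead of the functions in the string module, you can use the methods of the str type to the same effect. string.split and string.replace were removed in Python 3, so str.split and str.replace are the portable choice.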

>>> ops={1:str.split, 2:str.replace}
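A minimal sketch of the same idea written against str methods for Python 3, aimed at the question's "apply functions 1 and 4". The numbering and the helper name apply_ops are illustrative assumptions; operations that need extra arguments (such as replace) are wrapped in a lambda so every entry takes just the line.

```python
# Illustrative numbering: each entry takes one string and returns a new string.
OPS = {
    1: str.strip,
    2: str.lower,
    3: str.upper,
    4: lambda s: s.replace(" ", ""),  # remove all spaces
}

def apply_ops(line, numbers):
    """Apply the operations selected by `numbers`, in the given order."""
    for n in numbers:
        line = OPS[n](line)
    return line

# "Apply functions 1 and 4" to each line of some text.
lines = ["  a b c \n", " hello world\n"]
print([apply_ops(line, [1, 4]) for line in lines])  # prints ['abc', 'helloworld']
```

Order matters: [1, 4] strips first and then removes the remaining inner spaces.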
gimel
+2  A: 

If you insist on numbers, you can't do much better than a dict (as gimel suggests) or a list of functions (with indices zero and up). With names, though, you don't necessarily need an auxiliary data structure (such as gimel's suggested dict), since you can simply use getattr to retrieve the method to call from the object itself or its type. E.g.:

def all_lines(somefile, methods):
  """Apply a sequence of methods to all lines of some file and yield the results.
  Args:
    somefile: an open file or other iterable yielding lines
    methods: a string that's a whitespace-separated sequence of method names.
        (note that the methods must be callable without arguments beyond the
         str to which they're being applied)
  """
  tobecalled = [getattr(str, name) for name in methods.split()]
  for line in somefile:
    for tocall in tobecalled: line = tocall(line)
    yield line
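The list-of-functions alternative mentioned above can be sketched the same way, selecting operations by index from zero. This assumes single-argument operations only; NUMBERED_OPS and numbered_lines are hypothetical names, and io.StringIO stands in for an open file.

```python
import io

# Index 0, 1, 2, ... selects the operation; each must take just the string.
NUMBERED_OPS = [str.strip, str.lower, str.title]

def numbered_lines(somefile, indices):
    """Yield each line of `somefile` after applying NUMBERED_OPS[i] for i in indices."""
    tobecalled = [NUMBERED_OPS[i] for i in indices]  # look up once, not per line
    for line in somefile:
        for tocall in tobecalled:
            line = tocall(line)
        yield line

fake_file = io.StringIO("  Hello World \n  FOO bar\n")
print(list(numbered_lines(fake_file, [0, 1])))  # prints ['hello world', 'foo bar']
```

An out-of-range index raises IndexError, which is arguably better than silently skipping an operation.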
Alex Martelli
+1  A: 

First of all, many functions in the string module, including strip and replace, are deprecated. The following answer uses string methods instead: rather than string.strip(" Hello "), I use the equivalent " Hello ".strip().

Here's some code that will simplify the job for you. It assumes that every method you call on your string returns another string.

class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip

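def process_line(line, *ops):
    # ops alternates: method, argument list, method, argument list, ...
    i = iter(ops)
    while True:
        try:
            op = next(i)    # next() works in Python 2.6+ and Python 3
            args = next(i)
        except StopIteration:
            # Out of pairs (a trailing method without arguments is ignored).
            break
        line = op(line, *args)
    return line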

The O class exists so that your highly abbreviated method names don't pollute your namespace. When you want to add more string methods, you add them to O in the same format as those given.

The process_line function is where all the interesting things happen. First, here is a description of the argument format:

  • The first argument is the string to be processed.
  • The remaining arguments must be given in pairs.
    • The first argument of the pair is a string method. Use the shortened method names here.
    • The second argument of the pair is a list representing the arguments to that particular string method.

The process_line function returns the string that emerges after all these operations have been applied.

Here is some example code showing how you would use the above code in your own scripts. I've separated the arguments of process_line across multiple lines to show the grouping of the arguments. Of course, if you're just hacking away and using this code in day-to-day scripts, you can compress all the arguments onto one line; this actually makes it a little easier to read.

f = open("parrot_sketch.txt")
for line in f:
    p = process_line(
        line,
        O.r, ["He's resting...", "This is an ex-parrot!"],
        O.c, [],
        O.s, []
    )
    print p
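The pairing loop in process_line can also be written without an explicit iterator, by zipping the even and odd positions of ops. This is an alternative sketch, not the answer's own code; process_line_zip is a hypothetical name, and, like the iterator loop, it silently ignores a trailing method that has no argument list.

```python
class O(object):
    c = str.capitalize
    r = str.replace
    s = str.strip

def process_line_zip(line, *ops):
    """Apply (method, args) pairs from `ops` to `line`, left to right."""
    # ops[0::2] are the methods, ops[1::2] the matching argument lists.
    for op, args in zip(ops[0::2], ops[1::2]):
        line = op(line, *args)
    return line

result = process_line_zip(
    "  he's resting...\n",
    O.s, [],
    O.r, ["he's resting...", "this is an ex-parrot!"],
    O.c, [],
)
print(result)  # prints This is an ex-parrot!
```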

Of course, if you very specifically wanted to use numerals, you could name your functions O.f1, O.f2, O.f3… but I'm assuming that wasn't the spirit of your question.
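If numerals really are what you want, one way to follow that suggestion is to give the methods numbered names and look them up with getattr. The class F, its f1, f2 and f3 attributes, and apply_numbered are hypothetical, and only methods that need no extra arguments are included.

```python
class F(object):
    # Hypothetical numbered aliases for argument-free str methods.
    f1 = str.strip
    f2 = str.lower
    f3 = str.capitalize

def apply_numbered(line, numbers):
    """Apply F.f<n> for each n in `numbers`, in order."""
    for n in numbers:
        line = getattr(F, "f%d" % n)(line)
    return line

print(apply_numbered("  HELLO there  ", [1, 3]))  # prints Hello there
```

An unknown number raises AttributeError naming the missing fN attribute.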

Wesley
A: 

To map names (or numbers) to different string operations, I'd do something like this:

OPERATIONS = dict(
    strip = str.strip,
    lower = str.lower,
    removespaces = lambda s: s.replace(' ', ''),
    maketitle = lambda s: s.title().center(80, '-'),
    # etc
)

def process(myfile, ops):
    for line in myfile:
        for op in ops:
            line = OPERATIONS[op](line)
        yield line

which you use like this:

for line in process(afile, ['strip', 'removespaces']):
    ...
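Every entry in OPERATIONS must take a single string. For methods that need extra arguments, operator.methodcaller from the standard library binds them without a lambda. A sketch assuming Python 3; the entries shown and the up-front name check in the hypothetical process_checked are illustrative additions, not part of the answer above.

```python
import io
import operator

OPERATIONS = {
    "strip": str.strip,
    "lower": str.lower,
    # methodcaller("replace", " ", "") returns a callable doing s.replace(" ", "")
    "removespaces": operator.methodcaller("replace", " ", ""),
    "tabs2spaces": operator.methodcaller("replace", "\t", "    "),
}

def process_checked(myfile, ops):
    """Like process(), but reject unknown operation names before reading any lines."""
    unknown = [op for op in ops if op not in OPERATIONS]
    if unknown:
        raise KeyError("unknown operations: %s" % ", ".join(unknown))
    funcs = [OPERATIONS[op] for op in ops]
    for line in myfile:
        for func in funcs:
            line = func(line)
        yield line

afile = io.StringIO(" a b \n\tc d\n")
print(list(process_checked(afile, ["strip", "removespaces"])))  # prints ['ab', 'cd']
```

Because process_checked is a generator, the KeyError surfaces when iteration starts, still before any line is processed.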
dF