I have lines of data which I want to parse. The data looks like this:

a score=216 expect=1.05e-06
a score=180 expect=0.0394

What I want is a subroutine that parses them and returns two values (score and expect) for each line.

However this function of mine doesn't seem to work:

def scoreEvalFromMaf(mafLines):
    for word in mafLines[0]:
        if word.startswith("score="):
            theScore = word.split('=')[1]
            theEval  = word.split('=')[2]
            return [theScore, theEval]
    raise Exception("encountered an alignment without a score")

Please advise: what is the right way to do this?

+2  A: 

If mafLines is a list of lines and you want to look at just the first one, call .split() on that line to obtain the words. For example:

def scoreEvalFromMaf(mafLines):
    theScore = None
    theEval = None
    # Look only at the first line, split into whitespace-separated words.
    for word in mafLines[0].split():
        if word.startswith('score='):
            # partition returns (head, separator, tail); keep the tail.
            _, _, theScore = word.partition('=')
        elif word.startswith('expect='):
            _, _, theEval = word.partition('=')
    if theScore is None:
        raise Exception("encountered an alignment without a score")
    if theEval is None:
        raise Exception("encountered an alignment without an eval")
    return theScore, theEval

Note that this will return a tuple with two string items; if you want an int and a float, for example, you need to change the last line to

    return int(theScore), float(theEval)

and then you'll get a ValueError exception if either string is invalid for the type it's supposed to represent, and the returned tuple with two numbers if both strings are valid.
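
As a minimal sketch of the numeric variant described above, assuming each line is a plain string of whitespace-separated `key=value` words (the name `parse_score_and_expect` is made up for this illustration and is not from the original answer):

```python
def parse_score_and_expect(line):
    """Return (score, expect) as (int, float) from one line of text."""
    score = expect = None
    for word in line.split():
        key, _, value = word.partition('=')
        if key == 'score':
            score = int(value)      # raises ValueError if not an integer
        elif key == 'expect':
            expect = float(value)   # raises ValueError if not a float
    if score is None or expect is None:
        raise ValueError("missing score or expect in line: %r" % line)
    return score, expect


score, expect = parse_score_and_expect("a score=216 expect=1.05e-06")
```

Here a malformed value such as `score=abc` surfaces as a ValueError from `int()`, which is the behaviour the paragraph above describes.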

Alex Martelli
@AM: Hi Alex, thanks. But I get this message `"error: 'list' object has no attribute 'split'"`. BTW, is this the right way to store the output of the function: `[score,exp] = scoreEvalFromMaf(maf)`?
neversaint
Sounds like mafLines is a list of lists rather than a list of strings. How are you generating it? There are also a couple of bugs in that code: you need to use `.split()` (i.e. it's a function call), and also use `word.split('=')` instead of `word.partition('=')`
Anthony Briggs
@neversaint, you definitely need to clarify what that mysterious `mafLines` **is** -- presumably a list of lists, as Anthony says (given the error message you get), but without knowing how you've built it it's essentially impossible to "read your mind" and just divine what the pieces are, out of thin air. Yes, once you clarify this point, you can (if you wish) put those useless brackets around the `score, exp` on the left-hand side of the assignment.
Alex Martelli
+2  A: 

It looks like you want to split each line on whitespace and parse each chunk separately. If mafLine is a string (i.e. one line from .readlines()):

def scoreEvalFromMafLine(mafLine):
    theScore, theEval = None, None
    for word in mafLine.split():
        if word.startswith("score="):
            theScore = word.split('=')[1]
        if word.startswith("expect="):
            theEval  = word.split('=')[1]

    if theScore is None or theEval is None:
        raise Exception("Invalid line: '%s'" % mafLine)

    return (theScore, theEval)

The way you were doing it would iterate over each character in the first line (since it's a list of strings) rather than over each space-separated word.
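
To illustrate the difference, here is a small sketch using the first sample line from the question; the variable names are only for this example:

```python
line = "a score=216 expect=1.05e-06"

# Iterating over a string yields one character at a time.
first_chars = [c for c in line][:3]

# Splitting on whitespace yields the words you actually want to test.
words = line.split()
```

No single character can start with "score=", which is why the loop in the question would never match if it were given a string.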

Anthony Briggs
@AB: Hi Tony, thanks. But I also get the same message `"error: 'list' object has no attribute 'split'"` using your snippet.
neversaint
Then `mafLines` is a list of lists, not a list of strings. I was assuming that `mafLines` was output from `.readlines()` or similar, but if it isn't, you'll need to clarify what exactly it is, or how you're producing it.
Anthony Briggs
I fixed it using: `"for word in mafLine[0]:"`
neversaint
Sounds like you've already split up your input lines by spaces. So your input (`mafLines`) will look like: `[['a', 'score=1', 'expect=2'], ['a', 'score=3', 'expect=42'], ...]` You might be better off making your function just take one line, rather than the whole list, since it'll be easier to reuse the function later on in your program.
Anthony Briggs
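
A rough sketch of the one-line-at-a-time suggestion in the comment above, assuming the input really is a list of already-split lines as shown there (the function name `score_eval_from_words` is hypothetical):

```python
def score_eval_from_words(words):
    """Take one already-split line, e.g. ['a', 'score=1', 'expect=2'],
    and return (score, expect) as strings."""
    fields = {}
    for word in words:
        key, sep, value = word.partition('=')
        if sep:  # keep only tokens of the form key=value
            fields[key] = value
    if 'score' not in fields or 'expect' not in fields:
        raise ValueError("Invalid line: %r" % (words,))
    return fields['score'], fields['expect']


# Sample data copied from the comment above.
maf_lines = [['a', 'score=1', 'expect=2'], ['a', 'score=3', 'expect=42']]
results = [score_eval_from_words(words) for words in maf_lines]
```

Because the function takes a single line, the caller decides whether to process only the first line or loop over all of them.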
+1  A: 

Obligatory and possibly inappropriate regexp solution:

import re
def scoreEvalFromMaf(mafLines):
    return [re.search(r'score=(.+) expect=(.+)', line).groups()
            for line in mafLines]
harto
That'll explode for invalid input (although that behaviour might be what you want). Turning your `(.+)` into `(.*)` helps to catch blank values, but will still die for really dodgy input.
Anthony Briggs
True enough. It's just a quick-and-dirty demonstration of an alternate strategy.
harto
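
For what it's worth, one way to keep the regexp approach from blowing up on bad input is to check the match before calling `.groups()`. This is only a sketch: skipping non-matching lines is an assumption made here (raising an error instead may suit you better), and the names `PATTERN` and `score_eval_pairs` are made up for the example.

```python
import re

# \S+ matches a run of non-whitespace, so each value stops at the next space.
PATTERN = re.compile(r'score=(\S+)\s+expect=(\S+)')


def score_eval_pairs(lines):
    """Return a list of (score, expect) string pairs, one per matching line."""
    pairs = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # assumption: silently skip lines that do not match
        pairs.append(match.groups())
    return pairs


pairs = score_eval_pairs([
    "a score=216 expect=1.05e-06",
    "not an alignment line",
    "a score=180 expect=0.0394",
])
```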