ansaurus

Question

How do I extract floats from a file in Python?

Answer 1

+12 A:

Here's one way.

def floats( aList ):
    for v in aList:
        try:
            yield float(v)
        except ValueError:
            pass

a = list( floats( [....] ) )

S.Lott 2010-02-19 21:16:16

this will extract integers too, won't it?

SilentGhost 2010-02-19 23:46:54

If integers were to be excluded from the list, I suppose an "if '.' in v" clause inside the try block would do the trick.

chradcliffe 2010-02-20 00:30:55

@chradcliffe. No. You don't need any `if` statement. First try to convert with `int()`. If that succeeds, pass. If the `int` conversion fails, then try the conversion with `float`. You don't want to add any logic outside the built-in function. Example `int('12.')` raises `ValueError`.

S.Lott 2010-02-20 12:51:46
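The int-first approach described in the comment above could look roughly like the sketch below. It is written in Python 3 syntax, and the name `floats_only` and the sample tokens are made up for illustration; they are not from the original answer.

```python
def floats_only(tokens):
    """Yield float values for tokens that parse as float but not as int."""
    for token in tokens:
        try:
            int(token)
        except ValueError:
            pass  # not an integer; fall through and try float below
        else:
            continue  # parsed as an integer, so skip it
        try:
            yield float(token)
        except ValueError:
            pass  # not a number at all


# Hypothetical sample input: '1' and '21' are dropped, '12.' is kept.
print(list(floats_only(['1', '21', '12.', '321.997', 'EULER', '-0.14681'])))
# [12.0, 321.997, -0.14681]
```

Leaving the decision to `int()` and `float()` means no hand-written test such as checking for a `'.'` is needed.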

Answer 2

+7 A:

floats = []
# Named "tokens" rather than "all" so the built-in all() is not shadowed.
tokens = ['#', '3e98.mtz', 'MR_AUTO', 'with', 'model', '200la_.pdb', 'SPACegroup', 'HALL', 'P', '2yb', '#P', '1', '21', '1', 'SOLU', 'SET', 'RFZ=3.0', 'TFZ=4.7', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 'EULER', '321.997', '124.066', '234.744', 'FRAC', '-0.14681', '0.50245', '-0.05722', 'SOLU', 'SET', 'RFZ=3.3', 'TFZ=4.2', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 'EULER', '329.492', '34.325', '209.775', 'FRAC', '0.70297', '0.00106', '-0.24023', 'SOLU', 'SET', 'RFZ=3.6', 'TFZ=3.6', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 'EULER', '177.344', '78.287', '187.356', 'FRAC', '0.04890', '0.00090', '-0.57497']
for element in tokens:
    try:
        floats.append(float(element))
    except ValueError:
        pass

David Berger 2010-02-19 21:17:16

Nice touch -- appending the floats to a list keeps the floats useful.

steve 2010-02-19 21:33:59

Answer 3

+3 A:

def is_float(i):
    try:
        float(i)
        return True
    except ValueError:
        return False


L=['#', '3e98.mtz', 'MR_AUTO', 'with', 'model', '200la_.pdb', 'SPACegroup', 'HALL', 'P', '2yb', '#P', '1', '21', '1', 'SOLU', 'SET', 'RFZ=3.0', 'TFZ=4.7', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 'EULER', '321.997', '124.066', '234.744', 'FRAC', '-0.14681', '0.50245', '-0.05722', 'SOLU', 'SET', 'RFZ=3.3', 'TFZ=4.2', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 'EULER', '329.492', '34.325', '209.775', 'FRAC', '0.70297', '0.00106', '-0.24023', 'SOLU', 'SET', 'RFZ=3.6', 'TFZ=3.6', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 'EULER', '177.344', '78.287', '187.356', 'FRAC', '0.04890', '0.00090', '-0.57497']
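# list(...) is needed on Python 3, where filter() returns an iterator.
# Note that the result holds the matching strings, not converted floats.
print(list(filter(is_float, L)))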

gnibbler 2010-02-19 21:25:50

Answer 4

+1 A:

If you display your input in a manner that discourages answerers from examining its structure, and you ask questions like "how do I extract only the floating point numbers", and bury useful information like "In each file that I am looking at, there is 6 numbers like that in each line" in comments, you will get knee-jerk answers providing exactly what you asked for: a list of "floats" that includes 3 spurious numbers (1.0, 21.0, and 1.0) at the front of the list.

If you display your data in a slightly more congenial fashion, like:

alist = [
    '#', '3e98.mtz', 'MR_AUTO', 'with', 'model', '200la_.pdb', 'SPACegroup', 'HALL', 'P', '2yb',
    '#P', '1', '21', '1', 
    'SOLU', 'SET', 'RFZ=3.0', 'TFZ=4.7', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_',
        'EULER', '321.997', '124.066', '234.744', 'FRAC', '-0.14681', '0.50245', '-0.05722',
    'SOLU', 'SET', 'RFZ=3.3', 'TFZ=4.2', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_',
        'EULER', '329.492', '34.325', '209.775', 'FRAC', '0.70297', '0.00106', '-0.24023',
    'SOLU', 'SET', 'RFZ=3.6', 'TFZ=3.6', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_', 
        'EULER', '177.344', '78.287', '187.356', 'FRAC', '0.04890', '0.00090', '-0.57497'
    ]

there is some chance that people will notice the structure (EULER followed by three numbers then FRAC followed by three numbers) repeated and go "Oho, six numbers per line in his file" and come back with some more useful advice, like:

Start at the beginning, tell us what your file structure is. There is likely to be a better way of getting your information than smashing your file into a list of strings and then attempting to recover from that.

Update: In the meantime, here is an answer that uses the structure that is evident in your data and comments and will be more debuggable if there are variations in the structure:

TAG0 = 'EULER'
TAG1 = 'FRAC'

def extract_rows(tokens):
    pos = 0
    while True:
        try:
            pos = tokens.index(TAG0, pos)
        except ValueError:
            return
        assert pos + 8 <= len(tokens)
        assert tokens[pos+4] == TAG1
        yield (
            tuple(map(float, tokens[pos+1:pos+4])),
            tuple(map(float, tokens[pos+5:pos+8])),
            )
        pos += 8

for rowx, row in enumerate(extract_rows(alist)):
    print rowx, 'TAG0', row[0]
    print rowx, 'TAG1', row[1]

Results:

0 TAG0 (321.99700000000001, 124.066, 234.744)
0 TAG1 (-0.14681, 0.50244999999999995, -0.05722)
1 TAG0 (329.49200000000002, 34.325000000000003, 209.77500000000001)
1 TAG1 (0.70296999999999998, 0.00106, -0.24023)
2 TAG0 (177.34399999999999, 78.287000000000006, 187.35599999999999)
2 TAG1 (0.048899999999999999, 0.00089999999999999998, -0.57496999999999998)

Update 2: Based on your example file, the following simple code (untested) should do what you want:

with open('my_file.txt') as f:
    for line in f:
        row = line.split()
        # Skip blank or short lines before indexing into the row.
        if len(row) < 12:
            continue
        if row[0] == 'SOLU' and row[1] == '6DIM' and row[4] == 'EULER' and row[8] == 'FRAC':
            euler = list(map(float, row[5:8]))
            frac = list(map(float, row[9:12]))
            do_something_with(euler, frac)

Note: it's only a coincidence that what you are looking for is "all of the floating point numbers" (which ignores the floating point numbers in RFZ=3.0 TFZ=4.7 anyway!). What you have is a file with STRUCTURE: two types of SOLU records, and you want the 3 numbers that appear after EULER and the 3 after FRAC in the SOLU 6DIM records. You DON'T want a list of all of those numbers and have to split them up again into (3 EULER numbers and 3 FRAC numbers) times N.
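To illustrate why the flat list is awkward, here is a small sketch in Python 3 syntax. The helper name `flat_floats` and the shortened token list are assumptions for the example; the tokens are a subset of the data shown earlier.

```python
def flat_floats(tokens):
    """Collect every token that float() accepts, ignoring structure."""
    out = []
    for token in tokens:
        try:
            out.append(float(token))
        except ValueError:
            pass  # not numeric, skip
    return out


# A shortened sample: the '#P' header line plus one SOLU 6DIM record.
tokens = ['#P', '1', '21', '1',
          'SOLU', '6DIM', 'ENSE', '200la_',
          'EULER', '321.997', '124.066', '234.744',
          'FRAC', '-0.14681', '0.50245', '-0.05722']

flat = flat_floats(tokens)
print(flat[:3])  # [1.0, 21.0, 1.0]  <- spurious values from the '#P' line

# Regrouping into sixes is now misaligned with the EULER/FRAC records.
chunks = [flat[i:i + 6] for i in range(0, len(flat), 6)]
print(chunks[0])  # [1.0, 21.0, 1.0, 321.997, 124.066, 234.744]
```

The first "record" mixes header values with EULER angles, which is the kind of silent error that parsing by record structure avoids.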

John Machin 2010-02-19 23:27:15

Thanks for the tip! Instead of splitting my file into tons of strings, would I be able to use this method by reading through the file? i.e. `for line in file`? UPDATE: I've updated the post to include the file itself, rather than a list of strings.

steve 2010-02-21 14:37:59

Excellent, thank you sir.

steve 2010-02-21 15:51:11
