If you display your input in a manner that discourages answerers from examining its structure, and you ask questions like "how do I extract only the floating point numbers", and bury useful information like "In each file that I am looking at, there is 6 numbers like that in each line" in comments, you will get knee-jerk answers providing exactly what you asked for: a list of "floats" that includes 3 spurious numbers (1.0, 21.0, and 1.0) at the front of the list.
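To make that failure concrete, here is a minimal sketch of the knee-jerk approach, written for Python 3 and run on an excerpt of the question's tokens. The helper name `naive_floats` is made up for illustration; it is not from the question or any library.

```python
def naive_floats(tokens):
    """Return every token that float() accepts, ignoring all structure."""
    found = []
    for token in tokens:
        try:
            found.append(float(token))
        except ValueError:
            pass  # not a plain number, e.g. 'SOLU' or 'RFZ=3.0'
    return found

# Excerpt of the token list: end of the header plus the first solution.
tokens = [
    'P', '2yb', '#P', '1', '21', '1',
    'SOLU', 'SET', 'RFZ=3.0', 'TFZ=4.7', 'PAK=0', 'LLG=30',
    'SOLU', '6DIM', 'ENSE', '200la_',
    'EULER', '321.997', '124.066', '234.744',
    'FRAC', '-0.14681', '0.50245', '-0.05722',
]

print(naive_floats(tokens))
# [1.0, 21.0, 1.0, 321.997, 124.066, 234.744, -0.14681, 0.50245, -0.05722]
```

The first three values come from the `#P 1 21 1` header, not from any EULER or FRAC group.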
If you display your data in a slightly more congenial fashion, like:
alist = [
    '#', '3e98.mtz', 'MR_AUTO', 'with', 'model', '200la_.pdb', 'SPACegroup', 'HALL', 'P', '2yb',
    '#P', '1', '21', '1',
    'SOLU', 'SET', 'RFZ=3.0', 'TFZ=4.7', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_',
    'EULER', '321.997', '124.066', '234.744', 'FRAC', '-0.14681', '0.50245', '-0.05722',
    'SOLU', 'SET', 'RFZ=3.3', 'TFZ=4.2', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_',
    'EULER', '329.492', '34.325', '209.775', 'FRAC', '0.70297', '0.00106', '-0.24023',
    'SOLU', 'SET', 'RFZ=3.6', 'TFZ=3.6', 'PAK=0', 'LLG=30', 'SOLU', '6DIM', 'ENSE', '200la_',
    'EULER', '177.344', '78.287', '187.356', 'FRAC', '0.04890', '0.00090', '-0.57497',
]
there is some chance that people will notice the structure (EULER followed by three numbers then FRAC followed by three numbers) repeated and go "Oho, six numbers per line in his file" and come back with some more useful advice, like:
Start at the beginning: tell us what your file structure is. There is likely to be a better way of getting your information than smashing your file into a list of strings and then attempting to recover from that.
Update: In the meantime, here is an answer that uses the structure that is evident in your data and comments, and that will be easier to debug if the structure varies:
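TAG0 = 'EULER'
TAG1 = 'FRAC'

def extract_rows(tokens):
    """Yield (euler, frac) tuples of floats for each EULER ... FRAC group."""
    pos = 0
    while True:
        try:
            pos = tokens.index(TAG0, pos)
        except ValueError:
            return  # no more EULER tags
        # Each group is 8 tokens: TAG0, 3 numbers, TAG1, 3 numbers.
        assert pos + 8 <= len(tokens)
        assert tokens[pos+4] == TAG1
        yield (
            tuple(map(float, tokens[pos+1:pos+4])),
            tuple(map(float, tokens[pos+5:pos+8])),
        )
        pos += 8

for rowx, row in enumerate(extract_rows(alist)):
    # Formatted this way, the print calls work on both Python 2 and 3.
    print('%d TAG0 %s' % (rowx, row[0]))
    print('%d TAG1 %s' % (rowx, row[1]))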
Results (from an older Python; 2.7 and 3.1+ print shorter float reprs such as 321.997):
0 TAG0 (321.99700000000001, 124.066, 234.744)
0 TAG1 (-0.14681, 0.50244999999999995, -0.05722)
1 TAG0 (329.49200000000002, 34.325000000000003, 209.77500000000001)
1 TAG1 (0.70296999999999998, 0.00106, -0.24023)
2 TAG0 (177.34399999999999, 78.287000000000006, 187.35599999999999)
2 TAG1 (0.048899999999999999, 0.00089999999999999998, -0.57496999999999998)
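The `assert` statements in `extract_rows` are what flag a variation in the structure, but Python skips them when run with `-O`. If that matters, one possible variant raises `ValueError` and names the offending position instead. This is only a sketch for Python 3; the name `extract_rows_checked` and the sample token lists are hypothetical.

```python
def extract_rows_checked(tokens, tag0='EULER', tag1='FRAC'):
    """Like extract_rows, but report malformed groups via ValueError."""
    pos = 0
    while True:
        try:
            pos = tokens.index(tag0, pos)
        except ValueError:
            return  # no more groups
        group = tokens[pos:pos + 8]
        if len(group) < 8:
            raise ValueError('truncated group at token %d: %r' % (pos, group))
        if group[4] != tag1:
            raise ValueError('expected %r at token %d, found %r'
                             % (tag1, pos + 4, group[4]))
        yield (tuple(map(float, group[1:4])), tuple(map(float, group[5:8])))
        pos += 8

good = ['EULER', '1', '2', '3', 'FRAC', '0.1', '0.2', '0.3']
print(list(extract_rows_checked(good)))

bad = ['EULER', '1', '2', '3', 'FRAK', '0.1', '0.2', '0.3']  # misspelled tag
try:
    list(extract_rows_checked(bad))
except ValueError as exc:
    print(exc)
```

A non-numeric token inside a group also surfaces as a `ValueError`, raised by `float()` itself.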
Update 2: Based on your example file, the following simple code (untested) should do what you want:
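with open('my_file.txt') as f:
    for line in f:
        row = line.split()
        # Skip blank or short lines before indexing into row.
        if len(row) < 12:
            continue
        if row[0] == 'SOLU' and row[1] == '6DIM' and row[4] == 'EULER' and row[8] == 'FRAC':
            euler = [float(x) for x in row[5:8]]
            frac = [float(x) for x in row[9:12]]
            do_something_with(euler, frac)  # placeholder for your own processing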
Note: it's only a coincidence that what you are looking for is "all of the floating point numbers" (which ignores the floating point numbers in RFZ=3.0 TFZ=4.7 anyway!). What you have is a file with STRUCTURE: two types of SOLU records, and you want the 3 numbers that appear after EULER and the 3 after FRAC in the SOLU 6DIM records. You DON'T want a flat list of all of those numbers that you then have to split up again into (3 EULER numbers and 3 FRAC numbers) times N.
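As a rough illustration of keeping that structure, the sketch below (Python 3) collects one record per SOLU 6DIM line rather than a flat list. The `Solution` type and `parse_solutions` name are hypothetical, and the sample text assumes the file layout implied by the token list, which may not match your real file exactly.

```python
from collections import namedtuple

# Hypothetical record type: one entry per "SOLU 6DIM" line.
Solution = namedtuple('Solution', ['euler', 'frac'])

def parse_solutions(lines):
    """Yield a Solution for each line shaped like
    SOLU 6DIM ENSE <name> EULER a b c FRAC x y z."""
    for line in lines:
        row = line.split()
        if (len(row) >= 12 and row[:2] == ['SOLU', '6DIM']
                and row[4] == 'EULER' and row[8] == 'FRAC'):
            yield Solution(
                euler=tuple(float(x) for x in row[5:8]),
                frac=tuple(float(x) for x in row[9:12]),
            )

# Assumed file layout, reconstructed from the token list.
sample = """\
SOLU SET RFZ=3.0 TFZ=4.7 PAK=0 LLG=30
SOLU 6DIM ENSE 200la_ EULER 321.997 124.066 234.744 FRAC -0.14681 0.50245 -0.05722
SOLU SET RFZ=3.3 TFZ=4.2 PAK=0 LLG=30
SOLU 6DIM ENSE 200la_ EULER 329.492 34.325 209.775 FRAC 0.70297 0.00106 -0.24023
"""

solutions = list(parse_solutions(sample.splitlines()))
for s in solutions:
    print(s.euler, s.frac)
```

Each record keeps its three EULER numbers and three FRAC numbers together, so nothing has to be re-split afterwards.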