I'd probably do this:
- iterate over lines in the output
- search for one containing eng_tot:
    if 'eng_tot' in line.split(): process_blocks
- gobble up lines until one matches all dashes (with optional spaces on either side)
    if re.match(r"\s*-+\s*$", line): process_metrics_block
- process the first line of metrics:
    - cut the first column off the line (it makes the line harder to parse, because it might not be there)
        sanitized_line = line[8:]
        eng_total = sanitized_line.split()[0]
      The first column is now eng_total.
- skip lines until you reach another line of dashes, then start again
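The steps above can be sketched as a single pass over the lines with two flags. This is only a rough sketch: the sample lines are made up (the real output format isn't shown here), collect_totals is a hypothetical name, and the 8-character first column is taken from the line[8:] step above.

```python
import re

# Hypothetical sample lines; the real layout is an assumption.
SAMPLE_LINES = [
    "run summary",
    "  step   eng_tot   temp",
    " ------------------------",
    "     1   -10.50   300.0",
    "         -10.40   301.0",
    " ------------------------",
    "     2   -11.25   299.5",
    "         -11.20   299.0",
    " ------------------------",
]

# Only dashes, with optional whitespace on either side.
SEPARATOR = re.compile(r"\s*-+\s*$")

def collect_totals(lines, first_column_width=8):
    """Collect eng_total from the first metrics line after each line of dashes."""
    totals = []
    seen_header = False      # have we passed the line containing 'eng_tot'?
    expect_metrics = False   # is the next line the first line of a block?
    for line in lines:
        if not seen_header:
            seen_header = 'eng_tot' in line.split()
        elif SEPARATOR.match(line):
            expect_metrics = True
        elif expect_metrics:
            # Cut off the first column, which may or may not be filled in.
            sanitized_line = line[first_column_width:]
            fields = sanitized_line.split()
            if fields:
                totals.append(fields[0])
            # Skip the rest of the block until the next line of dashes.
            expect_metrics = False
    return totals

print(collect_totals(SAMPLE_LINES))
```

The values come back as strings; convert them with float() if you need numbers.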
After seeing your edits:
- You need to import the re (regular expression) module at the top of the file: import re
- process_blocks and process_metrics_block were pseudocode. They don't exist unless you define them. :) You don't need those exact functions; you can avoid them with basic looping (while) and conditional (if) statements.
- You'll have to make sure you understand what you're doing, not just copy from stack overflow! :)
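To get a feel for what the dashes check does before relying on it, here is a small sketch of how a few pattern variants behave with re.match. The helper name is_separator and the sample lines are made up for illustration.

```python
import re

# Anchored: optional whitespace, one or more dashes, optional whitespace, end of line.
SEPARATOR = re.compile(r"\s*-+\s*$")

def is_separator(line):
    """Return True if the line holds only dashes, with optional surrounding whitespace."""
    return SEPARATOR.match(line) is not None

# Hypothetical lines, made up for illustration.
dashes = "  --------  "
bare_dashes = "--------"
negative_row = "   -10.40   301.0"

print(is_separator(dashes))        # True
print(is_separator(bare_dashes))   # True
print(is_separator(negative_row))  # False

# re.match only anchors at the start, so without the trailing $ a row that
# begins with a negative number also counts as a match.
print(re.match(r"\s*-+\s*", negative_row) is not None)   # True

# With \s+ instead of \s*, whitespace is required on both sides, so a line
# of dashes with no surrounding spaces is rejected.
print(re.match(r"\s+-+\s+", bare_dashes) is not None)    # False
```

If your separator lines contain gaps between runs of dashes, the anchored pattern won't match them and you would need to adjust it.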
It looks like you're trying to do something like this. It seems to work, but I'm sure with some effort, you can come up with something nicer:
    import re

    def find_header(lines):
        # Index of the first line that has 'eng_tot' as a whole word, else None.
        for (i, line) in enumerate(lines):
            if 'eng_tot' in line.split():
                return i
        return None

    def find_next_separator(lines, start):
        # Index of the first all-dashes line after `start`, else None.
        for (i, line) in enumerate(lines[start+1:]):
            # Anchor with $ so a data line starting with '-' (a negative number) doesn't match.
            if re.match(r"\s*-+\s*$", line):
                return i + start + 1
        return None

    if __name__ == '__main__':
        totals = []
        with open('so.txt') as f:
            lines = f.readlines()
        header = find_header(lines)
        start = find_next_separator(lines, header+1)
        while True:
            end = find_next_separator(lines, start+1)
            if end is None:
                break
            # Pull out block, after line of dashes.
            metrics_block = lines[start+1:end]
            # Pull out 2nd column from 1st line of metrics.
            eng_total = metrics_block[0].split()[1]
            totals.append(eng_total)
            start = end
        print(totals)
You can use a generator to be a little more pythonic:
    def metric_block_iter(lines):
        # Yield (start, end) index pairs for each pair of consecutive separator lines.
        start = find_next_separator(lines, find_header(lines)+1)
        while True:
            end = find_next_separator(lines, start+1)
            if end is None:
                break
            yield (start, end)
            start = end

    if __name__ == '__main__':
        totals = []
        with open('so.txt') as f:
            lines = f.readlines()
        for (start, end) in metric_block_iter(lines):
            # Pull out block, after line of dashes.
            metrics_block = lines[start+1:end]
            # Pull out 2nd column from 1st line of metrics.
            eng_total = metrics_block[0].split()[1]
            totals.append(eng_total)
        print(totals)