Using this code

import re
file = open('FilePath/OUTPUT.01')
lines = file.read()
file.close()
for match in re.finditer(r"(?m)^\s*-+\s+\S+\s+(\S+)", lines):
    eng = match.group(1)
    open('Tmp.txt', 'w').writelines(eng)
    print match.group(1)

I get a column of data that looks like this:

-1.1266E+05
-1.1265E+05
-1.1265E+05
-1.1265E+05
-1.1264E+05
-1.1264E+05
-1.1264E+05
-1.1263E+05
step
-1.1263E+05
-1.1262E+05
-1.1262E+05
-1.1261E+05
-1.1261E+05
-1.1260E+05
-1.1260E+05
-1.1259E+05
step
-1.1259E+05
-1.1258E+05
-1.1258E+05
-1.1258E+05
-1.1257E+05
terminating.
eng_tot
-1.1274E+05
3D

How do I write it to a file (Tmp.txt)? As of now it only writes the last line, '3D'. I'd also like to eliminate all the lines that aren't of the form x.xxxxExxx (i.e. keep just the numbers).

A: 

i is the index into lines that line is at, so i+1 is the next line:

print lines[i+1]

Make sure the ---- line isn't the last line, or this will try to read from an index that doesn't exist. Also, your regular expression \s+-+\s+ requires whitespace both before and after the dashes, since \s+ means one or more whitespace characters; you probably meant \s*.
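A minimal sketch of the look-ahead with the bounds check described above. The sample list and the names `dash_line` and `next_lines` are made up for illustration; the real contents of OUTPUT.01 are not shown in the question.

```python
import re

# Hypothetical sample standing in for the lines of OUTPUT.01.
lines = [
    "header",
    "  -----  ",
    "1 -1.1266E+05 7",
    "-----",
]

# \s* tolerates zero or more whitespace characters around the dashes.
dash_line = re.compile(r"^\s*-+\s*$")

next_lines = []
for i, line in enumerate(lines):
    # Only look ahead when a following line actually exists.
    if dash_line.match(line) and i + 1 < len(lines):
        next_lines.append(lines[i + 1])

print(next_lines)
```

The final dash line in the sample has nothing after it, so the `i + 1 < len(lines)` test skips it instead of raising an IndexError.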

Michael Mrozek
Thank you for your help with the lines, but for the regexp, considering the file I have, I wanted the '+'s. Now I have a bunch of columns of data, how do I take out the second item in each row?
Maimon
@Maimon Write a regular expression that matches the line and has a group around the part you want (e.g. `[^ ]* *([^ ]*)`), and use `match.group(1)` to extract that group.
Michael Mrozek
I'm not very good with regexp (I got someone else to get me this far). I'll post a sample line in the question, and you can tell me what you think I should do.
Maimon
+2  A: 

You could use a single regex:

import re

file = open('FilePath/OUTPUT.01')
lines = file.read()
file.close()
with open("output.txt","w") as f:
    for match in re.finditer(r"(?m)^\s*-+\s+\S+\s+(-?[\d.]+E[+-]\d+)", lines):
        f.write(match.group(1)+"\n")

This should write all the second numbers that occur after a line that consists entirely of - into the file output.txt.

This regex assumes that the columns are space-separated, and that the first column will never be empty.

Explanation:

(?m)                 # allow ^ to match at start of line, not just start of string
^                    # anchor the search at the start of the line
\s*                  # match any leading whitespace
-+                   # match one or more dashes
\s+                  # match trailing whitespace, including linebreak characters
\S+                  # match a run of non-whitespace characters (we're now one line ahead of the dashes)
\s+                  # match a run of whitespace
(-?[\d.]+E[+-]\d+)   # match a number in scientific notation
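
To see the pieces above working together, here is a small self-contained check. The sample text is invented (the real layout of OUTPUT.01 isn't shown), so treat it as a sketch rather than a test against the actual file.

```python
import re

# Hypothetical excerpt; column names and values are assumptions.
sample = (
    "step   eng_tot   temp\n"
    " ------------------\n"
    "    1  -1.1266E+05  300.0\n"
    " ------------------\n"
    " terminating. 3D\n"
)

pattern = re.compile(r"(?m)^\s*-+\s+\S+\s+(-?[\d.]+E[+-]\d+)")

# Each match captures the second item on the line after a dash line,
# but only when that item looks like a number in scientific notation.
values = [m.group(1) for m in pattern.finditer(sample)]
print(values)
```

The second dash line is followed by `terminating. 3D`, and since `3D` doesn't fit the number pattern, nothing is captured for it. That is what filters out the non-numeric lines.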
Tim Pietzcker
Unfortunately, the second numbers are not integers. They are in scientific notation, e.g. '-1.1287E+05'
Maimon
OK, we need a few samples to figure out how to construct the regex then.
Tim Pietzcker
OK sample figures are up
Maimon
Here's the error I get when I run that:
Traceback (most recent call last):
  File "LineExtract4", line 12, in ?
    for match in re.finditer(r"(?m)^\s*-+\s+\S+\s+(\S+)", lines):
  File "/usr/lib/python2.4/sre.py", line 176, in finditer
    return _compile(pattern, flags).finditer(string)
TypeError: expected string or buffer
Maimon
Ah, sorry, forgot to switch from `readlines()` to `read()`. Corrected.
Tim Pietzcker
@Tim Thank you very much. Now the only thing left is to write the output to a file and take out any lines that aren't numbers. I'll post in the question what I'm talking about.
Maimon
Actually ignore that last comment, I can do that in Bash. I'd just like to know now what exactly the regex you posted means, and how I could change it so that it prints out any piece of data relative to the line of '-'s instead of just the first datum on the first line. And if possible, could it prompt you for the relative position? (I just think that would be nice considering I'm going to have to run this a large number of times for many different data points). Thank you so much.
Maimon
@Maimon: If you need to handle many different data points, regexes will become quite unwieldy. I suggest you ask another question specifying exactly what you need, providing enough examples for all foreseeable use cases, and we'll get right on it.
Tim Pietzcker
Actually this regex appears to work just fine; I just add in a chain of "\S+\s+"s to search for whichever data point I want. However, I was wondering if there was a more efficient way to do it (i.e. skip a whole line at a time, or search for 'x' words down the line).
Maimon
As I said, I suggest you put that into another question. This one is becoming overloaded :)
Tim Pietzcker
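
On the chain of "\S+\s+"s mentioned in the comments: one possible shortcut is a counted non-capturing group, `(?:\S+\s+){n}`, which skips n whitespace-separated items in one go. This is only a sketch; `build_pattern`, `skip`, and the sample text are hypothetical names and data, not something from the thread.

```python
import re

def build_pattern(skip):
    """Return a regex capturing the item that follows `skip` items after a dash line."""
    # (?:\S+\s+){skip} stands in for `skip` hand-written copies of \S+\s+
    return re.compile(r"(?m)^\s*-+\s+(?:\S+\s+){%d}(\S+)" % skip)

# Hypothetical data: one dash line followed by two rows of three items.
sample = (
    " -----\n"
    " a1 b1 c1\n"
    " a2 b2 c2\n"
)

second_token = build_pattern(1).findall(sample)  # skip 1 item, capture the next
fifth_token = build_pattern(4).findall(sample)   # skip 4 items, capture the next
print(second_token, fifth_token)
```

Because `\s+` also matches line breaks, the count simply runs on across lines, so skipping four items lands on the second item of the second row. It counts items rather than lines, so it only helps if every row has the same number of columns.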
A: 

I wouldn't bother with REs for this. Try the following:

output = file("tmp.txt", "w")        # open a file for writing
flagged = False                      # when 'flagged == True' we will print the line
for line in file("FilePath/OUTPUT.01"):
    if flagged:
        try:
            result = line.split()[1] # python is zero-indexed!
            print>>output, result    # print to output only if the split worked
        except IndexError:           # otherwise do nothing
            pass
        flagged = False              # but reset the flag
    else:
        if set(line.strip()) == set(["-"]): # does the line consist only of '-'?
            flagged = True           # if so, set the flag to print the next line

Here's a version which allows you to specify the number of lines offset, and the column number:

OFFSET = 3 # the third line after the `----`
COLUMN = 2 # column index 2

output = file("tmp.txt", "w")
counter = 0                           # 0 evaluates as False
for line in file("FilePath/OUTPUT.01"):
    if counter:                       # any non-zero value evaluates as True
        if counter == OFFSET:
            try:
                result = line.split()[COLUMN] 
                print>>output, result # print to output only if the split worked
            except IndexError:        # otherwise do nothing
                pass
            counter = 0               # reset the flag once you've reached the OFFSET line
        else:
            counter += 1
    else:
        if set(line.strip()) == set(["-"]): # does the line consist only of '-'?
            counter = 1
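
For comparison, here is the same counter idea wrapped in a function so the offset and column can be passed in (or read from a prompt). It is a sketch under assumptions: `extract` and the sample rows are made up, it uses a length check instead of `try`/`except`, and it is written so that it also runs on Python 3.

```python
def extract(lines, offset=1, column=1):
    """Collect the item at index `column` on the line `offset` lines below each dash line.

    `lines` may be an open file or any iterable of strings.
    """
    results = []
    counter = 0  # 0 means "not currently counting from a dash line"
    for line in lines:
        if counter:
            if counter == offset:
                fields = line.split()
                if column < len(fields):  # ignore lines that are too short
                    results.append(fields[column])
                counter = 0               # reset once the target line is reached
            else:
                counter += 1
        elif set(line.strip()) == {"-"}:  # does the line consist only of '-'?
            counter = 1
    return results

# Hypothetical rows standing in for OUTPUT.01.
sample = [
    "-----",
    "1 -1.1266E+05 300.0",
    "2 -1.1265E+05 301.0",
    "-----",
    "",
]

print(extract(sample, offset=1, column=1))
print(extract(sample, offset=2, column=2))
```

With a real file this could be called as `extract(open("FilePath/OUTPUT.01"), offset, column)`, with `offset` and `column` taken from user input if a prompt is wanted.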
Michael Dunn
By the way, editing your question for clarity is a good idea, but be careful not to change the question so much that people's answers don't apply any more!
Michael Dunn
After running this code I get the following error:
Traceback (most recent call last):
  File "LineExtract5", line 5, in ?
    print>>output, line.split()[1]
IndexError: list index out of range
I think that may be because some of the lines that were found have no members.
Maimon
What do you want it to do if there are no numbers in the line after the "----" line? If you wanted to ignore it you could put `if line.strip():` before the print line, or you could wrap it in a `try: ... except IndexError: ...` block.
Michael Dunn
I tried both of those methods, but I couldn't get it to run right. Where exactly should I put those lines, because I do want to ignore lines w/o numbers.
Maimon
OK, I've added a `try`/`except` block above.
Michael Dunn
OK, that worked. Now how would I edit it in order to get, for example, a data point on the second or third line after the ----s (instead of the line immediately after it)?
Maimon
Easy! Instead of using a `True`/`False` flag, use a counter. Start off with `flagged = 0`, then each time you spot a line consisting of only `-`, increment it by one, e.g. `flagged = flagged + 1` (or use the increment operator, `flagged += 1`). Then instead of testing for `if flagged:`, test for the line number you want, e.g. `if flagged == 2` (or 3 or whatever)
Michael Dunn
I think you might have misunderstood what I was asking. I was looking for a way to find the data point 'x' lines below and 'y' points to the right of each instance of '----'s. I tried setting OUTPUT.01 as a file object then sticking in line = f.next() right after the 'try' line, but it didn't change anything.
Maimon
Using the counter will find you the line you need, then you just need to change the index in result = line.split()[1] to whatever column you want. I've made a new version (untested) for comparison.
Michael Dunn
----------------------
Maimon