ansaurus

Question

How to print out the line after the line found in re.compile()

Answer 1

A:

i is the index of line within lines, so i+1 is the index of the next line:

print lines[i+1]

Make sure the ---- line isn't the last line, or this will raise an IndexError by trying to read past the end of the list. Also, your regular expression \s+-+\s+ requires at least one whitespace character before and after the dashes, since \s+ means one or more whitespace characters; you probably meant \s*.
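A minimal sketch of that bounds check, written in Python 3 syntax. The pattern `DASHES`, the helper name `lines_after_matches`, and the sample list are made up for illustration; the asker's actual regex and file contents may differ.

```python
import re

# Hypothetical pattern: a line made up only of dashes, with optional
# surrounding whitespace (an assumption, not the asker's exact regex).
DASHES = re.compile(r"^\s*-+\s*$")

def lines_after_matches(lines):
    """Return the line that follows each dashes-only line, if there is one."""
    found = []
    for i, line in enumerate(lines):
        # i + 1 < len(lines) guards against a dashes line at the very end.
        if DASHES.match(line) and i + 1 < len(lines):
            found.append(lines[i + 1])
    return found

sample = ["header\n", " ----- \n", "a b c\n", "-----\n"]
print(lines_after_matches(sample))  # the final dashes line has no successor
```

The last dashes line is skipped silently here; raising an error instead would be an equally reasonable choice.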

Michael Mrozek 2010-06-21 18:10:45

Thank you for your help with the lines, but for the regexp, considering the file I have, I wanted the '+'s. Now I have a bunch of columns of data, how do I take out the second item in each row?

Maimon 2010-06-21 18:28:10

@Maimon Write a regular expression that matches the line and has a group around the part you want (e.g. `[^ ]* *([^ ]*)`), and use `match.group(1)` to extract that group.
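As a rough illustration of the group approach (Python 3 syntax), the sketch below uses a slightly stricter variant of the pattern that requires at least one space and a non-empty second field. The name `ROW` and the sample row are assumptions, and the row is assumed to have no leading spaces.

```python
import re

# First column, the separating spaces, then the second column as group 1.
ROW = re.compile(r"[^ ]* +([^ ]+)")

row = "1.0000E+00 -1.1287E+05 3.5000E-02"   # made-up sample row
match = ROW.match(row)
if match:
    print(match.group(1))   # the second space-separated field
```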

Michael Mrozek 2010-06-21 18:48:30

I'm not very good with regexp (I got someone else to get me this far). I'll post a sample line in the question, and you can tell me what you think I should do.

Maimon 2010-06-21 18:54:39

Answer 2

+2 A:

You could use a single regex:

import re

infile = open('FilePath/OUTPUT.01')
lines = infile.read()    # read the whole file as one string, not a list of lines
infile.close()
with open("output.txt", "w") as f:
    for match in re.finditer(r"(?m)^\s*-+\s+\S+\s+(-?[\d.]+E[+-]\d+)", lines):
        f.write(match.group(1) + "\n")

This should write the second number from each line that directly follows a line consisting entirely of - to the file output.txt.

This regex assumes that the columns are space-separated, and that the first column will never be empty.

Explanation:

(?m)                 # allow ^ to match at start of line, not just start of string
^                    # anchor the search at the start of the line
\s*                  # match any leading whitespace
-+                   # match one or more dashes
\s+                  # match trailing whitespace, including linebreak characters
\S+                  # match a run of non-whitespace characters (we're now one line ahead of the dashes)
\s+                  # match a run of whitespace
(-?[\d.]+E[+-]\d+)   # match a number in scientific notation
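To see the pieces working together, here is a small Python 3 sketch that compiles the same pattern with `re.VERBOSE` so the comments can live inside the regex. The sample text is invented; the real OUTPUT.01 layout may differ.

```python
import re

PATTERN = re.compile(r"""
    ^                    # start of a line (MULTILINE)
    \s*                  # leading whitespace
    -+                   # one or more dashes
    \s+                  # trailing whitespace, including the line break
    \S+                  # first column of the following line
    \s+                  # whitespace between columns
    (-?[\d.]+E[+-]\d+)   # second column: number in scientific notation
""", re.MULTILINE | re.VERBOSE)

# Made-up sample text for illustration only.
text = (
    "some header\n"
    "  ---------\n"
    "  1.0000E+00  -1.1287E+05  3.5000E-02\n"
    "  2.0000E+00   4.2000E+01  7.0000E+00\n"
    "  ---------\n"
    "  3.0000E+00   9.9000E-01  1.0000E+00\n"
)

print(PATTERN.findall(text))   # one captured number per dashes line
```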

Tim Pietzcker 2010-06-21 18:35:55

Unfortunately, the second numbers are not integers. They are in scientific notation, e.g. '-1.1287E+05'

Maimon 2010-06-21 18:39:46

OK, we need a few samples to figure out how to construct the regex then.

Tim Pietzcker 2010-06-21 18:55:19

OK sample figures are up

Maimon 2010-06-21 19:01:31

Here's the error I get when I run that:
Traceback (most recent call last):
  File "LineExtract4", line 12, in ?
    for match in re.finditer(r"(?m)^\s*-+\s+\S+\s+(\S+)", lines):
  File "/usr/lib/python2.4/sre.py", line 176, in finditer
    return _compile(pattern, flags).finditer(string)
TypeError: expected string or buffer

Maimon 2010-06-21 19:11:54

Ah, sorry, forgot to switch from `readlines()` to `read()`. Corrected.

Tim Pietzcker 2010-06-21 19:19:56

@Tim Thank you very much. Now the only thing left is to write the output to a file, and take out any lines that aren't numbers. I'll post in the question what I'm talking about.

Maimon 2010-06-21 19:32:49

Actually ignore that last comment, I can do that in Bash. I'd just like to know now what exactly the regex you posted means, and how I could change it so that it prints out any piece of data relative to the line of '-'s instead of just the first datum on the first line. And if possible, could it prompt you for the relative position? (I just think that would be nice considering I'm going to have to run this a large number of times for many different data points). Thank you so much.

Maimon 2010-06-21 20:03:10

@Maimon: If you need to handle many different data points, regexes will become quite unwieldy. I suggest you ask another question specifying exactly what you need, providing enough examples for all foreseeable use cases, and we'll get right on it.

Tim Pietzcker 2010-06-22 06:02:13

Actually this regex appears to work just fine; I just add in a chain of "\S+\s+"s to search for whichever data point I want. However, I was wondering if there was a more efficient way to do it (i.e. skip a whole line at a time, or search for 'x' words down the line).
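One possible way to avoid chaining "\S+\s+" by hand is a counted group, `(?:\S+\s+){n}`. The Python 3 sketch below assumes that approach; `build_pattern`, `skip`, and the sample text are hypothetical names and data. Because \s+ also matches line breaks, the count runs across lines, so the right value depends on how many columns each line has.

```python
import re

def build_pattern(skip):
    """Regex that skips `skip` whitespace-separated tokens after the dashes."""
    return re.compile(
        r"(?m)^\s*-+\s+(?:\S+\s+){%d}(-?[\d.]+E[+-]\d+)" % skip
    )

# Made-up sample with three columns per line.
text = (
    "some header\n"
    "  ---------\n"
    "  1.0000E+00  -1.1287E+05  3.5000E-02\n"
    "  2.0000E+00   4.2000E+01  7.0000E+00\n"
    "  ---------\n"
    "  3.0000E+00   9.9000E-01  1.0000E+00\n"
)

print(build_pattern(1).findall(text))   # second token after each dashes line
print(build_pattern(4).findall(text))   # fifth token, i.e. second column one line further down
```

If the number of columns varies from line to line, counting tokens this way stops being reliable, and a line-based loop is probably the safer route.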

Maimon 2010-06-22 16:03:43

As I said, I suggest you put that into another question. This one is becoming overloaded :)

Tim Pietzcker 2010-06-22 16:21:38

Answer 3

A:

I wouldn't bother with REs for this. Try the following:

output = file("tmp.txt", "w")        # open a file for writing
flagged = False                      # when 'flagged == True' we will print the line
for line in file("FilePath/OUTPUT.01"):
    if flagged:
        try:
            result = line.split()[1] # python is zero-indexed!
            print>>output, result    # print to output only if the split worked
        except IndexError:           # otherwise do nothing
            pass
        flagged = False              # but reset the flag
    else:
        if set(line.strip()) == set(["-"]): # does the line consist only of '-'?
            flagged = True           # if so, set the flag to print the next line
output.close()                       # flush and close the output file

Here's a version which allows you to specify the number of lines offset, and the column number:

OFFSET = 3 # the third line after the `----`
COLUMN = 2 # column index 2

output = file("tmp.txt", "w")
counter = 0                           # 0 evaluates as False
for line in file("FilePath/OUTPUT.01"):
    if counter:                       # any non-zero value evaluates as True
        if counter == OFFSET:
            try:
                result = line.split()[COLUMN] 
                print>>output, result # print to output only if the split worked
            except IndexError:        # otherwise do nothing
                pass
            counter = 0               # reset the flag once you've reached the OFFSET line
        else:
            counter += 1
    else:
        if set(line.strip()) == set(["-"]): # does the line consist only of '-'?
            counter = 1
output.close()                        # flush and close the output file
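The listings above rely on Python 2 constructs (`file()` and `print>>`). As a hedged Python 3 sketch of the same counter logic, the function below takes any iterable of lines and returns the extracted fields instead of writing them to a file. The name `extract`, its parameters, and the sample data are assumptions made for illustration.

```python
def extract(lines, offset=1, column=1):
    """Collect field `column` from the line `offset` lines after each dashes-only line."""
    results = []
    counter = 0                            # 0 means "not currently counting"
    for line in lines:
        if counter:
            if counter == offset:
                fields = line.split()
                if column < len(fields):   # ignore lines that are too short
                    results.append(fields[column])
                counter = 0                # reset once the target line is reached
            else:
                counter += 1
        elif set(line.strip()) == {"-"}:   # does the line consist only of '-'?
            counter = 1
    return results

# Made-up sample input.
sample = ["-----\n", "a b c\n", "d e f\n", "g h i\n", "----\n", "\n"]
print(extract(sample))                      # line right after the dashes, column index 1
print(extract(sample, offset=3, column=2))  # third line after the dashes, column index 2
```

An open file object can be passed directly, for example `with open("FilePath/OUTPUT.01") as f: extract(f, 3, 2)`. As in the original, dashes lines that occur while the counter is running are not treated as new markers.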

Michael Dunn 2010-06-21 19:20:29

By the way, editing your question for clarity is a good idea, but be careful not to change the question so much that people's answers no longer apply!

Michael Dunn 2010-06-21 19:42:41

After running this code I get the following error:
Traceback (most recent call last):
  File "LineExtract5", line 5, in ?
    print>>output, line.split()[1]
IndexError: list index out of range
I think that may be because some of the lines that were found have no members.

Maimon 2010-06-21 19:50:12

What do you want it to do if there are no numbers in the line after the "----" line? If you wanted to ignore it you could put `if line.strip():` before the print line, or you could wrap it in a `try: ... except IndexError: ...` block.

Michael Dunn 2010-06-21 19:59:10

I tried both of those methods, but I couldn't get it to run right. Where exactly should I put those lines, because I do want to ignore lines w/o numbers.

Maimon 2010-06-22 15:00:07

OK, I've added a `try`/`except` block above.

Michael Dunn 2010-06-22 16:02:38

OK, that worked. Now how would I edit it in order to get, for example, a data point on the second or third line after the ----s (instead of the line immediately after it)?

Maimon 2010-06-22 16:49:33

Easy! Instead of using a `True`/`False` flag, use a counter. Start off with `flagged = 0`, then each time you spot a line consisting of only `-`, increment it by one, e.g. `flagged = flagged + 1` (or use the increment operator, `flagged += 1`). Then instead of testing for `if flagged:`, test for the line number you want, e.g. `if flagged == 2` (or 3 or whatever)

Michael Dunn 2010-06-22 20:17:43

I think you might have misunderstood what I was asking. I was looking for a way to find the data point 'x' lines below and 'y' points to the right of each instance of '----'s. I tried setting OUTPUT.01 as a file object then sticking in line = f.next() right after the 'try' line, but it didn't change anything.

Maimon 2010-06-23 14:11:17

Using the counter will find you the line you need, then you just need to change the index in result = line.split()[1] to whatever column you want. I've made a new version (untested) for comparison.
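For comparison with the `f.next()` attempt mentioned above, here is a Python 3 sketch of a different technique: advancing the same iterator with `itertools.islice` instead of keeping a counter. This is not the counter approach described in the answer; `extract_by_skipping` and the sample data are hypothetical.

```python
from itertools import islice

def extract_by_skipping(lines, offset=1, column=1):
    """Jump `offset` lines past each dashes-only line and take field `column`."""
    results = []
    it = iter(lines)                       # one shared iterator for loop and skip
    for line in it:
        if set(line.strip()) == {"-"}:
            # Consume offset - 1 lines, then take the next one (None if the input ends).
            target = next(islice(it, offset - 1, None), None)
            if target is not None:
                fields = target.split()
                if column < len(fields):   # ignore lines that are too short
                    results.append(fields[column])
    return results

# Made-up sample input.
sample = ["-----\n", "a b c\n", "d e f\n", "g h i\n", "----\n", "\n"]
print(extract_by_skipping(sample, offset=3, column=2))
```

The skip only works because the `for` loop and `islice` pull from the same iterator object; calling `next()` on a separate iterator over the file would leave the loop position unchanged.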

Michael Dunn 2010-06-23 15:53:20

----------------------

Maimon 2010-06-24 14:18:36
