I have a file that looks like this (I had to put it in a code box so it resembles the file):

text
(starts with parentheses)
         tabbed info
text
(starts with parentheses)
         tabbed info

...repeat

I want to grab only the "text" lines from the file (or every fourth line) and copy them to another file. This is the code I have, but it copies everything to the new file:

import sys

def process_file(filename):

    output_file = open("data.txt", 'w')

    input_file = open(filename, "r")
    for line in input_file:
        line = line.strip()
        if not line.startswith("(") or line.startswith(""):
            output_file.write(line)
    output_file.close()

if __name__ == "__main__":
    process_file(sys.argv[1])
A: 

Your script copies every line because line.startswith("") is True no matter what line contains.
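
A quick check makes this concrete. The sample strings below are made up to mirror the three kinds of line in the question, plus an empty string:

```python
# Every string, including the empty one, starts with the empty string,
# so this test can never filter anything out.
samples = ["text", "(starts with parentheses)", "\ttabbed info", ""]
results = [s.startswith("") for s in samples]
print(results)  # [True, True, True, True]
```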

You might try using isspace to test whether the line begins with whitespace:

def process_file(filename):
    with open("data.txt", 'w') as output_file:
        with open(filename, "r") as input_file:
            for line in input_file:
                line = line.rstrip()
                # Skip lines that start with "(" or with whitespace.
                if not (line.startswith("(") or line[:1].isspace()):
                    # rstrip() removed the newline, so add it back.
                    output_file.write(line + "\n")
unutbu
A: 

In addition to line.startswith("") always being true, line.strip() removes the leading tab, so the tabbed data gets written as well. Change it to line.rstrip() and use \t to test for a tab. That part of your code should look like:

line = line.rstrip()
if not line.startswith(('(', '\t')):
    #....
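
For anyone who wants to try the tuple form of startswith in isolation, here is a minimal sketch. The function name keep_text_lines and the sample data are made up for illustration, and skipping empty lines is an extra assumption that the snippet above does not make:

```python
def keep_text_lines(lines):
    """Return the lines that start with neither '(' nor a tab."""
    kept = []
    for line in lines:
        line = line.rstrip()  # drops the trailing newline, keeps leading tabs
        # Skipping empty lines is an added assumption, not part of the answer.
        if line and not line.startswith(("(", "\t")):
            kept.append(line)
    return kept


sample = [
    "text\n", "(starts with parentheses)\n", "\ttabbed info\n",
    "text\n", "(starts with parentheses)\n", "\ttabbed info\n",
]
print(keep_text_lines(sample))  # ['text', 'text']
```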

In response to your question in the comments:

#edited in response to comments in post
for i, line in enumerate(input_file):
    if i % 4 == 0:
        output_file.write(line)
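
A self-contained sketch of the every-fourth-line idea, with enumerate supplying the index. It uses in-memory files (io.StringIO) in place of real ones so it can be run as is; the function name every_fourth_line and the sample text are made up for illustration:

```python
import io


def every_fourth_line(input_file, output_file):
    """Copy lines 0, 4, 8, ... from input_file to output_file."""
    for i, line in enumerate(input_file):
        if i % 4 == 0:
            # Normalise the ending so each copied line sits on its own line,
            # even if the last line of the input has no trailing newline.
            output_file.write(line.rstrip("\n") + "\n")


source = io.StringIO("a\nb\nc\nd\ne\nf\ng\nh\ni")
target = io.StringIO()
every_fourth_line(source, target)
print(repr(target.getvalue()))  # 'a\ne\ni\n'
```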
aaronasterling
is there a way to make the code only write every 4th line?
Timmay
@Timmay, i've updated my post
aaronasterling
You can change the for part to be "for i, line in enumerate(input_file):". That way you don't need to worry about initializing and separately updating i.
Mr Fooz
Thanks for all of your help. I will just add that I had to add output_file.write('\n') to get it to write every 4th line in a new line. Also, you might want to edit the writeline(line) part out of the second code in case someone else has this question and doesn't catch it. :)
Timmay
A: 

try:

if not line.startswith("(") and not line.startswith("\t"):

without doing line.strip() (this will strip the tabs)

Andre Holzner
he'll still need to use `line.rstrip()` to remove the trailing newline
aaronasterling
A: 

So the issue is that (1) you are misusing boolean logic, and (2) every possible line starts with "".

First, the boolean logic:

The or operator returns True if either of its operands is True. Here the operands are "not line.startswith('(')" and "line.startswith('')". Note that the not applies to only one of the operands. If you want to apply it to the result of the whole or expression, you have to put that expression in parentheses.
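
To see the difference the parentheses make, compare the two groupings on sample lines. This is only a sketch: the function names are made up, and the grouped version tests for a leading tab ('\t') instead of '', which is an assumption borrowed from the other answers rather than something in the original code:

```python
def original_test(line):
    # `not` binds tighter than `or`, so this is (not A) or B.
    return not line.startswith("(") or line.startswith("")


def grouped_test(line):
    # Parentheses make `not` apply to the whole `or`: not (A or B).
    return not (line.startswith("(") or line.startswith("\t"))


lines = ["text", "(starts with parentheses)", "\ttabbed info"]
print([original_test(l) for l in lines])  # [True, True, True]
print([grouped_test(l) for l in lines])   # [True, False, False]
```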

The second issue is your use of the startswith() method with a zero-length string as an argument. This essentially says "match any string where the first zero characters are nothing." It matches any string you could give it.

See other answers for what you should be doing here.

jcdyer
you're spot on with the boolean logic but you're way off on the second issue. look at the problem again
aaronasterling
Oops. You are correct. Somehow, I had gotten it into my head that the OP was dealing with blank lines. Looks like he found an answer that worked for him, but I'll delete the bit about blank lines. Thanks.
jcdyer
A: 

with open('data.txt','w') as of:
    of.write(''.join(textline
                     for textline in open(filename)
                     if textline[0] not in ' \t(')
             )

To write only every fourth of the matching lines, build a list and slice it with [::4]:

with open('data.txt','w') as of:
    of.write(''.join([textline
                     for textline in open(filename)
                     if textline[0] not in ' \t('][::4])
             )

I don't need to rstrip the newlines, since write needs them anyway.

Tony Veijalainen
how do you close the files?
aaronasterling
The files need not be closed as they are not opened normally but used as generator. The garbage collector takes care of 'w' case on exit from function or program. Anyway, I corrected it to use with, which takes care of closing the output file also in case of error.
Tony Veijalainen
Nitpick: the files are opened completely normally. They're closed by the context manager (the `with` statement), which calls the `__exit__` method on the file object when the block of code exits. The file object is then garbage-collected.
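
A small sketch of that behaviour, checking the file object's closed attribute inside and after the with block, including the case where the block raises. The temporary path is made up for the demonstration:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")

with open(path, "w") as output_file:
    output_file.write("text\n")
    inside = output_file.closed   # still open inside the block
after = output_file.closed        # closed as soon as the block exits
print(inside, after)  # False True

# The file is also closed when the block exits through an exception.
try:
    with open(path, "w") as failing_file:
        raise ValueError("simulated error")
except ValueError:
    pass
print(failing_file.closed)  # True
```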
katrielalex