ansaurus

Question

Python: Indexing a file that is tab delimited

Answer 1

+2 A:

z = open('output.blast', 'r')
for line in z.readlines():
    cols = line.split('\t')   # split the line into tab-separated columns
    print cols[1]             # second column of this line
z.close()

You need to split() the line on tab characters first.

Alternatively, you could use Python's csv module with the delimiter set to a tab character.

Amber 2010-07-01 17:23:48

This will print out the second letter in every column. I am sure that is not what you intended.

Dave Kirby 2010-07-01 18:10:49

Whoops. The danger of copy paste. :) Fixed.

Amber 2010-07-01 18:11:34

Answer 2

+3 A:

Check out the csv module. That should help you a lot if you plan on doing more stuff with your tab-delimited files, too. One nice thing is that you can assign names to the various columns.
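As a rough sketch of the column-naming idea, assuming Python 3: csv.DictReader accepts a fieldnames list, so each row comes back as a dict keyed by name. The names in FIELDS are labels I picked for a 12-column tabular BLAST file, not something the file supplies, and the helper subject_ids and the sample rows are only there for illustration.

```python
import csv
import io

# Hypothetical column names for a 12-column tabular BLAST file; adjust to your data.
FIELDS = ["query", "subject", "identity", "length", "mismatches", "gap_opens",
          "q_start", "q_end", "s_start", "s_end", "evalue", "bit_score"]

# Sample rows standing in for an open file; fields are separated by tabs.
sample = (
    "1_0\tNP_045689\t100.00\t279\t0\t0\t18\t296\t18\t296\t3e-156\t539\n"
    "1_0\tNP_045688\t54.83\t259\t108\t6\t45\t296\t17\t273\t2e-61\t224\n"
)

def subject_ids(handle):
    """Return the 'subject' column of every row in a tab-delimited file object."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    return [row["subject"] for row in reader]

print(subject_ids(io.StringIO(sample)))
```

With a real file you would pass open('output.blast') instead of the StringIO wrapper.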

JAB 2010-07-01 17:24:01

Answer 3

+1 A:

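import csv, StringIO

# Sample rows; the \t escapes make the tab delimiters explicit.
text = """1_0\tNP_045689\t100.00\t279\t0\t0\t18\t296\t18\t296\t3e-156\t539
1_0\tNP_045688\t54.83\t259\t108\t6\t45\t296\t17\t273\t2e-61\t224"""

f = csv.reader(StringIO.StringIO(text), delimiter='\t')
for row in f:
    print row[1]   # second column of each row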

Two things of note:

The delimiter argument to csv.reader tells the csv module how to split each line of text. Check the function's other arguments (e.g. quotechar) for further control over parsing.

I use StringIO to wrap the example text as a file object; you don't need that if you are reading from an actual file.

For example:

f=csv.reader(open('./test.csv'),delimiter='\t')

ebt 2010-07-01 18:07:10

Answer 4

A:

This is why your code is going wrong:

for col in line:

will iterate over every CHARACTER in the line.

    print col[1]

A character is a string of length 1, so col[1] is always going to give an index out of range error.
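A quick way to check this yourself, as a small sketch in Python 3 syntax with a made-up sample line:

```python
line = "1_0\tNP_045689\t100.00\n"  # made-up sample line

# Iterating over a string yields one-character strings, tabs and newline included.
chars = [col for col in line]
assert all(len(col) == 1 for col in chars)

# Index 1 does not exist in a one-character string, so this raises IndexError.
try:
    chars[0][1]
    error = None
except IndexError as exc:
    error = exc

print(repr(chars[0]), type(error).__name__)
```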

As others have said, you either need to split the line on the TAB character '\t', or use the csv module, which will correctly handle quoted fields that may contain tabs or newlines.
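To illustrate the difference, here is a small sketch in Python 3 syntax using a made-up line whose second field is quoted and contains a tab. A plain split breaks that field in two, while csv.reader keeps it whole.

```python
import csv
import io

# Made-up line: the second field is quoted and contains a tab character.
line = 'id1\t"name\twith tab"\t42\n'

# A naive split treats the embedded tab as a field separator.
split_fields = line.rstrip("\n").split("\t")

# csv.reader honours the quotes (quotechar defaults to '"').
csv_fields = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(split_fields), split_fields)
print(len(csv_fields), csv_fields)
```

If your file never quotes its fields, both approaches should give the same columns, and split is the simpler choice.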

I also recommend avoiding readlines: it reads the entire file into memory, which may cause problems if the file is very large. You can iterate over the open file a line at a time instead:

z = open('output.blast', 'r')
for line in z:
    ...
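A slightly fuller sketch of the same idea, in Python 3 syntax: a with block closes the file even if an error occurs, and the loop still reads one line at a time. The helper name second_column and the temporary file are assumptions of mine, there only to make the example self-contained.

```python
import os
import tempfile

def second_column(path):
    """Yield the second tab-separated field of each line, reading lazily."""
    with open(path, "r") as handle:  # closed automatically on exit
        for line in handle:          # one line at a time, not the whole file
            cols = line.rstrip("\n").split("\t")
            if len(cols) > 1:        # skip blank or short lines
                yield cols[1]

# Write a small made-up file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".blast", delete=False) as tmp:
    tmp.write("1_0\tNP_045689\t100.00\n1_0\tNP_045688\t54.83\n")

result = list(second_column(tmp.name))
os.remove(tmp.name)
print(result)
```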

Dave Kirby 2010-07-01 18:18:34
