My assignment asks me to write a function called readFasta that accepts one argument: the name of a fasta format file (fn) containing one or more sequences. The function should read the file and return a dictionary where the keys are the fasta headers and the values are the corresponding sequences from file fn converted to strings. Make sure that you don't include any new lines or other white space characters in the sequences in the dictionary.

For example, if afile.fa looks like:

>one
atctac
>two
gggaccttgg
>three
gacattac

then a.readFasta(f) returns:

[‘one’ : ‘atctac’,
‘two’ : ‘gggaccttgg’,
‘three’: ‘gacattac’]

I have tried to write some code, but as I am a total newbie in programming, it didn't work out very well for me. Can anyone please help me? Thank you so much. Here is my code:

import gzip

def readFasta(fn):
    if fn.endswith('.gz'):
        fh = gzip.gzipfile(fn)
    else:
        fh = open(fn,'r')

    d = {}

    while 1:
        line = fh.readline()

        if not line:
            fh.close()
            break

        vals = line.rstrip().split('\t')
        number = vals[0]
        sequence = vals[1]

        if d.has_key(number):
            lst = d[number]

            if gene not in lst:
                # this test may not be necessary
                lst.append(sequence)
        else:
            d[number] = [sequence]

    return d

Here is what I have in my afile.txt:

one atctac

two gggaccttgg

three gacattac

+1  A: 

Your post is slightly confusing. I assume that you want it to return a dict; in that case, you would write it as {'one': 'actg', 'two': 'aaccttgg'}. If you presented the file format correctly, then this function should do the trick.

import gzip

def read_fasta(filename):
    with gzip.open(filename) as f:
        return dict(line.split() for line in f)
aaronasterling
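For a plain-text file with one name and one sequence per line, possibly with blank lines in between, the idea above can be sketched as follows. This is only a rough sketch under a few assumptions: it is written for Python 3 rather than 2.6, it opens the file as ordinary text instead of through gzip, and the names parse_two_column and read_two_column are made up for illustration.

```python
def parse_two_column(lines):
    """Map the first field of each usable line to its second field."""
    result = {}
    for line in lines:
        # split() with no argument splits on tabs or spaces and drops the newline
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or incomplete line: skip it instead of raising IndexError
        result[fields[0]] = fields[1]
    return result


def read_two_column(filename):
    """Read a plain-text file (assumed not gzip-compressed) and parse it."""
    with open(filename, "r") as handle:
        return parse_two_column(handle)
```

Skipping lines with fewer than two fields is a design choice for this sketch; raising an error with the offending line number would be just as reasonable if silent skipping is not wanted.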
I just used the code but still got an error with Python 2.6:
pmt0512
>>> import readFasta as b
>>> b.readFasta('afile.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "readFasta.py", line 37, in readFasta
IndexError: list index out of range
pmt0512
@pmt0512. It would help if you put it in a file by itself. I don't know which line the code I gave you starts on in that file. For reference, it should be in a file nine lines long. As it stands, though, there's no place in my code (AFAIK) that could raise an IndexError.
aaronasterling
@pmt0512. Also, msw and GWW have a good point.
aaronasterling
one atctactwo gggaccttggthree gacattac
pmt0512
Ah, I don't know how to format the file contents in a comment so they display exactly, but the file contains 3 lines: the first line is one, tab, atctac; the 2nd line is two, tab, gggaccttgg; and the 3rd line is three, tab, gacattac. I just want the function, when it reads this file, to return the dict that the assignment gives as an example. Thanks.
pmt0512
@pmt0512. You should edit your post then, because that's a totally different format than what you have displayed right now.
aaronasterling
I have put afile.txt up.
pmt0512
@pmt0512. I just noticed that you uploaded the file. I updated my solution. Are you sure that it's in a zip file and not just a text file?
aaronasterling
It's actually just a text file. I'm not very familiar with gzip; I only put it there because someone told me it was fine to include it. Should I take it out? Thanks, AaronMcSmooth.
pmt0512
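For the format the assignment itself describes (header lines starting with ">" followed by one or more sequence lines), a reader could look roughly like the sketch below. It is not the answer's code. It assumes Python 3.3 or later (for text-mode gzip.open), treats everything after ">" on a header line as the key, and lets a repeated header overwrite the earlier record. The helper name parse_fasta is hypothetical.

```python
import gzip


def parse_fasta(lines):
    """Build {header: sequence} from an iterable of FASTA text lines."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # drop the leading '>'
            records[header] = []
        elif header is not None:
            # strip any internal whitespace before storing this chunk
            records[header].append("".join(line.split()))
    # join the chunks so multi-line sequences become one string
    return {name: "".join(chunks) for name, chunks in records.items()}


def read_fasta(fn):
    """Open fn with gzip if it ends in .gz, otherwise as plain text."""
    opener = gzip.open if fn.endswith(".gz") else open
    with opener(fn, "rt") as handle:
        return parse_fasta(handle)
```

Checking the extension keeps the gzip support without breaking plain text files, which is the case in this thread. With the afile.fa example from the question, read_fasta('afile.fa') should give a dict mapping 'one', 'two' and 'three' to their sequences.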