My assignment asks me to write a function called readFasta that accepts one argument: the name of a FASTA-format file (fn) containing one or more sequences. The function should read the file and return a dictionary whose keys are the FASTA headers and whose values are the corresponding sequences from file fn, as strings. Make sure that you don't include any newlines or other whitespace characters in the sequences in the dictionary.
For example, if afile.fa looks like:
>one
atctac
>two
gggaccttgg
>three
gacattac
then a.readFasta(f) returns:
{'one': 'atctac',
 'two': 'gggaccttgg',
 'three': 'gacattac'}
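A minimal sketch of a readFasta that follows the spec above. It assumes every header line starts with `>` and that the dictionary key is the header text without the `>`. It collects the sequence lines under each header, so a sequence wrapped over several lines is joined into one string, and it removes all whitespace from the sequence:

```python
def readFasta(fn):
    """Read a FASTA file and return {header: sequence}.

    Assumes header lines start with '>' and the key is the text after '>'.
    Sequences may span several lines; all whitespace inside them is removed.
    """
    seqs = {}
    header = None
    chunks = []
    with open(fn, 'r') as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # Save the previous record before starting a new one
                if header is not None:
                    seqs[header] = ''.join(chunks)
                header = line[1:].strip()
                chunks = []
            elif header is not None:
                # Remove any whitespace inside the sequence line as well
                chunks.append(''.join(line.split()))
    # Save the last record in the file
    if header is not None:
        seqs[header] = ''.join(chunks)
    return seqs
```

With the afile.fa shown above, `readFasta('afile.fa')` should return `{'one': 'atctac', 'two': 'gggaccttgg', 'three': 'gacattac'}`. Sequence lines that appear before the first header are ignored, and if a header repeats, the later record overwrites the earlier one.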
I have tried to write some code, but as I am a total newbie in programming, it didn't work out very well for me. Can anyone please help me? Thank you so much. Here is my code:
import gzip

def readFasta(fn):
    # Open gzip-compressed files in text mode so lines come back as str
    if fn.endswith('.gz'):
        fh = gzip.open(fn, 'rt')
    else:
        fh = open(fn, 'r')
    d = {}
    with fh:
        for line in fh:
            # Each line is expected to be "name<TAB>sequence"
            vals = line.rstrip().split('\t')
            number = vals[0]
            sequence = vals[1]
            if number in d:
                lst = d[number]
                if sequence not in lst:
                    # this test may not be necessary
                    lst.append(sequence)
            else:
                d[number] = [sequence]
    return d
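For the `.gz` branch: `gzip.GzipFile`, and `gzip.open` in its default mode, return `bytes` lines, while `gzip.open(fn, 'rt')` returns `str` lines, which is what string methods such as `split('\t')` expect. A minimal sketch of that opening step on its own; `open_text` is a hypothetical helper name, not part of any library:

```python
import gzip

def open_text(fn):
    """Open fn for reading as text, decompressing it if the name ends in .gz.

    open_text is a hypothetical helper, not a standard-library function.
    """
    if fn.endswith('.gz'):
        # 'rt' = read in text mode, so iteration yields str, not bytes
        return gzip.open(fn, 'rt')
    return open(fn, 'r')
```

Either kind of file can then be read the same way, e.g. `with open_text('afile.fa.gz') as fh:` followed by `for line in fh:`.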
Here is what I have in my afile.txt:
one atctac
two gggaccttgg
three gacattac
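Note that afile.txt is not in FASTA format: each line holds a name and a sequence side by side, with no `>` header lines, so it needs different parsing from the afile.fa example. If this two-column layout is what has to be read, a sketch (the name `readTwoColumn` is hypothetical) that splits each line on the first run of whitespace, so tabs or spaces both work, and stores the sequence as a plain string rather than a list:

```python
def readTwoColumn(fn):
    """Read lines of the form 'name<whitespace>sequence' into {name: sequence}.

    readTwoColumn is a hypothetical name; the layout matches afile.txt.
    """
    d = {}
    with open(fn, 'r') as fh:
        for line in fh:
            parts = line.split(None, 1)  # split on the first run of whitespace
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            name, seq = parts
            # Drop the trailing newline and any other whitespace in the sequence
            d[name] = ''.join(seq.split())
    return d
```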