So I got a DNA sequence.

ACCAGAGCGGCACAGCAGCGACATCAGCACTAGCACTAGCATCAGCATCAGCATCAGC
CTACATCATCACAGCAGCATCAGCATCGACATCAGCATCAGCATCAGCATCGACGACT
ACACCCCCCCCGGTGTGTGTGGGGGGTTAAAAATGATGAGTGATGAGTGAGTTGTGTG
CTACATCATCACAGCAGCATCAGCATCGACATCAGCATCAGCATCAGCATCGACGACT
TTCTATCATCATTCGGCGGGGGGATATATTATAGCGCGCGATTATTGCGCAGTCTACG
TCATCGACTACGATCAGCATCAGCATCAGCATCAGCATCGACTAGCATCAGCTACGAC

I need to count the bases.

Also, for some reason the sequence can sometimes alternate between uppercase and lowercase within the same string.

+7  A: 
for base in 'ACGT':
  print base, thesequence.count(base) + thesequence.count(base.lower())
Alex Martelli
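For anyone on Python 3, where `print` is a function, here is a minimal sketch of the same two-count idea wrapped in a function. The name `count_bases` and the sample string are hypothetical, made up for illustration; they are not from the answer.

```python
def count_bases(sequence):
    """Return case-insensitive counts for A, C, G and T as a dict."""
    counts = {}
    for base in "ACGT":
        # Two counting passes per base (upper, then lower), so no
        # lowercased copy of the whole sequence is ever created.
        counts[base] = sequence.count(base) + sequence.count(base.lower())
    return counts


# Hypothetical mixed-case sample, not the sequence from the question.
sample = "ACCAGAGCGGcacagcagcGACATCAGCACTAG"
for base, n in count_bases(sample).items():
    print(base, n)
```

Any character that is not one of the four bases (newlines, `N`, and so on) is simply ignored, so a multi-line sequence can be passed in as-is.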
Thank you sire wholeheartedly.
Joshua
Out of curiosity, is there a reason you don't do thesequence.lower().count(base.lower()), instead? I'm guessing it's to make it faster, but I'm not 100% sure.
Edan Maor
It's not necessarily faster this way, but it takes less memory. Since DNA sequences can be **long** this can be important.
sth
Yep, as you need to do two passes anyway, it's better to have both be counting ones (memory-thrifty) rather than have one take up O(N) extra temporary memory. If you do have memory to spare, a single `tmp = sequence.lower()` outside the loop (then loop over `'acgt'` in lowercase doing just `tmp.count(base)`) is going to be faster. A single pass with a finditer on a case-insensitive RE might be fastest, but **a lot** less simple than these approaches;-).
Alex Martelli
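The two alternatives from the comment above could look roughly like this in Python 3. This is only a sketch: the function names are hypothetical, and nothing here has been timed, so which one is actually fastest on a given sequence is left as an open question.

```python
import re
from collections import Counter


def count_bases_lowered(sequence):
    """Lowercase once (O(N) extra memory), then count each base."""
    tmp = sequence.lower()  # single temporary copy, made outside the loop
    return {base: tmp.count(base) for base in "acgt"}


# Case-insensitive pattern matching one base at a time.
_BASE_RE = re.compile(r"[acgt]", re.IGNORECASE)


def count_bases_regex(sequence):
    """Single pass over the sequence using finditer on a case-insensitive RE."""
    counts = Counter(m.group().lower() for m in _BASE_RE.finditer(sequence))
    # Counter returns 0 for bases that never appear.
    return {base: counts[base] for base in "acgt"}
```

Both return a dict keyed by the lowercase base, so they can be swapped for one another. The first trades memory for simplicity; the second avoids the copy but is, as noted, a lot less simple.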