Hello,
I'm trying to make a glob-like expansion of a set of DNA strings that have multiple possible bases.
The base of my DNA strings contains the letters A, C, G, and T. However, I can have special characters like M which could be an A or a C.
For example, say I have the string:
ATMM
I would like to take this string as input and output the four possible matching strings:
ATAA
ATAC
ATCA
ATCC
Rather than brute force a solution, I feel like there must be some elegant Python/Perl/Regular Expression trick to do this.
Thank you for any advice.
Edit, thanks cortex for the product operator. This is my solution:
Still a Python newbie, so I bet there's a better way to handle each dictionary key than another for loop. Any suggestions would be great.
import sys
from itertools import product
baseDict = dict(M=['A','C'],R=['A','G'],W=['A','T'],S=['C','G'],
Y=['C','T'],K=['G','T'],V=['A','C','G'],
H=['A','C','T'],D=['A','G','T'],B=['C','G','T'])
def glob(str):
strings = [str]
## this loop visits very possible base in the dictionary
## probably a cleaner way to do it
for base in baseDict:
oldstrings = strings
strings = []
for string in oldstrings:
strings += map("".join,product(*[baseDict[base] if x == base
else [x] for x in string]))
return strings
for line in sys.stdin.readlines():
line = line.rstrip('\n')
permutations = glob(line)
for x in permutations:
print x