Hi,

I need to generate the strings STA and STB.

STA and STB are strings of length 10, and each one can contain only the characters A, T, G or C.

I have to generate all possible values of STA and, for each STA, the corresponding STB.

The rule is that the character A is always paired with T and vice versa, and G with C and vice versa.

So combinations like these are possible:

STA: ATGC...
STB: TACG...

or

STA: GTTA...
STB: CAAT...

and so on.

What would be the best way of doing this using Bash or Python?

Thanks

+2  A: 

I'd say Python.

Have a look here for string permutations: Permutations using a Combinations Generator (Python). Another thing to look at is itertools in Python 2.6+: Generating all permutations of a list in python. Your requirements go further than plain permutations, but you will probably find it easier to add the necessary constraints in Python than in Bash.

Simple, clean and easy.

Now, I'm no expert on Bash, but from what I can see you would need multiple lines that repeat pretty much the same text over and over, depending on your combinations. Bash would be fine for simple combinations, but not for linked ones.

Kyle Rozendo
Reasons? I'm not arguing, but just saying "Python" doesn't really help anyone. I could equally say "I'd say Bash", and someone who doesn't know enough (like the asker) would have no way of choosing between the two.
Dan McGrath
I think you spoke too soon. ;)
Kyle Rozendo
+2  A: 

While I don't know bash and don't see how permutations would solve your problem, it seems that itertools.product is a fairly straightforward way to do this:

>>> s = 'atgc'
>>> d = dict(zip(s, 'tacg'))
>>> import itertools
>>> for i in itertools.product(s, repeat=10):
    sta = ''.join(i)
    stb = ''.join(d[x] for x in i)

While the method above is valid for obtaining all permutations with replacement of the 'atgc' string (i.e. finding sta), it is more efficient to compute stb with the translation mechanism than with a dictionary look-up:

>>> trans = str.maketrans(s, 'tacg')
>>> for i in itertools.product(s, repeat=10):
    sta = ''.join(i)
    stb = sta.translate(trans)

Thanks to Dave for highlighting the more efficient solution.
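For reference, the two steps above can be wrapped into one lazy generator. This is only a sketch, assuming Python 3 (for str.maketrans) and upper-case letters as in the question; the names complement_pairs, ALPHABET and COMPLEMENT are made up for illustration.

```python
from itertools import product

ALPHABET = "ATGC"
# Translation table for the pairing rule: A<->T, G<->C.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement_pairs(length=10):
    """Yield (sta, stb) tuples one at a time instead of building a full list."""
    for letters in product(ALPHABET, repeat=length):
        sta = "".join(letters)
        yield sta, sta.translate(COMPLEMENT)


if __name__ == "__main__":
    # A short length keeps the demo output small; use length=10 for the real task.
    for sta, stb in complement_pairs(2):
        print(sta, stb)
```

Because it is a generator, the 4**10 pairs for length 10 never have to sit in memory at once.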

SilentGhost
Why do I get this? for i in itertools.product(s, repeat=10): AttributeError: 'module' object has no attribute 'product'
Werner
@Werner: because you're using an old version of Python. As Kyle has mentioned, you need Python 2.6 at least.
SilentGhost
OK great, now it runs! Thanks. Just a short question: sta and stb are 10 characters long; how can I change them to 5?
Werner
OK, just change repeat from 10 to 5. Sorry, I should sleep more :-P Thanks.
Werner
The code as written yields sta as 'cccccccccc' and stb as 'gggggggggg'.
Seth
@Seth: and? That's exactly what the OP asked for. STB is supposed to correspond to STA the way it's done in the question, not the way it's done in your answer.
SilentGhost
A: 

Unrelated to your actual question but related to what you're (apparently) doing, have you checked out BioPython?

kwatford
good point! thanks
Werner
+1  A: 

Here you go:

>>> from itertools import product
>>> seq = ("AGCT",) * 10
>>> STA = [''.join(a) for a in product(*seq)]
>>> STB = list(reversed(STA))

Incidentally, len(STA) is 2**20 (that is, 4**10 = 1,048,576).
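It may not be obvious why reversing the list gives the matching STB. A small check, sketched here under the assumption of Python 3 and with the length cut from 10 to 3 so it runs instantly: because the alphabet is ordered "AGCT", A/T and G/C sit at mirrored positions, so the i-th entry of the reversed list is the letter-by-letter complement of the i-th entry of STA.

```python
from itertools import product

alphabet = "AGCT"   # order matters: A/T and G/C are at mirrored indices
length = 3          # shortened from 10 purely for this check

STA = ["".join(p) for p in product(alphabet, repeat=length)]
STB = list(reversed(STA))

# Compare against an explicit complement using the A<->T, G<->C rule.
trans = str.maketrans("ATGC", "TACG")
mismatches = [(a, b) for a, b in zip(STA, STB) if a.translate(trans) != b]

print(len(STA), len(mismatches))  # prints: 64 0
```

With a different letter order (for example "ATGC") the reversal no longer lines up, so an explicit translation is the safer choice.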

itertools.product is available in Python 2.6 and later.

See @hop's answer here for an implementation of product in Python 2.5
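The linked answer is not reproduced here. As a rough stand-in, the sketch below follows the pure-Python equivalent described in the itertools documentation; the name product_fallback is hypothetical, and note that it builds every intermediate result in memory before yielding anything.

```python
def product_fallback(*iterables, **kwargs):
    """Pure-Python stand-in for itertools.product (illustrative name)."""
    repeat = kwargs.get("repeat", 1)
    pools = [tuple(pool) for pool in iterables] * repeat
    result = [[]]
    for pool in pools:
        # Extend every partial combination with every item of the next pool.
        result = [partial + [item] for partial in result for item in pool]
    for combo in result:
        yield tuple(combo)


# Small demo: all 2-letter strings over the alphabet.
two_letter = ["".join(t) for t in product_fallback("ATGC", repeat=2)]
print(len(two_letter))  # prints: 16
```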

Seth
+2  A: 

Others have said how to generate STA.

The most efficient way to convert a string STA into the equivalent string STB is to use the string translate & maketrans functions.

>>> import string
>>> s = "AGTC" * 100
>>> trans = string.maketrans("ATGC", "TACG")
>>> s.translate(trans)
'TCAG...TCAG'

On my system this is ~100 times faster than doing a dictionary lookup on each character as suggested by SilentGhost.
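To reproduce that kind of comparison yourself, something like the following could be used. It is a sketch only: it assumes Python 3 (so str.maketrans replaces string.maketrans), the helper names by_lookup and by_translate are invented here, and the measured ratio will vary by machine and Python version.

```python
import timeit

s = "AGTC" * 100
pairs = dict(zip("ATGC", "TACG"))
trans = str.maketrans("ATGC", "TACG")


def by_lookup(text):
    # One dictionary lookup per character, then join.
    return "".join(pairs[ch] for ch in text)


def by_translate(text):
    # Single call that maps every character through the table.
    return text.translate(trans)


# Both approaches must agree before timing them.
assert by_lookup(s) == by_translate(s)

lookup_time = timeit.timeit(lambda: by_lookup(s), number=1000)
translate_time = timeit.timeit(lambda: by_translate(s), number=1000)
print("lookup: %.4fs  translate: %.4fs" % (lookup_time, translate_time))
```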

Dave Kirby
+1  A: 

bash baby :)

STA=$(echo {A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G}{A,C,T,G})
STB=$(echo $STA | tr ATCG TAGC)

echo $STA
echo $STB
frankc