ansaurus

Question

DNA sequence alignment in native Python (no Biopython)

Answer 1

+1 A:

Here's a paper on approximately that subject:

Rocke, On finding novel gapped motifs in DNA sequences, 1998.

Hopefully that paper, its references, and the papers that cite it will give you many ideas for algorithms. You won't find Python code, but you may find descriptions of algorithms that you could then implement in Python.

Heath Hunnicutt 2010-03-10 19:55:25

Thank you Heath. I'm really, however, looking for a Python implementation :) Cheers!

Morlock 2010-03-10 19:57:59

Answer 2

+1 A:

Having researched that algorithm briefly, I can say this is not easy stuff. It is going to take some very serious algorithm work. Try re-aligning your expectations from "hours" to "days or weeks".

The programmer implementing this will need:

High competence in general Python programming.
Algorithm programming experience, and a good understanding of time complexity.
A good understanding of Python data structures such as dict, set, and deque, and their complexity characteristics.
Familiarity with unit tests.

That programmer may or may not be you right now. This sounds like an awesome project, good luck!

Christian Oudard 2010-03-10 23:04:07

@Christian Oudard The time I hinted at (hours) referred to how long the algorithm might take to run, not how long it would take to write :) From what I have found, I have decided instead to look more deeply into the existing, good-quality tools in the field of genetics. I'm giving you the 'answer' since you finished driving in the nail I had half driven in myself while reflecting on the appropriateness of reinventing the wheel here. Cheers!

Morlock 2010-03-11 02:48:41

Answer 3

A:

You could do this quite simply using regex. I don't think it would be that complicated! In fact, I have just completed some code that does pretty much the same thing for one of the guys at the university here!

If you are not looking for exact copies of the primers (because of mutation, for example), an element of fuzzy matching could be applied! The version I wrote simply looked for exact primer matches at the start and end of the sequence and returned the sequence minus those primers, using the following code:

import re
# start_primer and end_primer are the primer sequences you are looking to match
pattern = "^" + re.escape(start_primer) + "([A-Z]+)" + re.escape(end_primer) + "$"
match = re.match(pattern, sequence)  # sequence is the DNA sequence you are analyzing
if match:  # re.match returns None when the primers are not found
    print(match.group(1))  # prints the sequence between the start and end primers

Here's a link on fuzzy regex matching in Python: http://hackerboss.com/approximate-regex-matching-in-python/
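
For illustration only, here is a minimal pure-Python sketch of what tolerating a few mutations in the primers might look like. It is not the approach from the linked article and not the code I wrote for the university: it simply counts mismatched positions (Hamming distance) between each primer and the corresponding end of the sequence. The names count_mismatches, trim_primers_fuzzy, and max_mismatches are hypothetical, and the sketch assumes mutations are substitutions only (no insertions or deletions).

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def trim_primers_fuzzy(sequence, start_primer, end_primer, max_mismatches=1):
    """Return the region between the primers, or None if either end of
    `sequence` differs from its primer at more than `max_mismatches` positions.

    Assumption: substitutions only; insertions and deletions are not handled.
    """
    n_start, n_end = len(start_primer), len(end_primer)
    if len(sequence) <= n_start + n_end:
        return None  # nothing would be left between the primers
    head = sequence[:n_start]
    tail = sequence[len(sequence) - n_end:]
    if count_mismatches(head, start_primer) > max_mismatches:
        return None
    if count_mismatches(tail, end_primer) > max_mismatches:
        return None
    return sequence[n_start:len(sequence) - n_end]


# Hypothetical example data: the start primer ACGT appears as ACCT (one substitution).
print(trim_primers_fuzzy("ACCTGGGATTTTGCA", "ACGT", "TGCA"))  # prints GGGATTT
```

With max_mismatches=0 this behaves like the exact-match version above; handling insertions or deletions would need an edit-distance approach or an approximate regex library such as the one in the link.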

Steve 2010-09-10 08:01:41
