I have an interesting genetics problem that I would like to solve in native Python (nothing outside the standard library), so that the solution is easy to use on any computer without requiring the user to install additional modules.
Here it is. I received hundreds of thousands of DNA sequences (up to 2 billion) from a 454 next-generation sequencing run. I want to trim the extremities in order to remove primers that may be present on both ends, in both normal (sense) and reverse orientation. Example:
seq001: ACTGACGGATAGCTGACCTGATGATGGGTTGACCAGTGATC
--primer-1--- --primer-2-
Primers can be present once or several times (one right after the other). Normal-sense primers are always on the left, and reverse ones on the right. My goal is thus to find the primers and cut the sequence so that only the primer-free part remains. For this, I want to use a classic alignment algorithm (e.g. Smith-Waterman) implemented in native Python (i.e. not through Biopython). I am aware that this may take quite some time (up to hours).
Note: This is NOT a direct "word" search, as the DNA, both in the sequences and in the primers, can be "mutated" for various technical reasons.
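To make the goal concrete, here is a minimal sketch of the kind of thing I have in mind, using only the standard library. It is not a tuned or validated implementation. The function names (`smith_waterman`, `trim_left`), the scoring values (match +2, mismatch -1, linear gap -2), the search window of twice the primer length, and the primer string in the demo (simply the first 13 bases of the example above) are all assumptions made for illustration.

```python
def smith_waterman(primer, seq, match=2, mismatch=-1, gap=-2):
    """Score the best local alignment of primer against seq.

    Returns (best_score, end), where end is the 1-based position in seq
    at which the best alignment ends. Only two rows of the scoring matrix
    are kept, so memory stays proportional to len(seq).
    Scoring values are illustrative assumptions, with a linear gap penalty.
    """
    prev = [0] * (len(seq) + 1)
    best_score, best_end = 0, 0
    for i in range(1, len(primer) + 1):
        curr = [0] * (len(seq) + 1)
        for j in range(1, len(seq) + 1):
            diag = prev[j - 1] + (match if primer[i - 1] == seq[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr[j] = score
            if score > best_score:
                best_score, best_end = score, j
        prev = curr
    return best_score, best_end


def trim_left(seq, primer, min_score, window=None):
    """Remove primer copies (possibly mutated, possibly repeated) from the left.

    Only the first `window` bases are searched (assumed default: twice the
    primer length). Everything up to the end of the best alignment is cut,
    and the search repeats until no alignment reaches min_score.
    """
    if window is None:
        window = 2 * len(primer)
    while seq:
        score, end = smith_waterman(primer, seq[:window])
        if score < min_score or end == 0:
            break
        seq = seq[end:]
    return seq


if __name__ == "__main__":
    seq001 = "ACTGACGGATAGCTGACCTGATGATGGGTTGACCAGTGATC"
    primer_1 = "ACTGACGGATAGC"  # hypothetical primer: first 13 bases of seq001
    print(smith_waterman(primer_1, seq001))  # (26, 13)
    print(trim_left(seq001, primer_1, min_score=20))
```

The right end could presumably be handled with the same function by reversing both strings, e.g. `trim_left(seq[::-1], right_primer[::-1], min_score)[::-1]`, where `right_primer` is the primer written as it appears in the read. The `min_score` threshold controls how many mutations are tolerated and would need to be chosen per primer.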
What would you use?