dna-sequence

Generating Synthetic DNA Sequence with Substitution Rate

Given these inputs:

    my $init_seq = "AAAAAAAAAA"; # length 10 bp
    my $sub_rate = 0.003;
    my $nof_tags = 1000;
    my @dna = qw( A C G T );

I want to generate one thousand length-10 tags, where the substitution rate for each position in a tag is 0.003, yielding output like:

    AAAAAAAAAA
    AATAACAAAA
    .....
    AAGGAAAAGA # 1000th tag

Is there a compact ...
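A minimal Python sketch of one way to read this task. It assumes that a substitution always replaces the base with a different one chosen uniformly from the other three, which the question does not specify. The names `mutate` and `generate_tags` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import random


def mutate(seq, sub_rate, rng, alphabet="ACGT"):
    """Return a copy of seq where each position is substituted with probability sub_rate."""
    out = []
    for base in seq:
        if rng.random() < sub_rate:
            # Assumption: a substitution always changes the base.
            out.append(rng.choice([b for b in alphabet if b != base]))
        else:
            out.append(base)
    return "".join(out)


def generate_tags(init_seq, sub_rate, nof_tags, seed=None):
    """Generate nof_tags independently mutated copies of init_seq."""
    rng = random.Random(seed)  # seeded generator so runs can be reproduced
    return [mutate(init_seq, sub_rate, rng) for _ in range(nof_tags)]


tags = generate_tags("AAAAAAAAAA", 0.003, 1000, seed=0)
```

With a rate of 0.003 and 10 positions, most of the 1000 tags are expected to be identical to the initial sequence.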

Looking for elegant glob-like DNA string expansion

Hello, I'm trying to make a glob-like expansion of a set of DNA strings that have multiple possible bases. My DNA strings are built from the letters A, C, G, and T. However, I can have special characters like M, which could be an A or a C. For example, say I have the string ATMM. I would like to take this string as input and o...
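A possible sketch in Python, assuming the goal is to list every concrete string the pattern can stand for. Only the code M (A or C) comes from the question; the `AMBIGUOUS` table and the `expand` function are hypothetical names, and the table would need further entries for any other special characters.

```python
from itertools import product

# Only M is mentioned in the question; extend this table as needed.
AMBIGUOUS = {"M": "AC"}


def expand(pattern, codes=AMBIGUOUS):
    """Expand a pattern with ambiguity codes into all concrete strings."""
    # Each position contributes either its alternatives or the letter itself.
    choices = [codes.get(ch, ch) for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ATMM"))  # ['ATAA', 'ATAC', 'ATCA', 'ATCC']
```

The number of results is the product of the alternatives at each position, so patterns with many ambiguous positions grow quickly.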

Perl recursion techniques?

I need a bit of help with this code. I know the sections that should be recursive, or at least I think I do, but I am not sure how to implement it. I am trying to implement a path-finding program from an alignment matrix that will find multiple routes back to the zero value. For example, if you execute my code and insert CGCA as the first ...

cluster short, homogeneous strings (DNA) according to common sub-patterns and extract consensus of classes

Task: to cluster a large pool of short DNA fragments into classes that share common sub-sequence patterns and find the consensus sequence of each class. Pool: ca. 300 sequence fragments; 8 - 20 letters per fragment; 4 possible letters: a, g, t, c; each fragment is structured in three regions: 5 generic letters, 8 or more positions of g's...

How can I modify the Smith-Waterman algorithm using substitution matrix to align proteins in Perl?

How can I modify the Smith-Waterman algorithm using a substitution matrix to align proteins in Perl? [citations needed] ...

Generate all possible dna sequences from a few given sets

Hi, I have been trying to wrap my head around this for a while now but have not been able to come up with a good solution. Here goes: Given a number of sets: set1: A, T; set2: C; set3: A, C, G; set4: T; set5: G. I want to generate all possible sequences from a list of sets. In this example the length of the sequence is 5, but it can be a...
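One way this could be sketched in Python is a recursive generator that picks one letter from each set in order. The function name `sequences` is hypothetical, and the sketch assumes the sets are given as a list of lists in sequence order.

```python
def sequences(sets):
    """Yield every sequence formed by taking one letter from each set, in order."""
    if not sets:
        yield ""  # exactly one way to build the empty sequence
        return
    first, rest = sets[0], sets[1:]
    for base in first:
        for tail in sequences(rest):
            yield base + tail


example = [["A", "T"], ["C"], ["A", "C", "G"], ["T"], ["G"]]
for seq in sequences(example):
    print(seq)
```

Because it is a generator, sequences are produced one at a time instead of being held in memory together, which matters when the number of sets grows.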

Are there any existing solutions for creating a generic DNA sequence database with a website front end?

I'd like to create an rRNA sequence database with a web front end for the lab I work in. It seems common in biology to want to search a large number of sequences using alignment algorithms such as BLAST and HMMER, so I wondered if there are any existing PHP/Python/Rails projects that allow easy creation of a generic sequence database with...

Commercial databases adept at storing biological sequences

Hi, which commercial databases are adept at storing biological sequences such as protein/DNA sequences? Are there any that were designed specifically to store such sequences? Cheers ...

Python script for robust multi-array average on microarray data

I have tried Google with no luck. I have seen some weak references to robust multi-array averaging done with Python but no code. I am not so interested in reinventing the wheel. Any suggestions on a Python module, script .... If I could find a nice explanation or example of the algorithm I would write a Python implementation to share. ...

DNA sequence alignment in native Python (no biopython)

I have an interesting genetics problem that I would like to solve in native Python (nothing outside the standard library), so that the solution is very easy to use on any computer without requiring the user to install additional modules. Here it is. I received 100,000s of DNA sequences (up to 2 billion) from a 454 new gene...

Search for string allowing for one mismatch in any location of the string, Python

I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 and need to look for each sequence in the entire genome (the Toxoplasma gondii parasite). I am not sure how large the genome is, but it is much more than 230,000 sequences. I need to look for each of my sequences of 25 characters, for example (AGCCTCCCATGATTGAACAG...
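A naive Python sketch of the idea: slide the query along the genome and count mismatches, giving up on a position as soon as the allowed number is exceeded. The function name `find_with_mismatches` and the toy strings are hypothetical, and this only illustrates the matching rule. Its cost grows with genome length times query length, so it is unlikely to be fast enough as written for 230,000 queries against a whole genome.

```python
def find_with_mismatches(genome, query, max_mismatches=1):
    """Return start positions where query matches genome with at most max_mismatches substitutions."""
    hits = []
    m = len(query)
    for start in range(len(genome) - m + 1):
        mismatches = 0
        for a, b in zip(genome[start:start + m], query):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, try the next start position
        else:
            # Loop finished without break: this window is a hit.
            hits.append(start)
    return hits


print(find_with_mismatches("ACGTACGA", "ACGT"))  # [0, 4]
```

Only substitutions are counted here; insertions and deletions are not handled.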

How (and where) to get aligned tRNA sequences (and import it into R)

(This is a database / R commands question) I wish (for my thesis work) to import tRNA data into R and have it aligned. My questions are: 1) What resources can I use for the data? 2) What commands might help me with the import/alignment? So far, I found two nice repositories that hold such data: tRNAdb at the University of Leipzig ...

String recurring subsequences and compression

Hi, I'd like to do some kind of "search and replace" algorithm which will, in an efficient manner if possible, identify a substring of a string which occurs more than once and replace all occurrences of that substring with a token. For example, given a string "AbcAdAefgAbijkAblmnAbAb", notice that "A" recurs, so reduce in pass one to "#...

Fast algorithms for finding unique sets in two very long sequences of text

I need to compare the DNA sequences of X and Y chromosomes, and find patterns (composed of around 50-75 base pairs) that are unique to the Y chromosome. Note that these sequence parts can repeat in the chromosome. This needs to be done quickly (BLAST takes 47 days, need a few hours or less). Are there any algorithms or programs in partic...

Compare Multiple Substrings

I'm attempting to write a basic DNA sequencer. Given two sequences of the same length, it will output the substrings that are the same, with a minimum length of 3. So an input of abcdef dfeabc will return 1 abc. I am not sure how to go about solving the problem. I can compare the two strings, and see if they are completely equal...
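A small Python sketch under one reading of the task: report the maximal substrings of at least three characters that occur in both strings, regardless of position. That reading is an assumption based on the abcdef / dfeabc example, and the function name `common_substrings` is hypothetical.

```python
def common_substrings(a, b, min_len=3):
    """Return the maximal substrings of a (length >= min_len) that also occur in b."""
    found = set()
    n = len(a)
    for i in range(n):
        j = i + min_len
        if j > n or a[i:j] not in b:
            continue  # no shared substring of the minimum length starts here
        # Extend the match as far as it still occurs somewhere in b.
        while j < n and a[i:j + 1] in b:
            j += 1
        found.add(a[i:j])
    # Drop matches that are contained in a longer match.
    return sorted(s for s in found if not any(s != t and s in t for t in found))


matches = common_substrings("abcdef", "dfeabc")
print(len(matches), *matches)  # 1 abc
```

This uses repeated substring searches for clarity, so it suits short inputs; a suffix-based index would be a more scalable alternative.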