bioinformatics

How (and where) to get aligned tRNA sequences (and import them into R)

(This is a database / R commands question.) For my thesis work, I want to import tRNA data into R and have it aligned. My questions are: 1) What resources can I use for the data? 2) What commands might help me with the import/alignment? So far, I have found two nice repositories that hold such data: tRNAdb at the University of Leipzig ...

Faster way to split a string and count characters using R?

I'm looking for a faster way to calculate GC content for DNA strings read in from a FASTA file. This boils down to taking a string and counting the number of times that the letter 'G' or 'C' appears. I also want to specify the range of characters to consider. I have a working function that is fairly slow, and it's causing a bottleneck...
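The counting step described above can be sketched as follows. The question is about R, so this Python version is only an illustration of the idea, not the asker's function: the name `gc_content` and the 0-based, end-exclusive range convention are assumptions. It relies on the built-in `str.count` instead of splitting the string into characters.

```python
def gc_content(seq, start=0, end=None):
    """Return the fraction of G/C bases in seq[start:end].

    Assumption: start is 0-based and end is exclusive (Python slicing).
    """
    window = seq[start:end].upper()
    if not window:
        return 0.0
    # str.count scans the string without building a list of characters.
    return (window.count("G") + window.count("C")) / len(window)


print(gc_content("ATGCGGCCAT", 2, 8))
```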

R optimization: How can I avoid a for loop in this situation?

I'm trying to do a simple genomic track intersection in R, and running into major performance problems, probably related to my use of for loops. In this situation, I have pre-defined windows at intervals of 100bp and I'm trying to calculate how much of each window is covered by the annotations in mylist. Graphically, it looks somethi...
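One way to avoid looping over every window for every annotation is to merge the annotations first and then visit only the windows each merged interval touches. The sketch below is in Python rather than R and rests on assumptions not stated in the question: annotations are `(start, end)` pairs with 0-based, end-exclusive coordinates, and `window_coverage` and `genome_length` are hypothetical names.

```python
def window_coverage(annotations, genome_length, window=100):
    """Return the number of covered bases in each fixed-size window.

    Assumption: annotations are (start, end) pairs, 0-based, end-exclusive.
    """
    n_windows = (genome_length + window - 1) // window
    covered = [0] * n_windows

    # Merge overlapping annotations so that no base is counted twice.
    merged = []
    for start, end in sorted(annotations):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    for start, end in merged:
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        # Only the windows overlapped by this interval are visited.
        for w in range(start // window, (end - 1) // window + 1):
            w_start = w * window
            w_end = min(w_start + window, genome_length)
            covered[w] += min(end, w_end) - max(start, w_start)
    return covered


print(window_coverage([(50, 150), (120, 260)], 300))  # [50, 100, 60]
```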

Changing the x-axis of seqlogo figures in MATLAB

I'm making a large number of seqlogos programmatically. They are hundreds of columns wide and so running a seqlogo normally creates letters that are too thin to see. I've noticed that I only care about a few of these columns (not necessarily consecutive columns) ... most are noise but some are highly conserved. I use something like this...

Why is Perl used so extensively in Biology research?

I work as support staff in a Biology research institute as a student, and Perl seems to be used everywhere. Not for every single project, but it seems that more than half the people here have a few Perl books in/on their office/desk. Why is Perl used so much in Biology? ...

How can I save BioPerl sequence nested features in genbank or embl format?

EDIT: Please close this question. I asked and got an answer for it on BioStar here. In BioPerl, a sequence object can have any number of features, and each of these can have subfeatures nested within them. For example, a feature may be a complete coding sequence of a gene, and its subfeatures might be individual exons that ar...

How can I extract DNA sequences from UCSC with a Perl script if I have their coordinates?

How can I extract DNA sequences from the genome browser (UCSC) with a Perl script, if I have their coordinates? ...

Is there a Boost (or other common lib) type for matrices with string keys?

I have a dense matrix where the indices correspond to genes. While gene identifiers are often integers, they are not contiguous integers. They could be strings instead, too. I suppose I could use a boost sparse matrix of some sort with integer keys, and it wouldn't matter if they're contiguous. Or would this still occupy a great deal of...

Refining data stored in SQLite - how to join several contacts?

I'm storing contacts between different elements. I want to eliminate elements of a certain type and store new contacts between the elements that were interconnected by the eliminated element. Problem background: Imagine this problem. You have a water molecule which is in contact with other molecules (if the contact is a hydrogen bond, there can b...

Parse large XML file w/ script or use BioPython API?

Hey guys, this is my first question on here. I'm trying to make a local copy of the UniprotKB in SQL. The UniprotKB is 2.1GB, and it comes in XML and a special text format used by SwissProt. Here are my options: 1) Use a SAX parser (XML) - I chose Ruby, and Nokogiri. I started writing the parser, but my initial reaction: how would I map...

Technologies used in EMBL

My friend suggested I apply for a job at EMBL. I'm not a bioinformatician in any way, but my friend (who, by the way, is a biologist working at EMBL) insists that I could adapt to the new environment as long as I have an interest in the subject and am generally good at learning new things. But here is a catch. For the last 4 years I've been wor...

What do you think is the best language for Bioinformatics?

I have done a couple of research jobs in bioinformatics and I have used Matlab for them. Matlab had a lot of powerful tools and was easy to use. I did things with genome sequencing and predicting metabolic pathways. I am wondering what other people think is best. Or there might not be one specific language but a few that lend themselves be...

Python solutions for managing scientific data dependency graph by specification values

I have a scientific data management problem that I have long puzzled over. It seems general, but I can't find an existing solution or even a description of it. I am about to embark on a major rewrite (Python) but I thought I'd cast about one last time for existing solutions, so I can scrap my own and get back to the biology, or at l...

Best way to read a FASTA file in c#

Hi there. I have a FASTA file containing several protein sequences. The format is like ---------------------- >protein1 MYRALRLLARSRPLVRAPAAALASAPGLGGAAVPSFWPPNAAR MASQNSFRIEYDTFGELKVPNDKYYGAQTVRSTMNFKIGGVTE RMPTPVIKAFGILKRAAAEVNQDYGLDPKIANAIMKAADEVAE GKLNDHFPLVVWQTGSGTQTNMNVNEVISNRAIEMLGGELGSK IPVHPNDHVNKSQ >protein2 MRSRPAGPALLLLLLF...

Image Processing video lectures or any other learning resources?

Hi all, I am new to image processing and will use it for medical images. I am searching for video lectures or any other good learning resources. Any help is appreciated. Thanks in advance. Regards, Saghar Ayyaz ...

Bookmarklet to open user defined link and make user defined form drop-down box selections

I've written a bookmarklet to open a user defined web link, in this specific case a specific genomic location in the UCSC genome browser. javascript:d=%22%22+(window.getSelection?window.getSelection():document.getSelection?document.getSelection():document.selection.createRange().text);d=d.replace(/%5Cr%5Cn%7C%5Cr%7C%5Cn/g,%22%20,%22);if(...

Fetching genomic sequence efficiently in Python?

How can I fetch genomic sequence efficiently using Python? For example, from a .fa file or some other easily obtained format? I basically want an interface fetch_seq(chrom, strand, start, end) which will return the sequence [start, end] on the given chromosome on the specified strand. Analogously, is there a programmatic python interf...
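A minimal sketch of such an interface, using only the standard library. Several things here are assumptions: the extra `genome` argument (a dict built by the hypothetical helper `read_fasta`), the 0-based, end-exclusive coordinates, and loading every sequence into memory. The last point is simple but not the efficient option for a whole genome, where an indexed FASTA reader would be a better fit.

```python
import io

# Translation table for complementing bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read FASTA records from a file-like object into {name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def fetch_seq(genome, chrom, strand, start, end):
    """Return genome[chrom][start:end], reverse-complemented for '-'."""
    seq = genome[chrom][start:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


genome = read_fasta(io.StringIO(">chr1 demo\nACGT\nTTGA\n"))
print(fetch_seq(genome, "chr1", "+", 2, 6))  # GTTT
print(fetch_seq(genome, "chr1", "-", 2, 6))  # AAAC
```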

Installing GSL on windows and making it available to python package P4 for phylogenetics

I want to use the P4 Python package on a Windows machine, from [http://www.bmnh.org/~pf/p4.html][1]. I have Python 2.6 installed and working with numpy ready and realines.py installed. There is a win32-gbu version of GSL installed on my Windows machine, from gnuwin32.sourceforge.net/packages/gsl.htm. When I try to install P4, using set...

"average length of the sequences in a fasta file": Can you improve this Erlang code?

I'm trying to get the mean length of fasta sequences using Erlang. A fasta file looks like this >title1 ATGACTAGCTAGCAGCGATCGACCGTCGTACGC ATCGATCGCATCGATGCTACGATCGATCATATA ATGACTAGCTAGCAGCGATCGACCGTCGTACGC ATCGATCGCATCGATGCTACGATCTCGTACGC >title2 ATCGATCGCATCGATGCTACGATCTCGTACGC ATGACTAGCTAGCAGCGATCGACCGTCGTACGC ATCGATCGCATCGATGCTACGATC...

Improving clojure lazy-seq usage for iterative text parsing

I'm writing a Clojure implementation of this coding challenge, attempting to find the average length of sequence records in Fasta format: >1 GATCGA GTC >2 GCA >3 AAAAA For more background see this related StackOverflow post about an Erlang solution. My beginner Clojure attempt uses lazy-seq to attempt to read in the file one record a...
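For comparison, the same challenge can be sketched in Python as a single streaming pass that reads one line at a time. This is not the Clojure or Erlang code under discussion; the function name `mean_fasta_length` is hypothetical, and the input is assumed to be a file-like object in which each record starts with a `>` header line.

```python
import io


def mean_fasta_length(handle):
    """Return the mean sequence length over all FASTA records in handle."""
    n_records = 0
    total = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            n_records += 1      # a header line starts a new record
        elif n_records:
            total += len(line)  # sequence lines add to the running total
    return total / n_records if n_records else 0.0


example = io.StringIO(">1\nGATCGA\nGTC\n>2\nGCA\n>3\nAAAAA\n")
print(mean_fasta_length(example))
```

Only two counters are kept, so memory use does not grow with the size of the file.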