(This is a database / R commands question)

For my thesis work, I wish to import tRNA data into R and have it aligned.

My questions are:
1) What resources can I use for the data?
2) What commands might help me with the import/alignment?

So far, I found two nice repositories that hold such data:

I also found the readFASTA command of the Biostrings R package, which does basic importing of the data into R.
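To make the import step concrete, here is a minimal base-R sketch of what reading a FASTA file amounts to. It is not the Biostrings implementation; the helper name `read_fasta` and the two short sequences are placeholders made up for illustration, not real tRNA records. As far as I know, newer Biostrings releases offer `readRNAStringSet()` for this job, but treat that as an assumption and check your installed version.

```r
# Minimal base-R FASTA reader (illustrative sketch, not the Biostrings code).
# read_fasta is a hypothetical helper name introduced for this example.
read_fasta <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]            # drop blank lines
  is_header <- startsWith(lines, ">")      # header lines start with ">"
  record <- cumsum(is_header)              # record index for every line
  headers <- sub("^>", "", lines[is_header])
  # Concatenate the sequence lines that belong to each record.
  seqs <- vapply(seq_along(headers), function(i) {
    paste(lines[record == i & !is_header], collapse = "")
  }, character(1))
  names(seqs) <- headers
  seqs
}

# Write a tiny placeholder file (made-up sequences) and read it back.
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">tRNA_example_1", "GCGGAUUUAG", "CUCAGUUGGG",
             ">tRNA_example_2", "GGGCCCGUAG"), fasta_path)
trna <- read_fasta(fasta_path)
print(nchar(trna))
```

The result is a named character vector, one element per record, which is enough to inspect lengths or write subsets back out before alignment.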

My remaining problem is how to handle the alignment of the tRNA.

Since I am not from the field, I might be missing a very basic answer (like where I should download the data from, or what command to use). If you would be willing to advise me, that would be most helpful.

Many thanks in advance, Tal

+1  A: 

The two databases you have listed look like a good place to start. Here's another: tRNADB-CE.

Obtaining a curated dataset can save you a lot of headaches. Have you looked for any good review papers on tRNA genes that might point to "gold standard" tRNA databases currently used in the field?

Another way to go about building a tRNA sequence database would be to use sequences tagged with Gene Ontology (GO) terms related to tRNA function. You can search for GO terms such as "trna" using AmiGO and then retrieve all sequences tagged with the specific GO terms that you care about. I would recommend starting with a curated database, however.

Given that your sequence data is in FASTA format (which it probably will be), three common utilities for multiple sequence alignment are ClustalW, MUSCLE, and T-Coffee.

Since you are working in R, here is an R package that will allow you to make calls to MUSCLE (you will need to install the stand-alone MUSCLE utility as well). Parsing the output from the alignment programs is not difficult, but this package may save you a little effort.
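Independently of any wrapper package, the stand-alone MUSCLE binary can be called from R with `system2()`. The sketch below is one way to do that under some assumptions: the executable is named `muscle` and is on your PATH, MUSCLE 3.x takes `-in`/`-out`, and MUSCLE 5 takes `-align`/`-output`. Check these flags against the version you installed. The helper names `muscle_args` and `run_muscle` and the file names are hypothetical.

```r
# Build the command-line arguments for the stand-alone MUSCLE program.
# Flag names are assumptions based on MUSCLE 3.x and MUSCLE 5; verify locally.
muscle_args <- function(input, output, version = c("v3", "v5")) {
  version <- match.arg(version)
  if (version == "v3") {
    c("-in", shQuote(input), "-out", shQuote(output))
  } else {
    c("-align", shQuote(input), "-output", shQuote(output))
  }
}

# Run MUSCLE on a FASTA file and return the path of the aligned output.
run_muscle <- function(input, output, version = "v3",
                       exe = Sys.which("muscle")) {
  if (!nzchar(exe)) stop("MUSCLE executable not found on PATH")
  status <- system2(exe, muscle_args(input, output, version))
  if (status != 0) stop("MUSCLE exited with status ", status)
  output
}

# Show the command that would be run for hypothetical file names.
cat("muscle", muscle_args("trna.fa", "trna_aligned.afa"), "\n")
```

With MUSCLE installed, `run_muscle("trna.fa", "trna_aligned.afa")` would write the alignment as aligned FASTA, which can then be read back into R like any other FASTA file.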

Good luck.

awesomo
Hi awesomo, thank you so much for your reply. After further reading, I see now that this database, http://gtrnadb.ucsc.edu, does structural alignments by aligning tRNA sequences against domain-specific tRNA covariance models with the use of COVE. Can MUSCLE do the same? (Thanks again!) Tal
Tal Galili
I took a look at the COVE paper, and their "covariance models" are basically hidden Markov models with additional metadata that takes into account RNA secondary structure. One thing you could do with the RNA sequence data is use MUSCLE to construct a multiple sequence alignment (MSA) of the sequences you're interested in, then use a tool like HMMER (http://hmmer.janelia.org) to build your own custom HMM, and finally use your HMM to scan genomes for similar sequences. I recommend you first get an understanding of the COVE paper, however, to understand the pitfalls.
awesomo
Awesomo - very informative reply, thank you! A question: after getting a few more replies from people, I am gathering that the alignment is (as you wrote) based on first folding the tRNAs into their secondary structure. Is there any way of doing that in R? P.S.: since I am new to this, I see that it takes me time to clarify my questions - thanks for your (and others') patience!
Tal Galili
I'm not aware of an R library for predicting RNA secondary structure (I haven't looked, so it may very well exist).
awesomo