I'm working on a small application and am thinking about integrating BLAST or another local alignment search into it. My searching has only turned up programs that need to be installed and called as external programs.

Is there a way short of me implementing it from scratch? Any pre-made library perhaps?

A: 

Computational Molecular Biology: An Introduction has code for Smith-Waterman and other dynamic programming alignment algorithms.

Alex Reynolds
+1  A: 

The BLAST algorithm was first implemented about 20 years ago; it is now a very large program, and I cannot imagine it being easy to implement from scratch. You can learn about it by looking at the source of the 'blastall' program in the NCBI toolkit. A simpler pairwise algorithm (Smith-Waterman, Needleman-Wunsch) should be easier to implement.
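To give a sense of how small the simpler pairwise approach can be, here is a minimal sketch of Smith-Waterman local alignment scoring in Python. It is not taken from the book or the NCBI toolkit mentioned in this thread; the function name and the scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for illustration, and it returns only the best score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring scheme (assumed, not from any particular tool):
    +2 for a match, -1 for a mismatch, -2 per gap position (linear gaps).
    """
    # Only the previous row of the dynamic programming matrix is needed.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at +2 each
```

Recovering the aligned region as well would require keeping the full matrix and tracing back from the best-scoring cell. A C version follows the same two nested loops.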

Pierre
A: 

I use NetBLAST through the blastcl3 client binary. I believe that the blastcl3 binary is a pretty thin client for the NetBLAST web service.

If so, it shouldn't be too hard to sniff the packets and implement your own client. Depending on your use case, this might be faster and easier than implementing your own alignment algorithm. It does, however, introduce a dependency on NCBI's web services.

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/netblast.html

+2  A: 

Does it have to be in C, or would C++ also be OK? If so, you might want to look at the SeqAn library.

PhiS
That's great. I'll take a look at it, if I can implement it in C++ too.
brandstaetter
+3  A: 

This topic also touches on reproducibility of results: it is always better to use the raw blast binary provided by NCBI or UCSC, because it will make your results easier for other scientists to reproduce and will save you a lot of time writing tests (more time than you can imagine).

For day-to-day work I have often used exonerate, a tool written in C that can do both global and local alignment, has a simple Unix-like interface, and doesn't require you to format your input the way blast does.

Moreover, keep in mind that people usually use a combination of makefiles and scripts to define a pipeline, instead of calling everything from a single script: most programming languages are not well suited to defining pipelines, while automated build tools like Make are not useful for scripting tasks. Have a look at these examples: http://skam.sourceforge.net/skam-intro.html http://swc.scipy.org/lec/build.html

dalloliogm
+1  A: 

I just stumbled across the thing I would have wanted: The NCBI C++ Toolkit. Thanks for all the suggestions though.

brandstaetter
A: 

I posted a similar question (http://stackoverflow.com/questions/2248016/running-blast-bl2seq-without-creating-sequence-files)

Basically, the answer I came up with was running this command:

bl2seq -i <(echo sequence1) -j <(echo sequence2) -p blastn

That feeds the output of each echo command to the bl2seq (blast 2 sequences) program as if it were a file, using bash process substitution.

But I couldn't get it to work by calling system from Python.
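One plausible reason for the failure, offered as an assumption rather than a confirmed diagnosis: Python's os.system runs the command through /bin/sh, and on many systems that shell does not understand the <(...) process substitution syntax, which is a bash feature. A minimal sketch of a workaround is to invoke bash explicitly through subprocess. The helper names below are hypothetical, and the sketch assumes both bash and bl2seq are on the PATH.

```python
import shlex
import shutil
import subprocess


def build_bl2seq_command(seq1, seq2, program="blastn"):
    """Build the bash command line from the answer above (hypothetical helper)."""
    # shlex.quote guards against shell metacharacters in the sequences.
    return "bl2seq -i <(echo {}) -j <(echo {}) -p {}".format(
        shlex.quote(seq1), shlex.quote(seq2), shlex.quote(program)
    )


def run_in_bash(command):
    """Run a command string with bash rather than the default /bin/sh."""
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash was not found on PATH")
    # Passing [bash, "-c", command] makes bash itself parse the <(...) syntax.
    result = subprocess.run(
        [bash, "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Requires bl2seq to be installed; the sequences here are placeholders.
    print(run_in_bash(build_bl2seq_command("ACGTACGT", "ACGTTCGT")))
```

Passing the sequences on standard input or through temporary files would avoid the dependency on bash, at the cost of a little more code.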

Austin