ansaurus

Question

running BLAST (bl2seq) without creating sequence files

Answer 1

+1 A:

Is this the bl2seq program from BioPerl? If so, it doesn't look like you can do piping to it. You can, however, code your own hack using Bio::Tools::Run::AnalysisFactory::Pise, which is the recommended way of going about it. You'd have to do it in Perl, though.

If this is a different bl2seq, then disregard the message. In any case, you should probably provide some more detail.

Pedro Silva 2010-02-11 22:10:30

It's the bl2seq program that comes with NCBI's Blast+. The BioPerl bl2seq is a wrapper of some sort.

Austin 2010-02-12 03:13:29

Answer 2

+3 A:

Depending on what OS you're running on, you may be able to use something like bash's process substitution. I'm not sure how you'd set that up in Python, but you're basically using a named pipe (or named file descriptor). That won't work if bl2seq tries to seek within the files, but it should work if it just reads them sequentially.

cjm 2010-02-11 22:45:54

Hey, that worked! You're rad bl2seq -i<(echo sequence1) -j(echo sequence2) -p blastn

Austin 2010-02-12 03:26:17

Answer 3

+1 A:

How do you know bl2seq does not support piping.? By the way, pipes is an OS feature, not the program. If your bl2seq program outputs something, whether to STDOUT or to a file, you should be able to parse the output. Check the help file of bl2seq for options to output to file as well, eg -o option. Then you can parse the file.

Also, since you are using Python, an alternative you can use is BioPython module.

ghostdog74 2010-02-12 01:01:02

It's certainly up to the program whether it reads anything from standard input. Just because you can pipe something to STDIN doesn't mean the program will do anything with it.

cjm 2010-02-12 01:47:26

who says anything about STDIN. I am asking OP about STDOUT.

ghostdog74 2010-02-12 01:51:42

I can parse the output just fine as it's going to STDOUT.@cjn - How would I pipe to two arguments?bl2seq -i <pipe1> -j <pipe2>

Austin 2010-02-12 03:15:30

@ghostdog75 you deserve credit for answering this. I didn't really understand what you meant at the time.

Austin 2010-05-06 20:00:18

Answer 4

A:

Wow. I have it figured out.

The answer is to use python's subprocess module and pipes!

EDIT: forgot to mention that I'm using blast2 which does support piping.

(this is part of a class)

def _query(self):
    from subprocess import Popen, PIPE, STDOUT
    pipe = Popen([BLAST,
    '-p', 'blastn',
    '-d', self.database,
    '-m', '8'],
    stdin=PIPE,
    stdout=PIPE)
    pipe.stdin.write('%s\n' % self.sequence)
    print pipe.communicate()[0]

where self.database is a string containing the database filename, ie 'nt.fa' self.sequence is a string containing the query sequence

This prints the output to the screen but you can easily just parse it. No slow disk I/O. No slow XML parsing. I'm going to write a module for this and put it on github.

Also, I haven't gotten this far yet but I think you can do multiple queries so that the blast database does not need to be read and loaded into RAM for each query.

Austin 2010-05-06 19:58:35

Answer 5

+1 A:

I call blast2 using R script:
....
system("mkfifo seq1")
system("mkfifo seq2")
system("echo sequence1 > seq1"), wait=FALSE)
system("echo sequence2 > seq2"), wait=FALSE)
system("blast2 -p blastp -i seq1 -j seq2 -m 8", intern = TRUE)
....
This is 2 times slower(!) vs. writing and reading from hard drive!

standa 2010-07-01 18:09:25

Ha! I appreciate your anti-answer.

Austin 2010-07-02 01:55:42

ansaurus

tags:

views:

answers:

running BLAST (bl2seq) without creating sequence files

related questions