biopython

BioPython: Skipping over bad GIDs with Entrez.esummary/Entrez.read

Sorry about the odd title. I am using eSearch and eSummary to go from Accession Number --> gID --> TaxID. Assume that 'accessions' is a list of 20 accession numbers (I do 20 at a time because that's the maximum that NCBI will allow). I do: handle = Entrez.esearch(db="nucleotide", rettype="xml", term=accessions) record = Entrez.read(ha...

Can you BLAST with BioPython? Anyone have experience with this?

Hi, I am going to BLAST several sequences and download the top 100 hits or so from each sequence. Then I will pool the downloaded sequences and remove duplicates. I was thinking of trying out BioPython for this since I am learning Python, but I don't know if this is feasible? Comments to this anyone? Thanks! Jon ...

BioPython: extracting sequence IDs from a Blast output file

Hi, I have a BLAST output file in XML format. It contains 22 query sequences with 50 hits reported for each sequence, and I want to extract all 50x22 hits. This is the code I currently have, but it only extracts the 50 hits from the first query. from Bio.Blast import NCBIXML blast_records = NCBIXML.parse(result_handle) blast_record = bl...

Subprocess fails to catch the standard output

I am trying to generate a tree from a FASTA input file, with the alignment done by MuscleCommandline: import sys,os, subprocess from Bio import AlignIO from Bio.Align.Applications import MuscleCommandline cline = MuscleCommandline(input="c:\Python26\opuntia.fasta") child= subprocess.Popen(str(cline), stdout = subprocess.PIPE,...

50 sequences in one line

I have a multiple sequence alignment (Clustal) file and I want to read it and arrange the sequences so that they are laid out more clearly and in order. I am doing this in Biopython using the AlignIO module. My code goes like this: alignment = AlignIO.read("opuntia.aln", "clustal") print "Number of rows: %i" % len(align) for...

Fetching genomic sequence efficiently in Python?

How can I fetch genomic sequence efficiently using Python? For example, from a .fa file or some other easily obtained format? I basically want an interface fetch_seq(chrom, strand, start, end) which will return the sequence [start, end] on the given chromosome on the specified strand. Analogously, is there a programmatic python interf...
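A minimal sketch of the fetch_seq(chrom, strand, start, end) interface described above, in plain Python. The helper read_fasta, the extra genome argument, and the choice of 1-based inclusive coordinates are assumptions made for illustration; the question does not specify them. It loads the whole file into memory, so for large genomes an indexed FASTA reader would likely be the more efficient route.

```python
import io

# Translation table for complementing DNA (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {name: sequence}.

    Hypothetical helper: the record name is the first word after '>'.
    """
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def fetch_seq(genome, chrom, strand, start, end):
    """Return the sequence [start, end] on the given strand.

    Assumption: coordinates are 1-based and inclusive, and refer to the
    forward strand; '-' returns the reverse complement of that interval.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    seq = genome[chrom][start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


# Small in-memory example instead of a real .fa file.
demo = io.StringIO(">chr1 demo\nACGTAC\nGTTT\n")
genome = read_fasta(demo)
print(fetch_seq(genome, "chr1", "+", 2, 5))  # CGTA
print(fetch_seq(genome, "chr1", "-", 2, 5))  # TACG
```

With a real file, the same calls would work on `read_fasta(open(path))`, where `path` stands for whichever .fa file is at hand.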

How do I parse data in a table using Biopython?

Hello, I want to screen a particular column in a table using Biopython. I want to parse the table and retain only the entries that do not have "empty spaces" in that column. Any ideas, please? ...
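One way the filtering described above could be sketched uses only the standard csv module rather than Biopython. The function name rows_with_value, the tab delimiter, and the column names in the demo table are assumptions, since the question does not show the table.

```python
import csv
import io


def rows_with_value(handle, column, delimiter="\t"):
    """Return the rows whose `column` is neither missing nor blank.

    Assumption: the table has a header line naming its columns.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row for row in reader if (row.get(column) or "").strip()]


# Made-up demo table: q2 has only whitespace and q3 is empty in 'gene'.
table = io.StringIO(
    "id\tgene\tscore\n"
    "q1\tabc\t1.5\n"
    "q2\t \t2.0\n"
    "q3\t\t0.7\n"
    "q4\txyz\t3.1\n"
)
kept = rows_with_value(table, "gene")
print([row["id"] for row in kept])  # ['q1', 'q4']
```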

Why can't python find some modules when I'm running CGI scripts from the web?

I have no idea what could be the problem here: I have some modules from Biopython which I can import easily when using the interactive prompt or executing python scripts via the command-line. The problem is, when I try and import the same biopython modules in a web-executable cgi script, I get a "Import Error" : No module named B...

Laplacian smoothing to Biopython

Hi, I am trying to add Laplacian smoothing support to Biopython's Naive Bayes code 1 for my Bioinformatics project. I have read many documents about the Naive Bayes algorithm and Laplacian smoothing, and I think I got the basic idea, but I just can't integrate this with that code (actually I cannot see which part I will add 1 -laplacian num...
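For reference, the add-one idea itself can be sketched independently of Biopython. This is not Biopython's Naive Bayes code; the function smoothed_probabilities and its arguments are hypothetical. It assumes every observation belongs to the given vocabulary, and estimates P(value) as (count + alpha) / (N + alpha * V), where N is the number of observations and V the vocabulary size.

```python
from collections import Counter


def smoothed_probabilities(observations, vocabulary, alpha=1.0):
    """Laplace (add-alpha) estimate of P(value) for each value in vocabulary.

    Assumption: all observations are members of `vocabulary`.
    """
    counts = Counter(observations)
    total = len(observations) + alpha * len(vocabulary)
    return {value: (counts[value] + alpha) / total for value in vocabulary}


# 'G' and 'T' were never observed, yet they get a non-zero probability.
probs = smoothed_probabilities(["A", "A", "C"], ["A", "C", "G", "T"])
for value, p in probs.items():
    print(value, round(p, 3))
```

In a Naive Bayes classifier the same correction would be applied to each per-class feature count, so that an unseen feature value does not force the whole product of likelihoods to zero.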

Phylo BioPython building trees

Hello! I am trying to build a tree with BioPython's Phylo module. What I've done so far is this image: each name has a four-digit number followed by - and a number; this second number refers to the number of times that sequence is represented. That means that for 1578 - 22, the node should represent 22 sequences. The file with the sequences aligned: file...