I know this is a very specific question relating to BLAST and Bioinformatics but here goes:
I am attempting to use standalone BLAST (I have already downloaded it and tested it on the command line) to perform a DNA sequence alignment (blastn). I need to be able to provide both my own query file (FASTA format) and my own database...
I am running standalone command line blast to align many query sequences against a large database sequence of nucleotides. I can modify the command line parameters of the blastn program to change various parameters such as the match/mismatch scores.
I am wondering - for the 'bit score' that blastn outputs, does it make sense to compare...
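For context, the textbook Karlin-Altschul relation between a raw alignment score S and the bit score S' is reproduced below. The parameters \lambda and K depend on the scoring system (match/mismatch and gap penalties), m and n are the effective query and database lengths, and E is the expected number of chance hits. This is the general statistical form, not output taken from any particular blastn run:

```latex
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m \, n \, 2^{-S'}
```

Because \lambda and K are folded into S', the bit score is the normalized quantity, whereas raw scores from runs with different match/mismatch settings are on different scales.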
Hi all,
I was curious to know whether there is any bioinformatics tool able to process a multi-FASTA file and give me information such as number of sequences, length, nucleotide/amino acid content, etc., and maybe automatically draw descriptive plots.
An R Bioconductor solution or a BioPerl module would also do, but I didn't manage to find anyth...
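A minimal Python sketch of the kind of summary described above, using only the standard library. The names `read_fasta` and `fasta_stats` and the layout of the returned dictionary are illustrative assumptions, not part of any existing tool, and plotting is left out:

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_stats(lines):
    """Summarize sequence count, lengths and residue composition."""
    lengths = []
    composition = Counter()
    for _, seq in read_fasta(lines):
        lengths.append(len(seq))
        composition.update(seq.upper())
    return {
        "n_sequences": len(lengths),
        "total_length": sum(lengths),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "composition": dict(composition),
    }
```

Since an open file is an iterable of lines, something like `fasta_stats(open("sequences.fasta"))` would work, where the file name is hypothetical.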
Hi all,
I'm trying to normalize a large number of Affymetrix CEL files using R. However, some of them appear to be truncated, so when reading them I get the error
Cel file xxx does not seem to have the correct dimensions
and the normalization stops. Manually removing the corrupted files and restarting every time would take a very long time. Do yo...
I am hunting for a job, and one of the companies that I interviewed with asked me to write a little test program so that they could test my programming abilities. I am a biologist by training, and most of my programming knowledge is self-taught. I am also more comfortable writing Python than Java.
This is the brief I was g...
I run across a lot of "embarrassingly parallel" projects I'd like to parallelize with the multiprocessing module. However, they often involve reading in huge files (greater than 2 GB), processing them line by line, running basic calculations, and then writing results. What's the best way to split a file and process it using Python's multi...
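One possible approach, sketched under assumptions: split the file into byte ranges that end on line boundaries, then let each worker open the file itself and read only its own range. The function names and the per-chunk work (counting lines) are placeholders for illustration; the real calculation would replace `count_lines`.

```python
import os
from multiprocessing import Pool


def chunk_offsets(path, n_chunks):
    """Return (start, end) byte ranges that each end on a line boundary."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as fh:
        for i in range(1, n_chunks):
            fh.seek(size * i // n_chunks)
            fh.readline()  # advance to the start of the next line
            pos = fh.tell()
            if bounds[-1] < pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def count_lines(task):
    """Placeholder worker: count the lines inside one byte range."""
    path, start, end = task
    n = 0
    with open(path, "rb") as fh:
        fh.seek(start)
        while fh.tell() < end:
            if not fh.readline():
                break
            n += 1
    return n


def parallel_count(path, workers=4):
    """Run the placeholder worker over all chunks in a process pool."""
    tasks = [(path, start, end) for start, end in chunk_offsets(path, workers)]
    with Pool(workers) as pool:
        return sum(pool.map(count_lines, tasks))
```

Passing byte offsets rather than the lines themselves keeps the data sent between processes small. On platforms that spawn rather than fork, `parallel_count` should be called from under an `if __name__ == "__main__":` guard.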
Does a regular expression exist for (theoretical) tryptic cleavage of protein sequences? The cleavage rule for trypsin is: after R or K, but not before P.
Example:
Cleavage of the sequence VGTKCCTKPESERMPCTEDYLSLILNR should result in these 3 sequences (peptides):
VGTK
CCTKPESER
MPCTEDYLSLILNR
Note that there is no cleavage after ...
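A minimal Python sketch of the stated rule (cleave after R or K, but not before P), written as a zero-width regular expression: a lookbehind for K or R and a negative lookahead for P. It assumes Python 3.7 or later, where `re.split` accepts patterns that match the empty string. The name `tryptic_digest` is illustrative.

```python
import re

# Zero-width cleavage site: preceded by K or R, not followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence):
    """Split a protein sequence at theoretical tryptic cleavage sites."""
    # A sequence ending in K or R produces a trailing empty string; drop it.
    return [peptide for peptide in TRYPSIN_SITE.split(sequence) if peptide]


print(tryptic_digest("VGTKCCTKPESERMPCTEDYLSLILNR"))
# ['VGTK', 'CCTKPESER', 'MPCTEDYLSLILNR']
```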
I'd like to create an rRNA sequence database with a web front end for the lab I work in. It seems common in biology to want to search a large number of sequences using alignment algorithms such as BLAST and HMMER, so I wondered if there is any existing php/python/rails projects that allow easy creation of a generic sequence database with...
Hello all,
I'm trying to extract only the first hit from an NCBI BLAST XML file. Next, I would like to get only the first HSP. As a final step, I would like to select these based on best score.
To make things clear, here is a sample of the XML file:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.n...
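A hedged sketch of both selections using the standard library's `xml.etree.ElementTree`. It assumes the usual element names of the NCBI BlastOutput XML format (`Iteration`, `Iteration_query-def`, `Iteration_hits`, `Hit`, `Hit_id`, `Hit_hsps`, `Hsp`, `Hsp_bit-score`) and takes "best score" to mean the highest bit score; the function names are illustrative.

```python
import xml.etree.ElementTree as ET


def _record(query, hit, hsp):
    """Build a (query, hit id, bit score) tuple for one HSP."""
    return (query, hit.findtext("Hit_id"), float(hsp.findtext("Hsp_bit-score")))


def first_hit_first_hsp(root):
    """For each query, take the first HSP of the first reported hit."""
    records = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit = iteration.find("Iteration_hits/Hit")
        if hit is None:
            continue  # query without hits
        hsp = hit.find("Hit_hsps/Hsp")
        if hsp is not None:
            records.append(_record(query, hit, hsp))
    return records


def best_hsp(root):
    """For each query, take the HSP with the highest bit score over all hits."""
    records = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        candidates = [
            _record(query, hit, hsp)
            for hit in iteration.iter("Hit")
            for hsp in hit.iter("Hsp")
        ]
        if candidates:
            records.append(max(candidates, key=lambda r: r[2]))
    return records
```

The root element would come from something like `ET.parse("blast_output.xml").getroot()`, where the file name is hypothetical.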
Hi
I'm iterating through 3 words, each about 5 million characters long, and I want to find sequences of 20 characters that identify each word. That is, I want to find all sequences of length 20 in one word that are unique to that word. My problem is that the code I've written takes an extremely long time to run. I've never ev...
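One common way to make this tractable is to collect each word's substrings of length k in a set, then keep those that occur in only one word, so every lookup is a hash operation rather than a scan. The sketch below assumes "unique" means "appears in no other word"; the function name is illustrative. For 5-million-character inputs the sets will need a substantial amount of memory.

```python
from collections import Counter


def unique_kmers(words, k=20):
    """Return, for each word, the set of length-k substrings found in no other word."""
    # One set per word, so repeats inside a single word count once.
    kmer_sets = [
        {word[i:i + k] for i in range(len(word) - k + 1)}
        for word in words
    ]
    # Number of words in which each k-mer occurs.
    word_counts = Counter(kmer for kmers in kmer_sets for kmer in kmers)
    return [
        {kmer for kmer in kmers if word_counts[kmer] == 1}
        for kmers in kmer_sets
    ]
```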
Hi there guys,
When it comes to programming, we certainly have some blogs to follow, but when you're thinking of trying a different field, how can you find the big names?
I'd like to try the bioinformatics field and add some blogs from this domain to my daily reading. Can you recommend some blogs?
...
Hi,
I was looking in the wiki for how to convert the following information about beads (Cartesian coordinates + energy):
23.4 54.6 12.3 -123.5 54.5 23.1 9.45 -56.7 .......
into a drawing in PyMOL that contains, for each atom, a sphere of radius R centered on its coordinates and colored according to a rainbow gradient.
Thanks
...
How is gene ranking done for microarray data using information gain and chi-square statistics? Please illustrate with a simple example.
...
Dear R users,
I am searching for a good R package to align multiple spectra.
Thanks.
...
I have a script that performs BLAST queries (bl2seq)
The script works like this:
Get sequence a, sequence b
write sequence a to filea
write sequence b to fileb
run command 'bl2seq -i filea -j fileb -n blastn'
get output from STDOUT, parse
repeat 20 million times
The program bl2seq does not support piping.
Is there ...
I have tried Google with no luck. I have seen some weak references to robust multi-array averaging done with Python, but no code. I am not so interested in reinventing the wheel. Any suggestions on a Python module, script ....
If I could find a nice explanation or example of the algorithm, I would write a Python implementation to share.
...
Does anyone have any experience running BLAST with XGrid?
Googling reveals that a tool called 'Xgrid BLAST' existed, but not where to get it.
...
So I'm going to an average university, majoring in CS.
I haven't learned a damn thing and am in my third year.
I've come to be really bored with studying CS.
Initially, I was kind of misinformed and thought majoring in CS
would make me a good "product creator".
I make my money combining programming and business/marketing.
But I have a...
Hi,
I am trying to implement pairwise protein sequence alignment using the Needleman-Wunsch global alignment algorithm.
I am not clear about how to include the BLOSUM62 matrix in my source code to do the scoring, that is, to fill the two-dimensional matrix.
I have googled and found that most people suggested to use flat file which contain...
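A minimal sketch of one way to do this in Python: parse the matrix text into a dictionary keyed by residue pairs, then look scores up while filling the Needleman-Wunsch table. The matrix below is only a four-residue excerpt (A, R, N, D) written from memory of the standard BLOSUM62 table, so it should be checked against, and replaced by, the full matrix file; the layout (comment lines starting with `#`, a header row, one row per residue) mimics the NCBI flat-file style. The linear gap penalty of -4 and the function names are assumptions for illustration.

```python
# Four-residue excerpt for illustration only; load the full matrix in practice.
BLOSUM62_EXCERPT = """\
# Excerpt of BLOSUM62 (A, R, N, D only)
   A  R  N  D
A  4 -1 -2 -2
R -1  5  0 -2
N -2  0  6  1
D -2 -2  1  6
"""


def parse_matrix(text):
    """Parse matrix text into a dict mapping (row, column) residues to scores."""
    scores = {}
    columns = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if columns is None:
            columns = fields  # header row with the column residues
            continue
        row, values = fields[0], fields[1:]
        for col, value in zip(columns, values):
            scores[(row, col)] = int(value)
    return scores


def nw_score(a, b, scores, gap=-4):
    """Fill the Needleman-Wunsch table and return the optimal global score."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + scores[(a[i - 1], b[j - 1])],  # substitution
                table[i - 1][j] + gap,  # gap in b
                table[i][j - 1] + gap,  # gap in a
            )
    return table[-1][-1]
```

For example, `nw_score("AND", "AD", parse_matrix(BLOSUM62_EXCERPT))` aligns A with A and D with D and pays one gap for N. The traceback needed to recover the alignment itself is omitted.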
I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code:
Next I tried to extract the code from the image via OCR, so I installed Tesseract OCR and the Python bindings for it, pytesser.
Pytesser only works on TIFF images, so I converted the image in Gimp and entered the...