bioinformatics

Learning JBoss Drools: what should my model be?

Hi all, I'm learning JBoss Drools and I'm playing with the genetics data from the HapMap project (http://hapmap.ncbi.nlm.nih.gov/genotypes/latest/forward/non-redundant/). Each file in this directory is a table with the individuals at the top, the positions on the genome on the left, and the observed mutations for each individual/po...

Bioinformatics and computer science

How hard is it for a computer science MS student (with a little knowledge of biology) to research problems related to bioinformatics? How closely is bioinformatics related to graph algorithms and data structures? ...

Fast Sequence Alignment on Unicode Strings

I want to run something like the BLAST algorithm to query a large database of Unicode strings. Most alignment software, like BLAST, expects nucleotide or protein strings as input. But my input could potentially contain any Unicode character. Is anyone aware of a piece of software that will let me do this? The scoring matrix coul...

What is the best bioinformatics book for a computer scientist?

I am a CS graduate student interested in bioinformatics research. I don't have much background in biology, so what is the best bioinformatics book for a computer scientist? ...

Need help with peak signal detection in Perl

Hi everyone, I have some intensity values from images of yeast colony plates. I need to be able to find the peak values among the intensity values. Below is an example image showing how the values look when graphed. Example of some of the values 5.7 5.3 8.2 16.5 34.2 58.8 **75.4** 75 65.9 62.6 58.6 66.4 71.4 53.5 40.5 26.8 14....
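
One simple reading of "peak" here is an interior local maximum. The sketch below uses Python rather than the Perl the question asks about, and the function name `find_peaks` is made up for illustration. It only uses the values visible in the excerpt before it is cut off, and it does no smoothing, so noisy data would probably need a threshold or a minimum peak distance on top of this.

```python
def find_peaks(values):
    """Return (index, value) pairs for interior local maxima."""
    peaks = []
    for i in range(1, len(values) - 1):
        # A peak is higher than its left neighbour and at least as high
        # as its right neighbour (so a flat top is reported once).
        if values[i - 1] < values[i] >= values[i + 1]:
            peaks.append((i, values[i]))
    return peaks


# Values copied from the question, up to the point where the excerpt is truncated.
intensities = [5.7, 5.3, 8.2, 16.5, 34.2, 58.8, 75.4, 75, 65.9, 62.6,
               58.6, 66.4, 71.4, 53.5, 40.5, 26.8]

print(find_peaks(intensities))  # [(6, 75.4), (12, 71.4)]
```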

How do you use Ruby on Rails for science (if applicable)?

We do research in systems biology. We prefer to use existing data sets, because collecting new biological data is expensive. Thus, a lot of the scripts we write are little more than transformations of one data set into another. Eventually, we put our results online -- and more and more journals are requiring this sort of thing. So it w...

How to visualize gene networks and cluster groups of genes?

I'm working with biological data - namely groups of genes. For example: group 1: geneA geneB geneC group 2: geneD geneE group 3: geneF geneG geneH For each pair of genes, geneX and geneY, I have a score telling how similar the two genes are (actually, I have two scores, since I used BLAST which is 'directional': I first searched geneX...

Obtaining blastn databases programmatically

On the Nucleotide BLAST search page, is there a way to programmatically obtain the databases listed in the "Choose Search Set" box? Maybe in XML format? (The programming language used doesn't matter.) Thanks in advance ...

Using awk create two arrays from two column values, find difference and sum differences, and output data

I have a file with the following fields (and an example value to the right): hg18.ensGene.bin 0 hg18.ensGene.name ENST00000371026 hg18.ensGene.chrom chr1 hg18.ensGene.strand - hg18.ensGene.txStart 67051161 hg18.ensGene.txEnd 67163158 hg18.ensGene.exonStarts 67051161,67060631,67065090,67066082,67071855,67072261,67073896,67075980,67078739...
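
The title describes pairing two comma-separated columns, taking the per-element difference and summing it. A minimal Python sketch of that arithmetic follows (the question itself asks for awk). The excerpt only shows `exonStarts`, so the matching end-coordinate column, the function name `total_exon_length` and the sample numbers are all assumptions made for illustration.

```python
def total_exon_length(exon_starts, exon_ends):
    """Sum (end - start) over paired comma-separated coordinate lists."""
    # Empty fields are skipped so that a trailing comma is tolerated.
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    if len(starts) != len(ends):
        raise ValueError("start and end lists differ in length")
    return sum(end - start for start, end in zip(starts, ends))


# Made-up coordinates, not taken from the question.
print(total_exon_length("100,300,700,", "150,420,710,"))  # 180
```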

combine two lists with a join on a column

I'm trying to combine two lists, joining them on a common field such as ENST00000371026. I've tried the following but had no luck. What is the right way to do it? cat> gar1.txt <<EOF ENST00000371026 ENSG00000152763 ENST00000371023 ENSG00000152763 ENST00000395250 ENSG00000152763 ENST00000309502 ENSG00000163485 ENST00000377464 ENSG00000142...
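
A join on a shared column can be sketched as a dictionary lookup: index one list by the key, then walk the other. The Python below is only an illustration of that idea, not the command-line approach the question attempts. The left rows reuse identifiers visible in the excerpt; the right-hand list is hypothetical because the second file is cut off, and `join_on_key` is an invented name.

```python
def join_on_key(left_rows, right_rows):
    """Inner join of two lists of (key, value) pairs on the key."""
    # Index the right-hand rows by key; a key may occur more than once.
    right_index = {}
    for key, value in right_rows:
        right_index.setdefault(key, []).append(value)

    joined = []
    for key, left_value in left_rows:
        for right_value in right_index.get(key, []):
            joined.append((key, left_value, right_value))
    return joined


left = [("ENST00000371026", "ENSG00000152763"),
        ("ENST00000309502", "ENSG00000163485")]
# Hypothetical second list: its real contents are not shown in the excerpt.
right = [("ENST00000371026", "chr1"),
         ("ENST00000999999", "chrX")]

print(join_on_key(left, right))
# [('ENST00000371026', 'ENSG00000152763', 'chr1')]
```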

MongoDB: What’s the most efficient way to store a chromosome/position

I want to store some genomic positions (chromosome, position) using MongoDB. Something like: { chrom:"chr2", position:100, name:"rs25" } I want to be able to quickly find all the records in a given segment (chrom, [posStart - posEnd]). What would be the best key/_id to use? A chrom, position object? db.snps.save({_id:{chrom...
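
One possible shape for the segment lookup is a range filter on `position` combined with an equality match on `chrom`, which a compound index on (`chrom`, `position`) can serve. The Python sketch below only builds the query document and mirrors its meaning in plain code; it does not talk to a server. The names `segment_query`, `in_segment` and `SEGMENT_INDEX` are invented here, and whether this layout beats a compound `_id` is not something the excerpt settles.

```python
def segment_query(chrom, pos_start, pos_end):
    """Build a MongoDB filter for records on chrom within [pos_start, pos_end]."""
    return {"chrom": chrom, "position": {"$gte": pos_start, "$lte": pos_end}}


def in_segment(record, chrom, pos_start, pos_end):
    """Plain-Python statement of what the filter above is meant to select."""
    return record["chrom"] == chrom and pos_start <= record["position"] <= pos_end


# Assumed index specification (ascending on both fields). With pymongo this
# could be passed to Collection.create_index, and the filter to Collection.find.
SEGMENT_INDEX = [("chrom", 1), ("position", 1)]

record = {"chrom": "chr2", "position": 100, "name": "rs25"}  # from the question
print(segment_query("chr2", 50, 150))
print(in_segment(record, "chr2", 50, 150))  # True
```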

How can I search for different variants of bioinformatics motifs in a string, using Perl?

I have a program output with one tandem repeat in different variants. Is it possible to search (in a string) for the motif and to tell the program to find all variants with maximum "3" mismatches/insertions/deletions? ...
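
Searching with a bounded number of mismatches, insertions and deletions is approximate string matching under edit distance. Below is a small Python sketch of the standard dynamic-programming approach (often attributed to Sellers), in which a match is allowed to start anywhere in the text. It is written in Python instead of Perl, the name `approximate_matches` is invented, and it reports end positions only, so recovering the start of each variant would take extra bookkeeping.

```python
def approximate_matches(motif, text, max_edits=3):
    """Return (end_index, edits) for every text position where some substring
    ending there is within max_edits edits of motif (end_index is 1-based)."""
    m = len(motif)
    # prev[i]: cost of matching motif[:i] against an empty stretch of text.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # curr[0] stays 0 because a match may begin at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == ch else 1
            curr[i] = min(prev[i - 1] + cost,  # match or mismatch
                          prev[i] + 1,         # extra character in the text
                          curr[i - 1] + 1)     # motif character missing
        if curr[m] <= max_edits:
            hits.append((j, curr[m]))
        prev = curr
    return hits


print(approximate_matches("ACGT", "TTACGTTT", max_edits=0))  # [(6, 0)]
```

With a larger `max_edits`, neighbouring end positions usually qualify as well, so in practice the hits tend to need grouping, for example by keeping the lowest-cost hit in each run.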

Best way to organize Bioinformatics projects?

I come from a CS background but am now doing genomics. My projects include a lot of bioinformatics, typically involving: aligning sequences, comparing overlap, etc., between sequences and various genome-annotation-features, from different classes of biological samples, time-course data, microarray, high-throughput sequencing ("next-gen" s...

Why can't Python find some modules when I'm running CGI scripts from the web?

I have no idea what could be the problem here: I have some modules from Biopython which I can import easily when using the interactive prompt or executing Python scripts via the command line. The problem is, when I try to import the same Biopython modules in a web-executable CGI script, I get an "Import Error" : No module named B...
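
A common first step with this kind of symptom is to check whether the web server runs the same interpreter and search path as the shell. The sketch below is a hypothetical diagnostic CGI script, not a fix: `import_diagnostics` is an invented helper, and `Bio` is used as the module to probe on the assumption that it is the Biopython package the script needs.

```python
import sys


def import_diagnostics(module_name):
    """Describe the running interpreter, its search path and one import attempt."""
    lines = [f"interpreter: {sys.executable}"]
    lines.extend(f"path: {entry}" for entry in sys.path)
    try:
        __import__(module_name)
        lines.append(f"import ok: {module_name}")
    except ImportError as exc:
        lines.append(f"import failed: {exc}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Minimal CGI response: header line, blank line, then the plain-text body.
    print("Content-Type: text/plain")
    print()
    print(import_diagnostics("Bio"))
```

Comparing this output with the same call made from the command line shows whether the two environments differ.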

Best programming language for statistical analysis in bioinformatics

Possible Duplicate: What do you think is the best language for Bioinformatics? Hello, please could you advise me on the best language to use to perform statistical analysis on bioinformatics data (e.g. distribution of SNPs in a genome)? I was going to use BioJava because I come from a programming background and I like ob...

Subclass/Child class

Hi, I had this class and subclass : class Range: def __init__(self, start, end): self.setStart(start) self.setEnd(end) def getStart(self): return self.start def setStart(self, s): self.start = s def getEnd(self): return self.end def setEnd(self, e): self.end = e def getLength(self): return len(range(sel...

How to use other clustering methods for clustergram in MATLAB's Bioinformatics Toolbox

EDIT: I figured it out. I just did not understand the notation. Hello, hopefully someone out there is familiar with the clustergram in the Bioinformatics Toolbox. I am interested in the graphical aspects of the function (the dendrogram/heat map), but am currently handicapped as it requires me to use MATLAB's cluster() function. I would prefe...