I'm working with biological data - namely groups of genes. For example:

group 1: geneA geneB geneC
group 2: geneD geneE
group 3: geneF geneG geneH

For each pair of genes, geneX and geneY, I have a score telling how similar the two genes are. (Actually, I have two scores, since I used BLAST, which is 'directional': I first searched geneX against all the other genes, then geneY against all the other genes, so I have two geneX--geneY scores. I guess I can take the lower of the two, or the average.)
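To make the "lower score or average" idea concrete, here is a minimal sketch of collapsing the two directional scores into one score per unordered pair. The function names (`symmetrize`, `mean`) and the numbers are made up for illustration; real BLAST output would have to be parsed into the `(query, hit) -> score` dictionary first.

```python
from collections import defaultdict

def symmetrize(directional_scores, combine=min):
    """Collapse (query, hit) -> score entries into one score per unordered gene pair."""
    by_pair = defaultdict(list)
    for (query, hit), score in directional_scores.items():
        if query == hit:
            continue  # ignore self-hits
        by_pair[frozenset((query, hit))].append(score)
    # A pair seen in only one direction simply keeps that single score.
    return {pair: combine(scores) for pair, scores in by_pair.items()}

def mean(scores):
    return sum(scores) / len(scores)

# Hypothetical scores, not real BLAST results.
directional = {
    ("geneA", "geneB"): 120.0,
    ("geneB", "geneA"): 100.0,
    ("geneA", "geneD"): 40.0,  # only one direction reported
}
lowest = symmetrize(directional)          # lower of the two scores
average = symmetrize(directional, mean)   # average of the two scores
```

Keying on a `frozenset` makes geneX--geneY and geneY--geneX the same entry, which is what an undirected graph needs.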

So, let's suppose I have only one score for each pair of genes. My data can then be viewed as an undirected graph in which the genes are the nodes and each edge has a score attached to it.

Now, what I would like to do is:

  1. Visualize my data interactively: being able to click on gene nodes and open a link attached to them, show only edges above/below some threshold, control how the network is "spread", etc.

  2. Cluster together groups which are similar, i.e. groups that have similar genes.
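For point 2, one possible reading (an assumption on my part, not an established method) is: score two groups by the average of all cross-group gene scores, then merge groups whose score is above a threshold. The sketch below does that with a small union-find, which amounts to single-linkage clustering at a fixed cutoff. The helper names, the scores, and the threshold are all hypothetical.

```python
from itertools import combinations

def group_similarity(genes_a, genes_b, scores):
    """Average score over all cross-group gene pairs; missing pairs count as 0."""
    total = sum(scores.get(frozenset((a, b)), 0.0) for a in genes_a for b in genes_b)
    return total / (len(genes_a) * len(genes_b))

def cluster_groups(groups, scores, threshold):
    """Merge groups whose similarity is at least `threshold` (single linkage)."""
    parent = {name: name for name in groups}

    def find(name):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for g1, g2 in combinations(groups, 2):
        if group_similarity(groups[g1], groups[g2], scores) >= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for name in groups:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())

groups = {
    "group 1": ["geneA", "geneB", "geneC"],
    "group 2": ["geneD", "geneE"],
    "group 3": ["geneF", "geneG", "geneH"],
}
# Hypothetical pairwise scores, one per unordered gene pair.
scores = {
    frozenset(("geneA", "geneD")): 80.0,
    frozenset(("geneB", "geneE")): 60.0,
    frozenset(("geneC", "geneD")): 40.0,
    frozenset(("geneA", "geneF")): 6.0,
}
clusters = cluster_groups(groups, scores, threshold=20.0)
```

With these made-up numbers, group 1 and group 2 end up together and group 3 stays on its own. Treating missing pairs as 0 is a choice; ignoring them instead would give different averages.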

Any ideas on how I can do that? I guess it's basic clustering, and I would appreciate any hints on packages/software that could help here.

Thank you.

A: 

You can try CLUTO. You will have to transform your triples (gene_1, gene_2, similarity) into a similarity matrix and use 'scluster'.
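A rough sketch of the triples-to-matrix step, in plain Python. The function name `triples_to_matrix`, the default self-similarity of 1.0, and the example values are assumptions for illustration; the exact file layout that scluster expects is described in the CLUTO manual and is not reproduced here.

```python
def triples_to_matrix(triples, self_similarity=1.0):
    """Turn (gene_1, gene_2, similarity) triples into a symmetric square matrix."""
    genes = sorted({g for g1, g2, _ in triples for g in (g1, g2)})
    index = {gene: i for i, gene in enumerate(genes)}
    n = len(genes)
    # Pairs without a triple default to similarity 0.
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = self_similarity  # assumed value for a gene against itself
    for g1, g2, sim in triples:
        i, j = index[g1], index[g2]
        matrix[i][j] = matrix[j][i] = sim  # undirected: fill both halves
    return genes, matrix

# Hypothetical similarities.
triples = [("geneA", "geneB", 0.9), ("geneB", "geneC", 0.4)]
genes, matrix = triples_to_matrix(triples)
```

The returned `genes` list gives the row/column order, so cluster labels can be mapped back to gene names afterwards.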

czuk
A: 

You'll probably get better responses if you ask this over at BioStar, the bioinformatics stackexchange. Specifically, many of the answers in this thread might be relevant:

Which is the best software to represent biological pathways in a directed graph (network)?

chrisamiller