Earlier I posted a question about visualization and clustering. I guess my question was not clear enough, so I am posting it again, and I hope I explain it better this time. I also apologize for not accepting answers on my old questions. I didn't know I could do that until someone pointed it out. I will definitely do it from now on.

Okay, back to the question. Previously I wrote a Python script to calculate the similarity between documents. Now I have all the data written to a text file in Notepad, and it looks like this:

(1, 6821): inf

(1, 8): 3.458911570

(1, 9): 7.448105193

(1, 10): inf

(1, 11): inf

(6821, 8): inf

(6821, 9): inf

(6821, 10): inf

(6821, 11): inf

(8, 9): 2.153308936

(8, 10): inf

(8, 11): 16.227647992

(9, 10): inf

(9, 11): 34.943139430

(10, 11): inf

The numbers in parentheses are document numbers, and the value after them is the distance between the two documents. What I want is a visualization tool or method that lets me create one node per document number. For example, here I have 6 different documents, so I want 6 nodes, one for each document number. Then I want edges that connect these nodes based on their distances. For example, the distance between documents 1 and 8 is 3.46, while the distance between documents 1 and 9 is 7.45, so 1 and 8 should be placed closer together than 1 and 9. Document pairs with an 'inf' distance shouldn't have any edge connecting them.

This sounds easy, but I am having a really hard time finding an open source visualization tool that can effectively help me do this. I would appreciate any suggestions or recommendations.
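To make the format concrete, here is a minimal sketch of how lines like the ones above could be read back into Python. The names `LINE_RE` and `parse_distances` are made up for illustration, and the sketch assumes every line has exactly the `(a, b): value` shape shown in the sample. Pairs with an 'inf' distance still contribute their nodes but produce no edge.

```python
import math
import re

# Matches lines such as "(1, 8): 3.458911570" or "(1, 10): inf".
LINE_RE = re.compile(r"\((\d+),\s*(\d+)\):\s*(\S+)")

def parse_distances(text):
    """Return (nodes, edges): every document number seen, and finite-distance pairs only."""
    nodes = set()
    edges = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines
        a, b = int(match.group(1)), int(match.group(2))
        distance = float(match.group(3))  # float("inf") handles the 'inf' entries
        nodes.update((a, b))
        if math.isfinite(distance):
            edges[(a, b)] = distance  # 'inf' pairs get no edge
    return nodes, edges

sample = """\
(1, 6821): inf
(1, 8): 3.458911570
(1, 9): 7.448105193
(1, 10): inf
(1, 11): inf
(6821, 8): inf
(6821, 9): inf
(6821, 10): inf
(6821, 11): inf
(8, 9): 2.153308936
(8, 10): inf
(8, 11): 16.227647992
(9, 10): inf
(9, 11): 34.943139430
(10, 11): inf
"""

nodes, edges = parse_distances(sample)
print(sorted(nodes))  # [1, 8, 9, 10, 11, 6821]
print(len(edges))     # 5
```

With the sample data this gives 6 nodes and 5 edges; documents 10 and 6821 end up as isolated nodes because all of their distances are 'inf'.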

+1  A: 

http://www.graphviz.org/

In particular, the neato layout program:

$ cat similar.dot
graph g {
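   // In neato, len is the preferred edge length (in inches);
   // weight only controls how strongly that length is enforced.
   n1 -- n8 [ len = 3.458911570 ];
   n1 -- n9 [ len = 7.448105193 ];
   n8 -- n9 [ len = 2.153308936 ];
   n8 -- n11 [ len = 16.227647992 ];
   n9 -- n11 [ len = 34.943139430 ];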
   n10;
   n6821;
}
$ neato -Tpng similar.dot -o similar.png
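Since the distances already come out of a Python script, the DOT file could also be generated rather than typed by hand. The following is a rough sketch, not a definitive implementation: `to_dot` and its `scale` parameter are hypothetical names, and the edge data is copied from the question. It emits Graphviz's `len` attribute, which neato treats as the preferred edge length in inches, so `scale` is there only to shrink large distances to a manageable drawing size.

```python
def to_dot(nodes, edges, scale=1.0):
    """Build DOT text for an undirected graph; edge length follows the distance."""
    lines = ["graph g {"]
    connected = set()
    for (a, b), distance in sorted(edges.items()):
        # len is neato's preferred edge length (inches); scale is an assumed knob.
        lines.append(f"   n{a} -- n{b} [ len = {distance * scale:.3f} ];")
        connected.update((a, b))
    # Documents with only 'inf' distances still appear, as isolated nodes.
    for node in sorted(set(nodes) - connected):
        lines.append(f"   n{node};")
    lines.append("}")
    return "\n".join(lines) + "\n"

# Finite distances from the question; 'inf' pairs are simply left out.
nodes = {1, 8, 9, 10, 11, 6821}
edges = {
    (1, 8): 3.458911570,
    (1, 9): 7.448105193,
    (8, 9): 2.153308936,
    (8, 11): 16.227647992,
    (9, 11): 34.943139430,
}

dot_text = to_dot(nodes, edges)
print(dot_text)
```

Saving the printed text as similar.dot and running the neato command above should then render it. Passing something like `scale=0.25` keeps the relative distances while making the picture smaller.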

msw
Thanks! This is almost exactly the layout I want. I already tried it out and it works perfectly.
Jacky
+2  A: 

Have you tried Graphviz? I use it for situations like this. I haven't tried altering the length of the node connections, so you'll have to tease that one out. Check out the list of example graphs as a starting point.

qor72
A: 

Processing is a really lovely tool for data visualization (and also a language, based on Java). Think of it as writing simplified OpenGL in Java (you can even use OpenGL with it if you want), plus the freedom to use all the Java libraries. You can even embed your Processing app inside another Swing or AWT application.

Here's the main page, and the brand new wiki.

You said you used Python. There's a hack, described in this blog post, that lets you use Jython instead of Java. I haven't tried it, but maybe it works fine. The only drawback of using another language (there's also a JavaScript 'port', Processing.js) is that all the examples are written in the Processing language (based on Java).

moondowner