Hi, I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each document pair and stored the scores in a dictionary. It looks something like this:

{(8328, 8327): 1.0, (8313, 8306): 0.12405229825691289, (8329, 8328): 1.0, (8322, 8321): 0.99999999999999989, (8328, 8329): 1.0, (8306, 8316): 0.12405229825691289, (8320, 8319): 0.67999999999999989, (8337, 8336): 1.0000000000000002, (8319, 8320): 0.67999999999999989, (8313, 8316): 0.99999999999999989, (8321, 8322): 0.99999999999999989, (8330, 8328): 1.0}

My final goal is to cluster similar documents together. The data above can be viewed another way. Take the document pair (8313, 8306): its similarity score is 0.12405. I can specify that the inverse of the score is the distance between documents 8313 and 8306. Similar documents will then cluster closer together, while less similar documents will sit further apart.
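The score-to-distance conversion described above could be sketched roughly as follows. This is only an illustration, assuming the scores live in a dictionary keyed by document-id pairs as shown; the function name `similarity_to_distance` and the `epsilon` guard against a zero score are assumptions made for the example.

```python
def similarity_to_distance(scores, epsilon=1e-9):
    """Map each (doc_a, doc_b) similarity score to an inverse-score distance."""
    distances = {}
    for pair, score in scores.items():
        # Clamp the score so that a zero similarity does not divide by zero.
        distances[pair] = 1.0 / max(score, epsilon)
    return distances


scores = {
    (8313, 8306): 0.12405229825691289,
    (8320, 8319): 0.67999999999999989,
    (8328, 8327): 1.0,
}
distances = similarity_to_distance(scores)

# Print the pairs from closest to farthest.
for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(dist, 3))
```

Under this mapping, documents with a score of 1.0 end up at distance 1 rather than 0, which may or may not matter depending on the layout tool used.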

My question is: is there any open source visualization tool that can help me achieve this?

A: 

I think Weka can do this. You might have to massage the input file to a different format first. Weka also has an API, though it's in Java, not Python.

FrustratedWithFormsDesigner
+1  A: 

I'm not sure what the term for that type of graph would be (minimum weight spanning tree?), but check out Graphviz. There are some Python bindings for it as well, but failing that you could simply generate an input file for it, or pipe data directly in.
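As a rough sketch of the "generate an input file" route, the snippet below turns the similarity dictionary into Graphviz DOT text using only the standard library. The helper name `to_dot` and the graph name `similarity` are made up for the example, and using the inverse score as the preferred edge length is an assumption taken from the question.

```python
def to_dot(scores):
    """Build an undirected Graphviz DOT graph from pairwise similarity scores."""
    lines = ["graph similarity {"]
    seen = set()
    for (a, b), score in sorted(scores.items()):
        key = frozenset((a, b))
        if key in seen or score <= 0:
            continue  # skip mirrored duplicates and scores that cannot be inverted
        seen.add(key)
        # 'len' is the preferred edge length used by the neato and fdp layouts.
        lines.append('    %d -- %d [len=%.3f, label="%.2f"];' % (a, b, 1.0 / score, score))
    lines.append("}")
    return "\n".join(lines)


scores = {
    (8320, 8319): 0.68,
    (8319, 8320): 0.68,
    (8313, 8306): 0.12405229825691289,
}
dot_text = to_dot(scores)
print(dot_text)
```

If the output is saved to a file, say similarity.dot (a hypothetical name), something like `neato -Tpng similarity.dot -o similarity.png` should render it.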

Nick T
A: 

There are lots of tools you can use to do this.

Other tools have already been mentioned, but you could fairly easily do something like this in Tkinter, PyGTK, PyQt, matplotlib, or really any graphics library.

However, a polar plot in matplotlib would be fairly simple:

(untested):

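import math
from matplotlib.pyplot import figure, show

# assign your data here: {(doc_a, doc_b): similarity_score, ...}
data = {(8328, 8327): 1.0, (8320, 8319): 0.67999999999999989, (8313, 8306): 0.12405229825691289}

fig = figure()
ax = fig.add_subplot(111, polar=True)

# one marker per document pair: angle 0, radius = similarity score
for pair in data:
    ax.plot(0, data[pair], 'o')
show()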

That should give you a rudimentary visualization. You could also change the plot call to

ax.plot(data[pair] * math.pi, 1, 'o')

for a different style of visualization, with the score mapped to the angle instead of the radius.

The matplotlib docs are very good and they have plenty of examples.

Wayne Werner
A: 

NetworkX may help. This example could be a good starting point:

http://networkx.lanl.gov/examples/drawing/knuth_miles.html
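For the data in the question, a minimal sketch might look like the one below. It assumes NetworkX is installed; the `threshold` cutoff of 0.5 is an arbitrary value picked for illustration, and treating each connected component as a cluster is just one simple way to group the documents.

```python
import networkx as nx

scores = {
    (8328, 8327): 1.0,
    (8329, 8328): 1.0,
    (8320, 8319): 0.67999999999999989,
    (8313, 8306): 0.12405229825691289,
}
threshold = 0.5  # assumed cutoff: pairs scoring below this are not linked

G = nx.Graph()
for (a, b), score in scores.items():
    # Add both documents even if the pair is too dissimilar to be linked.
    G.add_node(a)
    G.add_node(b)
    if score >= threshold:
        G.add_edge(a, b, weight=score)

# Each connected component is treated as one cluster of similar documents.
clusters = sorted(sorted(component) for component in nx.connected_components(G))
print(clusters)
```

The graph could then be drawn with `nx.draw(G)`, which relies on matplotlib.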

Alejandro
A: 

I think you have to use MDS (multidimensional scaling):

http://en.wikipedia.org/wiki/Multidimensional_scaling

Dirk Nachbar
Is that open source?
Jay Askren