views:

180

answers:

3

I have an application in which users interact with each-other. I want to visualize these interactions so that I can determine whether clusters of users exist (within which interactions are more frequent).

I've assigned a 2D point to each user (where each coordinate is between 0 and 1). My idea is that two users' points move closer together when they interact, an "attractive force", and I just repeatedly go through my interaction logs over and over again.

Of course, I need a "repulsive force" that will push users apart too, otherwise they will all just collapse into a single point.

First I tried monitoring the lowest and highest of each of the XY coordinates, and normalizing their positions, but this didn't work, a few users with a small number of interactions stayed at the edges, and the rest all collapsed into the middle.

Does anyone know what equations I should use to move the points, both for the "attractive" force between users when they interact, and a "repulsive" force to stop them all collapsing into a single point?

Edit: In response to a question, I should point out that I'm dealing with about 1 million users, and about 10 million interactions between users. If anyone can recommend a tool that could do this for me, I'm all ears :-)

+1  A: 

I can recommend some possibilities: first, try log-scaling the interactions or running them through a sigmoidal function to squash the range. This will give you a smoother visual distribution of spacing.

Independent of this scaling issue: look at some of the rendering strategies in graphviz, particularly the programs "neato" and "fdp". From the man page:

  neato  draws  undirected graphs using ``spring'' models (see Kamada and
  Kawai, Information Processing Letters 31:1, April 1989).   Input files
  must  be  formatted  in the dot attributed graph language.  By default,
  the output  of  neato  is  the  input  graph  with  layout coordinates
  appended.

  fdp  draws  undirected  graphs using a ``spring'' model. It relies on a
  force-directed approach in the spirit of Fruchterman and Reingold  (cf.
  Software-Practice & Experience 21(11), 1991, pp. 1129-1164).

Finally, consider one of the scaling strategies, an attractive force, and some sort of drag coefficient instead of a repulsive force. Actually moving things closer and then possibly farther later on may just get you cyclic behavior.

Consider a model in which everything will collapse eventually, but slowly. Then just run until some condition is met (a node crosses the center of the layout region or some such).

Drag or momentum can just be encoded as a basic resistance to motion and amount to throttling the movements; it can be applied differentially (things can move slower based on how far they've gone, where they are in space, how many other nodes are close, etc.).

Hope this helps.

Thomas Kammeyer
+2  A: 

In the past, when I've tried this kind of thing, I've used a spring model to pull linked nodes together, something like: dx = -k*(x-l). dx is the change in the position, x is the current position, l is the desired separation, and k is the spring coefficient that you tweak until you get a nice balance between spring strength and stability, it'll be less than 0.1. Having l > 0 ensures that everything doesn't end up in the middle.

In addition to that, a general "repulsive" force between all nodes will spread them out, something like: dx = k / x^2. This will be larger the closer two nodes are, tweak k to get a reasonable effect.

Tom
A: 

The spring model is the traditional way to do this: make an attractive force between each node based on the interaction, and a repulsive force between all nodes based on the inverse square of their distance. Then solve, minimizing the energy. You may need some fairly high powered programming to get an efficient solution to this if you have more than a few nodes. Make sure the start positions are random, and run the program several times: a case like this almost always has several local energy minima in it, and you want to make sure you've got a good one.

Also, unless you have only a few nodes, I would do this in 3D. An extra dimension of freedom allows for better solutions, and you should be able to visualize clusters in 3D as well if not better than 2D.

DJClayworth