Hi Everyone, I was wondering if anyone has any advice for rendering an undirected graph with 178,000 nodes and 500,000 edges. I've tried Neato, Tulip, and Cytoscape. Neato doesn't even come remotely close, and Tulip and Cytoscape claim they can handle it but don't seem to be able to. (Tulip does nothing and Cytoscape claims to be working, and then just stops.)

Does anyone have any ideas? I'd just like a vector format file (ps or pdf) with a remotely reasonable layout of the nodes.

Thanks!

A: 

Pajek is a Windows tool that can visualize graphs and generates EPS output. However, I don't know if it can read your data.

Jasper
+2  A: 

You might offer a sanitized version of the file to the developers of those tools as a debugging scenario, if all else fails.

C. Lawrence Wenham
+4  A: 

Mathematica could very likely handle it, but I have to admit my first reaction was along the lines of the comment that said "take a piece of paper and color it black." Is there no way to reduce the density of the graph?

A possible issue is that you seem to be looking for layout, not just rendering. I have no knowledge about the Big O characteristics of the layouts implemented by various tools, but intuitively I would guess that it might take a long time to lay out that much data.

Larry OBrien
A: 

There's a list of apps here: http://www.mkbergman.com/?p=414

Walrus and LGL are two tools supposedly suited for large graphs. However, both seem to require graphs to be input as text files in their own special format, which might be a pain.

A: 

Is there any 'natural' 2D ordering for these nodes (e.g. location), so that you could just plot them without drawing the edges?

OJW
+2  A: 

You could try aiSee: http://www.aisee.com/manual/unix/56.htm

A: 

I don't think you can come remotely close to visualising that in a flat layout.

I've been intrigued for some time by Hyperbolic Graphs, as described in this research paper. Try the software from SourceForge.

Another idea is just graphing the nodes using a TreeMap as seen at Panopticode.

Andy Dent
+1  A: 

Check out the Java/Jython based GUESS: http://graphexploration.cond.org/

thestoneage
A: 

Does it need to be truly accurate?

Depending on what you're trying to accomplish, it might be good enough to graph just 10% or 1% of the data. (Of course, it might also be completely useless; it all depends on what the visualization is for.)
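A minimal sketch of that kind of subsampling, assuming the graph is available as a list of edge pairs. The toy edge list and the name `sample_edges` are made up for illustration. Note that uniform edge sampling changes the structure of the graph (degrees shrink, components split), so whether the result is representative depends on the data.

```python
import random

def sample_edges(edges, fraction, seed=0):
    """Return a uniform random subset holding about `fraction` of the edges."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(len(edges), max(1, int(len(edges) * fraction)))
    return rng.sample(edges, k)

# Toy data standing in for the real edge list (hypothetical).
edges = [(i, (i * 7 + 1) % 1000) for i in range(5000)]

subset = sample_edges(edges, 0.10)
nodes = {n for edge in subset for n in edge}
print(len(subset), "edges kept, touching", len(nodes), "nodes")
```

The reduced edge list can then be written out in whatever input format the layout tool expects.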

jplindstrom
+7  A: 

I would suggest you first do some preprocessing of the data, for example collapsing nodes into clusters, and then visualize the clusters. This reduces the number of nodes and makes it easier for algorithms such as Kamada-Kawai or Fruchterman-Reingold to lay out the resulting graph.
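As a rough sketch of the collapsing step: assuming you already have a cluster label for every node (from whatever clustering method suits the data, which is not shown here), the cluster-level graph can be built by counting the edges that cross between clusters. The names `collapse` and `cluster_of` and the toy data are hypothetical.

```python
from collections import Counter

def collapse(edges, cluster_of):
    """Build a weighted cluster-level graph from node-level edges.

    Returns a Counter mapping (cluster_a, cluster_b) to the number of
    original edges running between the two clusters.
    """
    weights = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # edges inside a single cluster disappear
            weights[tuple(sorted((cu, cv)))] += 1
    return weights

# Toy data; the cluster assignment is assumed to come from elsewhere.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e"), ("b", "d")]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}

cluster_edges = collapse(edges, cluster_of)
for (ca, cb), weight in sorted(cluster_edges.items()):
    print(ca, cb, weight)
```

The edge weights can be mapped to line thickness so the drawing still hints at how densely two clusters are connected.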

If you really need to visualize the full graph (178,000 nodes and 500,000 edges), then you can consider using a simple circular layout. This will be easy to render without the issues that force-based algorithms have. Take a look at Circos: http://mkweb.bcgsc.ca/circos/

Circos is a visualization tool developed by bioinformatics people, tailored to visualizing genomes and other extremely large and complex datasets.

It's a Perl-based package, so I hope that doesn't raise any problems.
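To illustrate why a circular layout is cheap, here is a minimal sketch that is independent of Circos: each node gets a position on a circle in a single pass, with no force simulation. The function name `circular_layout` is made up for this example.

```python
import math

def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle: one pass, O(n), no iteration."""
    n = len(nodes)
    return {
        node: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, node in enumerate(nodes)
    }

pos = circular_layout(["a", "b", "c", "d"])
for node, (x, y) in pos.items():
    print(f"{node}: {x:.3f} {y:.3f}")
```

Node order matters a great deal here: placing nodes of the same cluster next to each other keeps most edges short and leaves the long chords for cross-cluster links.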

DrDee
+1  A: 

I expect edge clustering (http://www.visualcomplexity.com/vc/project_details.cfm?id=679&index=679&domain=) would help. This technique bundles related edges together, reducing the visual complexity of the graph. You may have to implement the algorithm yourself though.

Ollie G
A: 

The Large Graph Layout (LGL) project helped me a lot with a similar problem. It handles the layout and has a small Java app to draw the produced layouts in 2D. There is no vector output out of the box, so you'll have to draw the graph yourself (given the node coordinates produced by LGL).
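Drawing it yourself can be fairly short. The sketch below assumes the coordinates are available as text lines of the form `node x y`; that format is an assumption for illustration, so check what LGL actually writes and adapt the parser. The helper names `read_coords` and `to_eps` are hypothetical. The output is plain EPS, with every edge drawn as a straight line segment.

```python
def read_coords(lines):
    """Parse 'node x y' lines into {node: (x, y)} (assumed format)."""
    coords = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            coords[parts[0]] = (float(parts[1]), float(parts[2]))
    return coords

def to_eps(coords, edges, size=500, margin=10):
    """Return EPS text with each edge drawn as a straight segment."""
    xs = [x for x, _ in coords.values()]
    ys = [y for _, y in coords.values()]
    min_x, min_y = min(xs), min(ys)
    # Uniform scale so the layout fits the page without distortion.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    scale = (size - 2 * margin) / span

    def pt(node):
        x, y = coords[node]
        return margin + (x - min_x) * scale, margin + (y - min_y) * scale

    out = ["%!PS-Adobe-3.0 EPSF-3.0",
           f"%%BoundingBox: 0 0 {size} {size}",
           "0.1 setlinewidth"]
    for u, v in edges:
        (x1, y1), (x2, y2) = pt(u), pt(v)
        out.append(f"{x1:.2f} {y1:.2f} moveto {x2:.2f} {y2:.2f} lineto stroke")
    out.append("showpage")
    return "\n".join(out) + "\n"

# Toy input standing in for a real coordinate file and edge list.
coords = read_coords(["a 0.0 0.0", "b 1.0 0.0", "c 0.0 2.0"])
eps = to_eps(coords, [("a", "b"), ("a", "c")])
print(eps)
```

Writing the returned string to a file such as `graph.eps` gives a vector image that standard PostScript tools can convert to PDF. For 500,000 edges the file will be large, but it is generated in a single pass.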

Nikita Nemkin
A: 

You can also try NAViGaTOR (disclosure: I'm one of the developers of that software). We've successfully visualized graphs with as many as 1.7 million edges with it, although networks that large are hard to manipulate (the user interface gets laggy). It does use OpenGL for the visualization, so some of the overhead is transferred to the graphics card.

Also note that you'll have to crank up the memory settings in the File->Preferences dialog box before you can successfully open a network that big.

Finally, as most of the other responses point out, you are better off re-organizing your data into something smaller and more meaningful.

Alinium