I build a huge graph on the JVM (in Scala) that I want to use repeatedly while tweaking algorithms. I'd rather not reload it from disk each time. Is there a way to keep it sitting in one JVM while connecting from another, where the algorithms are being developed?
Using RMI perhaps? Have one instance working as server and the rest as clients?
I think it would be much more complicated than reloading from disk.
You can certainly create an interface onto it and expose it via (say) RMI.
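A minimal single-file sketch of that idea. The interface name, method, and port are made up for illustration; in practice the server JVM would keep running with the graph loaded, and the client JVM would run only the `getRegistry`/`lookup` half.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class RmiSketch {
    // Hypothetical remote view of the graph; expose only what the algorithms need
    public interface GraphService extends Remote {
        int nodeCount() throws RemoteException;
    }

    static class GraphServiceImpl implements GraphService {
        public int nodeCount() { return 42; } // would consult the in-memory graph
    }

    public static void main(String[] args) throws Exception {
        int port = 50123; // arbitrary free port

        // Server JVM: export the object and register it under a name
        GraphServiceImpl impl = new GraphServiceImpl();
        Registry registry = LocateRegistry.createRegistry(port);
        registry.rebind("graph", (GraphService) UnicastRemoteObject.exportObject(impl, 0));

        // The client JVM would do exactly this against the server's host and port
        GraphService remote = (GraphService)
                LocateRegistry.getRegistry("localhost", port).lookup("graph");
        System.out.println(remote.nodeCount()); // 42

        // Unexport so this demo JVM can exit; a real server would stay up
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

Every call crosses a network boundary, so keep the remote interface coarse-grained; chatty per-node calls over RMI would be far slower than in-process access.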
My initial thoughts on reading your post, however, are:
- just how big is this graph?
- is it possible to optimise your loading procedure instead?
I know LinkedIn has a vast graph of people and connections that is held in memory all the time, and it takes several hours to reload. But I figure that's a truly exceptional case.
Save your graph to disk, then map it into memory with a MappedByteBuffer. Both processes will then read the same physical memory, shared via the OS page cache.
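A sketch of the mechanism, assuming the graph has already been written to a file in some flat binary layout (the file name and the three ints are placeholders). The writer and reader halves here could just as well be two different JVMs:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("graph.bin"); // hypothetical on-disk dump of the graph

        // Writer side: lay the data out in the mapped file (here just three ints)
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 12);
            buf.putInt(1).putInt(2).putInt(3);
            buf.force(); // flush the mapped pages to the backing file
        }

        // Reader side (could be another JVM): map the same file read-only.
        // The pages come straight from the page cache, shared rather than copied.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(buf.getInt() + buf.getInt() + buf.getInt()); // 6
        }
    }
}
```

The catch is that a mapped buffer gives you raw bytes, not objects: you either need a graph encoding you can traverse in place, or you pay deserialization cost anyway.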
If it is expensive to build your graph, maybe you can serialize the object.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(graph);
out.flush();
byte[] b = bos.toByteArray();
// you can use a FileOutputStream instead of a ByteArrayOutputStream
Then you can rebuild your object from those bytes:
ByteArrayInputStream inputBuffer = new ByteArrayInputStream(b);
ObjectInputStream inputStream = new ObjectInputStream(inputBuffer);
try {
    // readObject also throws the checked ClassNotFoundException
    Graph graph = (Graph) inputStream.readObject();
} finally {
    inputStream.close();
}
Just replace the ByteArrayInputStream with a FileInputStream.
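Putting the two snippets together with the file streams the answer suggests; `Graph` here is just a stand-in for whatever Serializable class you actually use, and try-with-resources replaces the manual close:

```java
import java.io.*;

public class SerializeSketch {
    // Stand-in for the real graph class; it only needs to implement Serializable
    static class Graph implements Serializable {
        private static final long serialVersionUID = 1L;
        int nodes;
        Graph(int nodes) { this.nodes = nodes; }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File file = new File("graph.ser");

        // Build once, write to disk
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new Graph(1000));
        }

        // Later (or in another process): read it back instead of rebuilding
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            Graph graph = (Graph) in.readObject();
            System.out.println(graph.nodes); // 1000
        }
    }
}
```

This avoids the rebuild cost but not the reload cost: deserializing a huge object graph from disk can still take a while, so it only helps if building is much slower than reading.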
Two JVMs sounds more complicated than it needs to be. Have you considered a "hot deploy" setup, where your main program loads the graph, displays the UI, and then asks for (or automatically looks for) a jar/class file containing your actual algorithm code? That way the algorithm code runs in the same JVM as your graph, and you don't have to reload the graph just to load a new algorithm implementation.
UPDATE to address OP's question in comment:
Here's how you could structure your code so that your algorithms would be swappable. It doesn't matter what the various algorithms do, so long as they are operating on the same input data. Just define an interface like the following, and have your graph algorithms implement it.
public interface GraphAlgorithm {
    public void doStuff(Map<whatever> myBigGraph);
}
If your algorithms are displaying results to some kind of widget, you could pass that in as well, or have doStuff() return some kind of results object.
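A small sketch of swapping implementations through that interface by class name (the names and the toy `Map` graph are made up). For true hot deploy you would load each dropped-in jar with a fresh `URLClassLoader`, but the calling code looks the same:

```java
import java.util.Map;

public class SwapSketch {
    public interface GraphAlgorithm {
        // The graph type is up to you; a Map is used here as in the answer
        int doStuff(Map<String, int[]> graph);
    }

    // One swappable implementation; others would implement the same interface
    public static class CountEdges implements GraphAlgorithm {
        public int doStuff(Map<String, int[]> graph) {
            return graph.values().stream().mapToInt(a -> a.length).sum();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, int[]> graph = Map.of("a", new int[]{1, 2}, "b", new int[]{3});

        // The class name could come from a config file or a freshly dropped jar;
        // with a jar you would instantiate via a new URLClassLoader instead.
        String implName = "SwapSketch$CountEdges";
        GraphAlgorithm algo = (GraphAlgorithm) Class.forName(implName)
                .getDeclaredConstructor().newInstance();
        System.out.println(algo.doStuff(graph)); // 3
    }
}
```

The graph stays loaded in the one JVM the whole time; only the tiny algorithm class is reloaded between runs.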
Terracotta can help you with this. It allows you to share objects among several JVM instances.
Have you considered simply using a smaller amount of sample data for testing your algorithms?
Did you consider the OSGi platform? It runs in a single JVM but lets you upgrade bundles containing your algorithms without restarting the platform. You could have one long-lived bundle holding your huge data structures and short-lived algorithm bundles that access that data.
Terracotta shares memory between many JVM instances, so you can easily cluster your system.