ansaurus

Question

How can I share memory between two JVM instances?

Answer 1

A:

Using RMI perhaps? Have one instance working as server and the rest as clients?

I think it would be much more complicated than reloading from disk.

OscarRyz 2009-07-28 17:49:44

Answer 2

A:

You can certainly create an interface onto it and expose it via (say) RMI.

My initial thoughts on reading your post, however, are:

just how big is this graph?
is it possible to optimise your loading procedure instead?

I know LinkedIn have a vast graph of people and connections that is held in memory all the time and that takes several hours to reload. But I figure that's a truly exceptional case.

Brian Agnew 2009-07-28 17:52:03
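
To make the RMI suggestion concrete, here is a rough sketch of what "an interface onto it" might look like. The names GraphService, InMemoryGraphService, GraphRmiDemo, the binding name "graph", and port 1099 are all assumptions for illustration, not anything from the question. Note that RMI returns non-remote values as serialized copies, so this gives a second JVM access to the graph rather than truly shared memory.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical remote view of the graph; every remote method declares RemoteException.
interface GraphService extends Remote {
    Set<Integer> neighbors(int node) throws RemoteException;
    int nodeCount() throws RemoteException;
}

// Lives in the JVM that holds the graph in memory.
class InMemoryGraphService implements GraphService {
    private final Map<Integer, Set<Integer>> adjacency;

    InMemoryGraphService(Map<Integer, Set<Integer>> adjacency) {
        this.adjacency = adjacency;
    }

    @Override
    public Set<Integer> neighbors(int node) {
        // Return a serializable copy; RMI ships it to the caller by value.
        return new HashSet<>(adjacency.getOrDefault(node, Collections.emptySet()));
    }

    @Override
    public int nodeCount() {
        return adjacency.size();
    }
}

class GraphRmiDemo {
    // Static references keep the exported objects reachable after main returns.
    static InMemoryGraphService service;
    static Registry registry;

    public static void main(String[] args) throws Exception {
        String mode = args.length > 0 ? args[0] : "";
        if (mode.equals("server")) {
            Map<Integer, Set<Integer>> graph = new HashMap<>();
            graph.put(1, new HashSet<>(Arrays.asList(2, 3)));
            graph.put(2, new HashSet<>());
            graph.put(3, new HashSet<>());
            service = new InMemoryGraphService(graph);
            GraphService stub = (GraphService) UnicastRemoteObject.exportObject(service, 0);
            registry = LocateRegistry.createRegistry(1099);
            registry.rebind("graph", stub);
            System.out.println("graph service bound; stop with Ctrl+C");
        } else if (mode.equals("client")) {
            Registry remote = LocateRegistry.getRegistry("localhost", 1099);
            GraphService graph = (GraphService) remote.lookup("graph");
            System.out.println("nodes: " + graph.nodeCount()
                    + ", neighbours of 1: " + graph.neighbors(1));
        } else {
            System.out.println("usage: GraphRmiDemo server|client");
        }
    }
}
```

Start one JVM with the argument server and another with client. Each call crosses a socket, so a fine-grained interface like this would likely be slow for algorithms that touch millions of edges; coarser methods that do more work per call would reduce the overhead.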

Answer 3

+7 A:

Save your graph to disk, then map it into memory with MappedByteBuffer. Both processes should use the same memory, which will be shared with the page cache.

bdonlan 2009-07-28 17:52:49

But then I don't really need two JVMs, right? I have the graph serialized, yet loading it from disk and deserializing takes about 5-7 minutes, although I suspect it's held in the Linux caches anyway. So how should one manage the memory to be shared by two processes here?

Alexy 2009-07-28 22:10:21

Actually, I serialize the graph to load it from disk, so I wonder how that serialization should interact with the MappedByteBuffer.

Alexy 2009-07-29 00:43:52

You would need to write it out in a format that you can reasonably use directly out of the buffer, i.e. without deserializing.

bdonlan 2009-07-29 01:46:18
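
As an illustration of a format that can be used straight out of the buffer, here is a minimal sketch assuming the graph can be flattened into arrays of ints (a compressed adjacency layout). The class FlatGraphFile and its file layout are invented for this example; the real graph is a map of maps and would need its own encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical layout, all 4-byte ints:
//   index 0          node count n
//   index 1 .. n+1   offsets (n+1 entries); node v owns targets[offsets[v] .. offsets[v+1])
//   index n+2 ..     targets, one entry per edge
class FlatGraphFile {
    private final IntBuffer ints;
    private final int nodeCount;

    private FlatGraphFile(IntBuffer ints) {
        this.ints = ints;
        this.nodeCount = ints.get(0);
    }

    /** Writes adjacency lists (adjacency[v] holds the neighbours of node v) in the flat layout. */
    static void write(Path file, int[][] adjacency) throws IOException {
        int n = adjacency.length;
        int edges = 0;
        for (int[] list : adjacency) edges += list.length;
        long bytes = 4L * (1 + (n + 1) + edges); // a single mapping is limited to 2 GB
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            IntBuffer out = mapped.asIntBuffer();
            out.put(n);
            int offset = 0;
            for (int[] list : adjacency) {
                out.put(offset);
                offset += list.length;
            }
            out.put(offset);
            for (int[] list : adjacency) out.put(list);
            mapped.force(); // push the changes to the file
        }
    }

    /** Maps the file read-only. No parsing happens; reads go straight to the mapped pages. */
    static FlatGraphFile open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return new FlatGraphFile(
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer());
        }
    }

    /** Demo helper: writes to a temp file and maps it back. */
    static FlatGraphFile roundTrip(int[][] adjacency) {
        try {
            Path file = Files.createTempFile("graph", ".bin");
            file.toFile().deleteOnExit();
            write(file, adjacency);
            return open(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int nodeCount() { return nodeCount; }

    int degree(int v) { return ints.get(2 + v) - ints.get(1 + v); }

    int neighbor(int v, int i) { return ints.get(nodeCount + 2 + ints.get(1 + v) + i); }

    public static void main(String[] args) {
        FlatGraphFile g = roundTrip(new int[][] {{1, 2}, {2}, {}});
        System.out.println(g.nodeCount() + " nodes; node 0 has " + g.degree(0) + " neighbours");
    }
}
```

Any process that calls open on the same file gets a view backed by the same page-cache pages, so a freshly started JVM can begin traversing immediately instead of spending minutes deserializing.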

Answer 4

A:

If it is expensive to build your graph, maybe you can serialize the object.

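import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

// Serialize the graph into an in-memory byte array.
// To write to disk, use a FileOutputStream instead of the ByteArrayOutputStream.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
    out.writeObject(graph); // closing the stream also flushes it
}
byte[] b = bos.toByteArray();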

Then you can build your object from the file

import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;

// Graph must implement java.io.Serializable.
// try-with-resources closes the stream even if readObject throws.
try (ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(b))) {
    Graph graph = (Graph) inputStream.readObject();
    // use graph here
}

Just replace the ByteArrayInputStream with a FileInputStream

Dani Cricco 2009-07-28 17:55:36

I serialize the graph already, but deserializing it takes 5-7 minutes.

Alexy 2009-07-28 22:13:12

Answer 5

+3 A:

Two JVMs sounds more complicated than it needs to be. Have you considered doing a kind of "hot deploy" setup, where your main program loads up the graph, displays the UI, and then asks for (or automatically looks for) a jar/class file to load that contains your actual algorithm code? That way your algorithm code would be running in the same jvm as your graph, but you wouldn't have to reload the graph just to reload a new algorithm implementation.

UPDATE to address OP's question in comment:

Here's how you could structure your code so that your algorithms would be swappable. It doesn't matter what the various algorithms do, so long as they are operating on the same input data. Just define an interface like the following, and have your graph algorithms implement it.

import java.util.Map;

public interface GraphAlgorithm {
  // Map<?, ?> stands in for whatever type your graph actually has.
  void doStuff(Map<?, ?> myBigGraph);
}

If your algorithms are displaying results to some kind of widget, you could pass that in as well, or have doStuff() return some kind of results object.

Peter Recore 2009-07-28 20:20:27

This is interesting. What I want to do with the graph is not that fixed though; it's a few million nodes/edges and I want to walk it, flow through it, etc. Now which APIs would I use to dynamically apply methods from a jar, and how flexible is it?

Alexy 2009-07-28 22:11:56

You'd use the Java reflection API - basically, you can load an arbitrary JAR or set of JARs, find a class in it, instantiate it, and invoke methods on it (or invoke static methods without instantiation). It's a bit heavyweight to actually do the call, but you'll be spending all your time inside there so it shouldn't be a problem.

bdonlan 2009-07-29 01:47:25
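
The jar-loading step described above could be sketched roughly as follows, using URLClassLoader and reflection from the standard library. GraphAlgorithm follows the interface from the answer; the map-of-maps graph type, AlgorithmLoader, EdgeCounter, and the jar name are assumptions made for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// In a real project this would be public and live in the host application,
// so that classes loaded from the jar by another class loader can see it.
// The map-of-maps graph type is an assumption.
interface GraphAlgorithm {
    void doStuff(Map<Integer, Map<Integer, Double>> myBigGraph);
}

// Stand-in for a class that would normally be compiled into the algorithm jar
// (there it would need to be public with a public no-arg constructor).
class EdgeCounter implements GraphAlgorithm {
    long edgeCount;

    @Override
    public void doStuff(Map<Integer, Map<Integer, Double>> myBigGraph) {
        edgeCount = 0;
        for (Map<Integer, Double> edges : myBigGraph.values()) edgeCount += edges.size();
    }
}

class AlgorithmLoader {
    /** Loads className from the given jar and instantiates it via its no-arg constructor. */
    static GraphAlgorithm load(Path jar, String className) {
        try {
            URL[] urls = { jar.toUri().toURL() };
            // The loader is left open on purpose: the algorithm may load
            // further classes from the jar lazily while it runs.
            URLClassLoader loader =
                    new URLClassLoader(urls, GraphAlgorithm.class.getClassLoader());
            Class<?> cls = Class.forName(className, true, loader);
            return cls.asSubclass(GraphAlgorithm.class).getDeclaredConstructor().newInstance();
        } catch (MalformedURLException | ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className + " from " + jar, e);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> graph = new HashMap<>();
        graph.put(1, Collections.singletonMap(2, 1.0));
        graph.put(2, Collections.emptyMap());
        // "algorithms.jar" need not exist for this demo: EdgeCounter is already on
        // the host classpath and is found through parent delegation.
        GraphAlgorithm algorithm = load(Paths.get("algorithms.jar"), EdgeCounter.class.getName());
        algorithm.doStuff(graph);
        System.out.println("edges: " + ((EdgeCounter) algorithm).edgeCount);
    }
}
```

Because only the instantiation goes through reflection and the result is cast to the interface, the call to doStuff is an ordinary interface call. Creating a new URLClassLoader for each load is what lets a rebuilt jar replace the previous version of a class with the same name.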

OK -- so how do I set up a procedure whereby a running app checks regularly whether there's a new jar with algorithms to be run, loads it, and runs it?

Alexy 2009-07-29 06:05:32
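
One possible shape for such a check, as a sketch only: poll a drop directory on a timer and compare each jar's last-modified time with the value seen on the previous poll. JarPoller and the "algorithms" directory name are hypothetical, and the actual loading is left as a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class JarPoller {
    // jar file name -> last-modified time seen on the previous poll
    private final Map<String, Long> seen = new HashMap<>();

    /** Returns jar name -> last-modified millis for every *.jar in dir. */
    static Map<String, Long> snapshot(Path dir) {
        Map<String, Long> result = new HashMap<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                result.put(jar.getFileName().toString(),
                        Files.getLastModifiedTime(jar).toMillis());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Records the snapshot and returns the sorted names of jars that are new or modified. */
    List<String> changed(Map<String, Long> snapshot) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : snapshot.entrySet()) {
            Long previous = seen.put(e.getKey(), e.getValue());
            if (previous == null || !previous.equals(e.getValue())) result.add(e.getKey());
        }
        Collections.sort(result);
        return result;
    }

    /** Polls dir on a background thread; call shutdown() on the returned executor to stop. */
    static ScheduledExecutorService start(Path dir, long periodSeconds) {
        JarPoller poller = new JarPoller();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(() -> {
            try {
                for (String jar : poller.changed(snapshot(dir))) {
                    System.out.println("new or updated jar: " + dir.resolve(jar));
                    // Load the jar with a fresh class loader here and run it on the graph.
                }
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs of the task.
                System.err.println("poll failed: " + e);
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
        return timer;
    }

    public static void main(String[] args) {
        // One-shot demo; a long-running app would call start(dir, 5) instead.
        Path dir = Paths.get(args.length > 0 ? args[0] : "algorithms");
        if (Files.isDirectory(dir)) {
            System.out.println("jars found: " + new JarPoller().changed(snapshot(dir)));
        } else {
            System.out.println("no such directory: " + dir);
        }
    }
}
```

A jar that is still being copied may be picked up half-written, so it is safer to copy it under a temporary name and rename it to *.jar once complete. Running the loaded algorithm on the polling thread would block further polls, which may or may not be what you want.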

Answer 6

+5 A:

Terracotta can help you with this. It allows you to share objects among several jvm instances.

Daniel Ribeiro 2009-07-28 20:26:17

I've found Terracotta to be unsuited to *deep* collections (e.g. a map of maps) due to the way it decides to swap values in and out of memory.

oxbow_lakes 2009-07-28 20:28:46

+1 for allowing VMs from different servers to participate, and Terracotta integration isn't too invasive.

Steve Reed 2009-07-28 20:29:49

Interesting -- but indeed the graph is a Map of Maps.

Alexy 2009-07-28 22:12:34

Answer 7

A:

If the problem is just to dynamically load and run your code without name clashes, a custom class loader could be enough. For each new run, just cache all class files in a new class loader.

2009-07-28 23:34:41

Answer 8

A:

Have you considered simply using a smaller amount of sample data for testing your algorithms?

Nick Lewis 2009-07-28 23:36:21

That's doable too, but exploratory runs on all the data are also preferable when the graph is readily available.

Alexy 2009-07-29 01:39:15

Answer 9

+1 A:

Have you considered the OSGi platform? It lives in a single JVM, but allows you to upgrade the bundles containing your algorithms without restarting the platform. Thus you could have a long-running bundle holding your huge data structures and short-lived algorithm bundles that access the data.

Alexander Azarov 2009-07-29 15:47:46

that's finally a good reason to take a look at OSGi

Alexy 2009-07-29 17:45:12

Answer 10

A:

Terracotta shares memory between many JVM instances, so you can easily cluster your system.

Firstthumb 2009-07-31 07:30:18
