tags:

views:

147

answers:

5

I have a problem where I need to assemble a Map whose eventual size is in the GBs (going on past 64GB) and I cannot assume that a user of the program will have this kind of monster machine hanging around. A nice solution would be to distribute this map across a number of machines to make a far more modest memory footprint per instance.

Does anyone know of a library/suite of tools which can perform this sharding? I do not care about replication or transactions; just spreading this memory requirement around.

+3  A: 

terracotta might be useful have a look here

http://www.terracotta.org/

its a clustered jvm will depend on how often you update the map i guess on how well it performs.

Paul Whelan
If you're building a commercial product, the open source licensing for Terracotta can be a pain mainly because of the attribution clause
Kevin
Licensing is not a problem as the project is open-source; Terracotta seemed a bit OTT plus I'm more worried about how well Terracotta can shared the data. eHcache though looks like it might be able to do it.
andeyatz
A: 

Must it be open source? If not, Oracle Coherence can do it.

Kevin
+1  A: 

I suggest that you start with hazelcast:

http://www.hazelcast.com/

It is open-source, and in my opinion it is very easy to work with, so it is the best framework for rapid prototyping.

As far as I as know, it performs faster than the commercial alternatives, so I wouldn't worry about performance either.
(I haven't formally benchmarked it myself)

Yoni
That is a bold claim
Kevin
hey kevin, you are right, that's why I quickly added the parenthetic remark :)
Yoni
I'll give hazelcast a look at cheers!
andeyatz
+1  A: 

You may be able to solve your problem by using a database instead, something like http://hsqldb.org/ may provide the functionality you need with the ability to write the data to disk rather than keeping the whole thing in memory.

I would definitely take a step back and ask yourself if a map is the right data structure for GBs of data.

Angelo Genovese
andeyatz
A: 

Gigaspaces Datagrid sounds like the kind of thing you are looking for. (Not free though)

Mark