I am currently in need of a high-performance Java storage mechanism.

This means:

1) I have 10,000+ objects with one-to-many relationships.

2) The objects are updated every 5 seconds, and the most recent updates must persist in case of system failure.

3) The objects need to be queryable in a reasonable time (1-5 seconds), e.g., "give me all of the objects with this timestamp" or "give me all of the objects within these location boundaries".

4) The objects need to be available across various Glassfish installs.

Currently:

I have been using JMS to distribute the objects, Hibernate as an ORM, and HSQLDB to provide the needed recoverability.

I am not exactly happy with the performance, especially the JMS part.

After doing some Stack Overflow research, I am wondering whether the following would be a better solution. Keep in mind that I have no experience with what Terracotta gives me.

I would use Terracotta to distribute objects around the system, and something else would need to provide the ability to "query" the attributes of those objects.

Does this sound reasonable? Would it meet these performance constraints? What other solutions should I consider?

+1  A: 

I am currently writing the client for a very (very) fast key/value distributed hash DB that provides set and list semantics. The DB is written in C99 and requires GCC, and right now I'm battling good old Java network I/O to break my current barrier of 30,000 gets/sets per second. I hope to be done within the week. Drop me a line through my account and I'll get back to you when it's showtime.

Any updates on this?
Grasper
Sorry, I missed this. http://github.com/alphazero/jredis/tree/master
+4  A: 

I know it's not what you asked, but you may want to start by switching from HSQLDB to H2. H2 is a relatively new, pure Java DB. It is written by the same person who wrote HSQLDB, and he claims the performance is much better. I've been using it for some time now and I'm very happy with it. It should be a very quick transition (add a JAR, change the connection string, create the database), so it's worth a shot.

In general, I believe in trying to get the most out of what I have before rewriting the application in a different architecture. Try profiling it to identify the bottleneck first.
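Since the question already uses Hibernate, the swap mostly comes down to connection settings. A minimal sketch, assuming Hibernate is configured programmatically: the property keys, the `org.h2.Driver` class, the `jdbc:h2:file:` URL prefix and `org.hibernate.dialect.H2Dialect` are standard names, while the class name `H2Settings` and the database path are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: Hibernate connection settings for an embedded H2 file DB.
final class H2Settings {
    private H2Settings() {}

    static Properties hibernateProperties(String dbFilePath) {
        Properties p = new Properties();
        // Replaces the HSQLDB driver class and jdbc:hsqldb:... URL.
        p.setProperty("hibernate.connection.driver_class", "org.h2.Driver");
        p.setProperty("hibernate.connection.url", "jdbc:h2:file:" + dbFilePath);
        p.setProperty("hibernate.connection.username", "sa");
        p.setProperty("hibernate.connection.password", "");
        // Tell Hibernate which SQL flavour to generate.
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect");
        return p;
    }
}
```

These properties could then be handed to Hibernate's `Configuration.addProperties(...)`; the H2 JAR must be on the classpath at runtime.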

zvikico
+1  A: 

With such a high update rate, Lucene is almost certainly not what you're looking for, since there is no way to update a document once it's indexed. You'd have to keep all the object versions in the index and select the one with the latest timestamp, which would kill your performance.

I'm no DB expert, but I think you should look into one of the distributed DB solutions that have been in the news lately (CouchDB, Cassandra).

itsadok
+1  A: 

You don't say which vendor you are using for JMS, but it wouldn't surprise me if you have a bottleneck there. I couldn't get more than 100 messages a second from ActiveMQ, and whatever we tried in terms of acknowledgment configuration, queue size, etc., we were unable to push CPU usage beyond a few percent.

The solution was to batch many queries into one JMS message. We had a simple class that sent a batch either when it had accumulated 200 queries or when a timeout expired (we used 20 ms), which gave us a dramatic increase in message throughput.
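The size-or-timeout batching class described above might look roughly like this. It is a sketch under assumptions, not the class from that project: the names are hypothetical, the `sink` stands in for "serialize the batch and send it as one JMS message", and the timeout is approximated by a timer that flushes whatever is pending every `timeoutMillis`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical batcher: flushes when maxBatch items are pending or on a timer tick.
final class Batcher<T> implements AutoCloseable {
    private final int maxBatch;
    private final Consumer<List<T>> sink; // e.g. send one JMS message per batch
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private List<T> pending = new ArrayList<>();

    Batcher(int maxBatch, long timeoutMillis, Consumer<List<T>> sink) {
        this.maxBatch = maxBatch;
        this.sink = sink;
        // A partial batch waits at most about one timer period.
        timer.scheduleAtFixedRate(this::flush, timeoutMillis, timeoutMillis,
                TimeUnit.MILLISECONDS);
    }

    synchronized void add(T item) {
        pending.add(item);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }

    @Override
    public synchronized void close() {
        timer.shutdown();
        flush(); // send whatever is left
    }
}
```

With `new Batcher<>(200, 20, sendAsOneMessage)` this mirrors the 200-query / 20 ms setting above. The sink runs while the lock is held, which keeps batches in order but briefly blocks producers during a send.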

Rob
+1  A: 

Maybe you should take a look at Prevayler.

Your objects are always in memory, and only the changes to your objects are persisted. From time to time you can take a snapshot, in which every object is persisted.
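To make the idea concrete, here is a minimal sketch of the prevalence pattern itself. It is not Prevayler's API: the class and method names are hypothetical, and the journal and snapshot are kept in memory as stand-ins for files on disk, so the example shows only the ordering (log the change, apply it, snapshot, replay on recovery).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical prevalent store: state in memory, changes journaled, periodic snapshots.
final class PrevalentStore {
    private Map<String, String> state = new HashMap<>();
    private final List<String[]> journal = new ArrayList<>(); // stand-in for an append-only file
    private Map<String, String> snapshot = new HashMap<>();   // stand-in for a snapshot file

    void put(String key, String value) {
        journal.add(new String[] {key, value}); // 1. persist the change first
        state.put(key, value);                  // 2. then apply it in memory
    }

    String get(String key) {
        return state.get(key); // reads never touch storage
    }

    void takeSnapshot() {
        snapshot = new HashMap<>(state); // every object is persisted
        journal.clear();                 // older changes are now redundant
    }

    // After a restart: load the last snapshot, then replay the journal in order.
    void recover() {
        state = new HashMap<>(snapshot);
        for (String[] change : journal) {
            state.put(change[0], change[1]);
        }
    }

    int journalSize() {
        return journal.size();
    }
}
```

In a real system the journal write would need to be flushed to disk before the change is acknowledged; that is what gives the "most recent updates survive a failure" property from the question.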

Banengusk
+2  A: 

First of all, Lucene isn't your friend here (read-only).

Terracotta is for scaling at the logic layer! Your problem doesn't seem to be related to the processing logic; it's more about storage and communication.

  1. Identify your bottleneck! Benchmark the Storage/Logic/JMS processing time and overhead!
  2. Kill JMS issues with a good JMS framework (e.g., ActiveMQ) and a good, tuned configuration.
  3. Maybe a distributed key/value store is your friend. Try Project Voldemort!
  4. If you'd like to stay with Hibernate and HSQLDB, check out the Hibernate second-level cache and connection pooling (c3p0, container-managed...)!
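For step 1, a profiler is the better tool, but even a crude per-stage stopwatch can show whether storage, logic or JMS dominates. A minimal sketch, assuming you can wrap each stage in a lambda; the class name and stage labels are hypothetical, and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stopwatch that accumulates wall-clock time per named stage.
final class StageTimer {
    private final Map<String, Long> totalsNanos = new LinkedHashMap<>();

    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            // Add this call's duration to the running total for the stage.
            totalsNanos.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    long totalNanos(String stage) {
        return totalsNanos.getOrDefault(stage, 0L);
    }

    String report() {
        StringBuilder sb = new StringBuilder();
        totalsNanos.forEach((stage, nanos) ->
                sb.append(String.format("%-10s %8.2f ms%n", stage, nanos / 1e6)));
        return sb.toString();
    }
}
```

Usage would be along the lines of `timer.time("storage", () -> saveAll(batch))` and `timer.time("jms", () -> publish(batch))`, where `saveAll` and `publish` are placeholders for your own code, then printing `timer.report()` after a few thousand updates.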
Martin K.
A: 

Guaranteed messaging is going to be much slower than volatile messaging. Given that every object is updated every few seconds, you might consider batching your updates (say, into 500 changes, or by time, say 1-10 ms' worth), sending them over volatile messaging, and batching your transactions. In this case you are more likely to be limited by bandwidth. As you tune for your use case, you may find that smaller batch sizes also work efficiently. If bandwidth is critical (say you have a 10 MB connection or slower), you could use compression over JMS.

You can achieve much higher performance with a custom solution (which might also be simpler); e.g., Hazelcast and JGroups are free (you can add one or more nodes that do the database synchronization so your main app doesn't slow down). There are commercial products that handle on the order of half a million durable messages/sec.
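Compressing a batch before sending it is plain `java.util.zip`. A minimal sketch, assuming the batch has already been serialized to bytes (the serialization format is left open) and that the result would be sent with something like `BytesMessage.writeBytes(...)`; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip a serialized batch on send, gunzip it on receipt.
final class BatchCompression {
    private BatchCompression() {}

    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(raw);
        } // closing the stream writes the gzip trailer
        return bytes.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Compression pays off mainly on large, repetitive batches; on a fast LAN the CPU cost can outweigh the bandwidth saved, so it is worth measuring both ways.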
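The "database synchronization off the main path" idea is essentially write-behind. A minimal single-JVM sketch, not Hazelcast's or JGroups' API: names are hypothetical, and `dbWriter` stands in for whatever writes a batch to the database. Because each object is overwritten every few seconds, repeated updates to the same key are coalesced so only the latest state is written. Anything not yet flushed is lost if the process dies, so the flush period bounds how stale the database can be.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical write-behind buffer: cheap in-memory updates, periodic batched DB writes.
final class WriteBehind<K, V> implements AutoCloseable {
    private final Consumer<Map<K, V>> dbWriter;
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();
    private Map<K, V> dirty = new LinkedHashMap<>();

    WriteBehind(long periodMillis, Consumer<Map<K, V>> dbWriter) {
        this.dbWriter = dbWriter;
        flusher.scheduleWithFixedDelay(this::flush, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    // Hot path: no I/O. A newer value for the same key replaces the older one.
    synchronized void update(K key, V value) {
        dirty.put(key, value);
    }

    void flush() {
        Map<K, V> batch;
        synchronized (this) {
            if (dirty.isEmpty()) {
                return;
            }
            batch = dirty;
            dirty = new LinkedHashMap<>();
        }
        dbWriter.accept(batch); // written outside the lock so updates keep flowing
    }

    @Override
    public void close() throws InterruptedException {
        flusher.shutdown();
        flusher.awaitTermination(5, TimeUnit.SECONDS);
        flush(); // write anything left over
    }
}
```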

Peter Lawrey
+2  A: 

Several Terracotta users have built systems like this in the past, so I can tell you by proof of existence that it can be done. :)

Compass does have support for clustering with Terracotta so that might help you. I suspect you might get further faster by just being careful with how you create your clustered data structures.

Regarding your requirements and Terracotta:

1) 10k objects is quite small from a Terracotta perspective

2) A 5-second update rate doesn't seem like an issue. It might depend on how many nodes there are and whether there is any natural partitioning you can take advantage of. All updates will be persistent.

3) 1-5 second query time seems quite easy. Building your own well-organized data structures for lookup is the tricky part. Obviously you want to avoid scanning all the data.

4) Terracotta currently supports Glassfish v1 and v2.
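For point 3, "well-organized data structures for lookup" could mean keeping secondary indexes next to the objects so that both of the question's example queries avoid a full scan: a sorted map keyed by timestamp for time-range queries and a uniform grid for location boxes. A minimal sketch under assumptions: the `Position` type, the grid cell size and the class name are hypothetical, and these are plain single-JVM collections (in a clustered setup they would be the shared structures).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class PositionIndex {
    // Hypothetical stand-in for one of the question's objects.
    static final class Position {
        final String id;
        final long timestamp;
        final double x, y;

        Position(String id, long timestamp, double x, double y) {
            this.id = id;
            this.timestamp = timestamp;
            this.x = x;
            this.y = y;
        }
    }

    private final double cellSize; // width/height of one grid cell
    private final Map<String, Position> byId = new HashMap<>();
    private final NavigableMap<Long, Set<String>> byTime = new TreeMap<>();
    private final Map<Long, Set<String>> byCell = new HashMap<>();

    PositionIndex(double cellSize) {
        this.cellSize = cellSize;
    }

    private long cell(double coord) {
        return (long) Math.floor(coord / cellSize);
    }

    // Packs two grid coordinates into one map key.
    private static long key(long cx, long cy) {
        return (cx << 32) ^ (cy & 0xffffffffL);
    }

    private static <K> void unindex(Map<K, Set<String>> index, K k, String id) {
        Set<String> ids = index.get(k);
        if (ids != null && ids.remove(id) && ids.isEmpty()) {
            index.remove(k);
        }
    }

    // Insert or replace an object, moving its index entries along with it.
    void upsert(Position p) {
        Position old = byId.put(p.id, p);
        if (old != null) {
            unindex(byTime, old.timestamp, old.id);
            unindex(byCell, key(cell(old.x), cell(old.y)), old.id);
        }
        byTime.computeIfAbsent(p.timestamp, t -> new HashSet<>()).add(p.id);
        byCell.computeIfAbsent(key(cell(p.x), cell(p.y)), c -> new HashSet<>()).add(p.id);
    }

    // Objects with from <= timestamp <= to, via a sorted range view.
    List<Position> between(long from, long to) {
        List<Position> out = new ArrayList<>();
        for (Set<String> ids : byTime.subMap(from, true, to, true).values()) {
            for (String id : ids) {
                out.add(byId.get(id));
            }
        }
        return out;
    }

    // Objects inside the box; only grid cells overlapping the box are visited.
    List<Position> within(double minX, double minY, double maxX, double maxY) {
        List<Position> out = new ArrayList<>();
        for (long cx = cell(minX); cx <= cell(maxX); cx++) {
            for (long cy = cell(minY); cy <= cell(maxY); cy++) {
                Set<String> ids = byCell.get(key(cx, cy));
                if (ids == null) {
                    continue;
                }
                for (String id : ids) {
                    Position p = byId.get(id);
                    // Edge cells can hold points just outside the box.
                    if (p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY) {
                        out.add(p);
                    }
                }
            }
        }
        return out;
    }
}
```

The cell size is a tuning knob: too small and a query box touches many empty cells, too large and each cell holds many objects that must be filtered individually.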

If you post on the Terracotta forums, you could probably get more Terracotta eyeballs on the problem.

Alex Miller
A: 

Terracotta + jofti = queryable, persistent, clustered data structures. Search Google for "terracotta querymap" or visit tusharkhairnar.blogspot.com for the querymap blog. You may want to integrate timasync as well to update your database. The database is your system of record; use Terracotta as a caching and database-offloading mechanism. You can even batch the async updates to make them faster, so that the DB contains fairly recent data.

Tushar tusharkhairnar.blogspot.com