views:

257

answers:

5

TLDR: What are the pros/cons of using an in-memory database vs locks and concurrent data structures?

I am currently working on an application that has many (possibly remote) displays that collect live data from multiple data sources and renders them on screen in real time. One of the other developers have suggested the use of an in memory database instead of doing it the standard way our other systems behaves, which is to use concurrent hashmaps, queues, arrays, and other objects to store the graphical objects and handling them safely with locks if necessary. His argument is that the DB will lessen the need to worry about concurrency since it will handle read/write locks automatically, and also the DB will offer an easier way to structure the data into as many tables as we need instead of having create hashmaps of hashmaps of lists, etc and keeping track of it all.

I do not have much DB experience myself so I am asking fellow SO users what experiences they have had and what are the pros & cons of inserting the DB into the system?

+4  A: 

Well a major con would be the mismatch between Java and a DB. That's a big headache if you don't need it. It would also be a lot slower for really simple access. On the other hand, the benefits would be transactions and persistence to the file system in case of a crash. Also, depending on your needs, it allows for querying in a way that might be difficult to do with a regular Java data structure.

For something in between, I would take a look at Neo4j. It is a pure Java graph database. This means that it is easily embeddable, handles concurrency and transactions, scales well, and does not have all of the mismatch problems that relational DBs have.

Updated If your data structure is simple enough - a map of lists, map of maps, something like that, you can probably get away with either the concurrent collections in the JDK or Google Collections, but much beyond that, and you will likely find yourself recreating an in memory database. And if your query constraints are even remotely difficult, you're going to have to implement all of those facilities yourself. And then you'll have to make sure that they work concurrently etc. If this requires any serious complexity or scale(large datasets), I would definitely not roll your own unless you really want to commit to it.

If you do decided to go with an embedded DB there are quite a few choices. You might want to start by considering whether or not you want to go the SQL or the NoSQL route. Unless you see real benefits to go SQL, I think it would also greatly add to the complexity of your app. Hibernate is probably your easiest route with the least actual SQL, but its still kind of a headache. I've done it with Derby without serious issues, but it's still not straightforward. You could try db4o which is an object database that can be embedded and doesn't require mapping. This is a good overview. Like I had said before, if it were me if I would likely try Neo4j, but that could just be me wanting to play with new and shiny things ;) I just see it as being a very transparent library that makes sense. Hibernate/SQL and db4o just seems like too much hand waving to feel lightweight.

Russell Leggett
good point about the additional ease of persistance to the file system
yx
There's also a big win with the database direction by making the model of the data more transparent. It can be very complicated making changes to code that involves complex data structures. Whereas, it's far more obvious when the data is structured as tables and relationships. And with modern ORMs, i don't think database access is all that big a headache these days. Certainly far less a headache than hand-coding complex data structure concurrency. I'd choose the database route.
nicerobot
neo4j does look very promising now that I've read a little about it on their wiki.
yx
A: 

It is unclear to me why you feel that an in memory database cannot be thread safe.

Why don't you look at JDO and DataNucleus? They have a lot of different datastores where you get to plug in what your back end persistence provider is at run time as a configuration step. Your application code is dependent on an ORM but that ORM might be plugged into an RDBMS, DB40, NeoDatis, LDAP, etc. If one backend doesn't work for you, then switch to another.

Glenn
you misunderstand my question. I did not say that a DB is not thread safe, in fact that is one of the considerations for why to use a DB.
yx
+1  A: 

I was once working for a project which has been using Oracle TimesTen. This was back in early 2006 when Java 5 was just released and java.util.concurrent classes were barely known. The system we have developed had reasonably big scalability and throughput requirements (it was one of the core telco boxes for SMS/MMS messaging).

Briefly speaking, reasoning for TimesTen was fair: "let's outsource our concurrency/scalability problems to somebody else and focus on our business domain" and made perfect sense then. But this was back in 2006. I don't think such a decision would be made today.

Concurrency is hard, but so is handling of in-memory databases. Freeing yourself of concurrency problems you'd have to become an expert of in-memory database world. Fine tuning TimesTen for replication is hard (we had to hire a professional consultant from Oracle to do this). License(s) don't come for free. You also need to worry about additional layer which is not open source and/or might be written in a different language than the one you understand.

But it is really hard to make any judgement without knowing your experience, budget, time requirements, etc. Do a shopping around, spend some time for looking into decent concurrency frameworks (such as http://akkasource.org/) ...and let us know what you have decided ;)

mindas
+2  A: 

You could use something like Space4J and get the benefits of both a collections like interface and an in memory database. In practical use something as basic as a Collection is an in memory database with no index. A List is an in memory database with a single int index. A Map is an in memory database with a single index type T based index and no concurrency unless synchronized or a java.util.concurrency.* implementation.

fuzzy lollipop
A: 

Below are few questions which could facilitate a decision.

  • Queries - do you need to query/reproject/aggregate your data in different forms?
  • Transactions - do you ever need to rollback added data?
  • Persistence - do you only need to present the gathered data or do you also need to store it in some way?
  • Scalability - will your data always fit in the memory?
  • Performance - how fast should it be?
lexicore