views:

179

answers:

2

We are looking at using a NoSQL database system for a large project. Currently, we have read a bit about MongoDB and Cassandra, though we have absolutely no experience with either. We are very proficient with traditional relational databases like MySQL and Microsoft SQL, but the NoSQL (key/value store) is a new paradigm for us.

So basically, which NoSQL database do you guys recommend for our use?

We do both heavy writes and reads. Basically we have tens of thousands of devices that are reporting:

device_id (int), latitude (decimal), longitude (decimal), date/time (datetime), heading char(2), speed (int)

Every minute. So, at peak times we need to be able to process hundreds of writes a second.

Then, we also have users, that are querying this information in the form of, give me all messages from device_id 1234 for the last day, or last week. Also, users do other querying like, give me all messages from device_1234 where speed is greater than 50 and date is today.

So, our initial thoughts are that MongoDB or Cassandra are going to allow us to scale this much easier then using a traditional database.

A document or value in MongoDB or Cassandra for us, might look like:

{
   device_id: 1234,
   location: [-118.12719739973545, 33.859012351859946],
   datetime: 1282274060,
   heading: "N",
   speed: 34
}

Which system do you guys recommend? Thanks greatly.

+3  A: 

MongoDB has built-in support for geospatial indexes: http://www.mongodb.org/display/DOCS/Geospatial+Indexing

As an example to find the 10 closest devices to that location you can just do

db.devices.find({location: {$near: [-118.12719739973545, 33.859012351859946]}}).limit(10)
mstearn
A strong reason why we ditched mysql.
luckytaxi
Thanks greatly for the reply, seems like the builtin MongoDB geospatial is going to be very useful.
Justin
Where is a good starting part for us to read, and any tutorials videos? Also, we want to allows users to store metadata for each update. What would be a good data structure for this? So, something like: { device_id: 1234, location: [-118.12719739973545, 33.859012351859946], datetime: 1282274060, heading: "N", speed: 34, metadata: { key: "value", key: "value", key: "value } }
Justin
Take a look at http://github.com/RedBeard0531/Mongo_Presentations/blob/master/20100521-mongony/geospatial_script.js#L55-97. It was used as part of this talk http://blip.tv/file/3681531. Note that if you want to use the new spherical distance calculations (http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-NewSphericalModel), you'll need to reverse the coordinates and put X/longitude first (as you did in your example).
mstearn
A: 

I have done some work with mongodb and geospatial data, but not on the scale mentioned above. The geospatial searches are very fast, much more so than mysql.

I suggest looking into mongodb's sharding, replication, and clustering functionality to deal with the volume of writes. Sharding across device identifier may be a good way to deal with the write volume. If you're interested in proximity of events then sharding across lat/lng may be more appropriate.

jack

Jack Cox