views: 213
answers: 2

I'm not terribly familiar with NoSQL systems, but I remember reading a while back that they are ideal for handling statistical data.

Since I'm about to start writing code that will record data like "how many users were registered on each day", I was thinking I could use this as an opportunity to learn more about NoSQL if it fits the bill.

If NoSQL is indeed ideal for this, could you provide me with some information as to why? And which specific systems are best suited for this particular need?

So, after the first answer, maybe it's helpful to clarify a bit more.

I currently have a PostgreSQL database from which I'll get the data. It will be very simple, with no calculations needed. For example, I'll just get a resultset with the number of users registered each day for the past month (so it'll basically just be a set of date/user-count value pairs) and save that in another table/database.
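
Just to be concrete, the query I have in mind would be something along these lines (the table and column names here are only placeholders), e.g. with psycopg2:

    import psycopg2

    # Placeholder connection string and schema: a "users" table with a "created_at" timestamp.
    conn = psycopg2.connect("dbname=myapp user=me")
    cur = conn.cursor()
    cur.execute("""
        SELECT created_at::date AS day, count(*) AS registered_users
        FROM users
        WHERE created_at >= now() - interval '1 month'
        GROUP BY day
        ORDER BY day
    """)
    pairs = cur.fetchall()  # e.g. [(date(2010, 5, 1), 42), (date(2010, 5, 2), 37), ...]
    cur.close()
    conn.close()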

Thanks!

+1  A: 

It kind of depends on what sorts of analysis you are going to be doing on these stats. If you are going to be doing a lot of different operations (averaging, summing, joining...) you may find NoSQL solutions to be more of a pain than they are worth.

However, if you are storing stats mostly for display purposes, or for very specific analysis routines, NoSQL solutions start to shine.

If your data is small enough, stick with a SQL solution, which gives you the benefit of a full query engine to work with. But if you have lots of values (one value a day is nothing, even if you ran for a million years) and are worried about storage size and performance, NoSQL options may once again be worth it.

If your data is semi-structured, take a look at CouchDB, which offers rudimentary indexing and querying support that could provide a basis for analysis routines. If you are storing individual values with very little structure, my best advice would be to take a look at Tokyo Cabinet and Tokyo Tyrant, which are absolutely incredible options for key-value storage.
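
For example, storing one tiny document per day in CouchDB is just an HTTP PUT. A rough sketch, assuming CouchDB is running on localhost:5984, a database called "stats" already exists, and the Python requests library is installed:

    import requests

    # One small document per day; using the date itself as the document ID
    # keeps each record easy to fetch back individually or through a view.
    doc = {"date": "2010-05-01", "registered_users": 42}
    resp = requests.put("http://localhost:5984/stats/2010-05-01", json=doc)
    resp.raise_for_status()

A CouchDB view's map function could then emit (date, count) pairs for whatever date range you want to display.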

LorenVS
Do you think a table with ~300k rows (just two columns) that's accessed about 7 times per minute in the worst case would be worth moving to NoSQL or not?
Ivan
Absolutely not... 300k rows is more or less nothing for the average RDBMS... I'm assuming one of the columns is a timestamp, which would probably be your clustered index, and any database engine will make child's play of any queries you run.
LorenVS
Thanks! I guess I'll have to find a better excuse to play with a NoSQL system.
Ivan
Most of the times I've been able to justify moving away from SQL solutions were when my dataset was greater than a billion records... Anything less than that you can manage fairly well with an RDBMS.
LorenVS
A: 

NoSQL systems tend to be optimized for the case where data is written frequently but read infrequently. In the case of statistics, you might gather lots of data from a (social) site frequently and in small bits, which is exactly the case they optimize for, but retrieval and analysis might be slower... It of course depends on which NoSQL system you decide to use.
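
For example, the write-heavy side is often just a counter increment per event. A rough sketch against a memcached-protocol store such as memcachedb, assuming the python-memcached client and a server on localhost:11211:

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    def record_registration(day):
        # Bump the per-day counter; each write is a tiny, cheap operation.
        key = "registrations:%s" % day   # e.g. "registrations:2010-05-01"
        if mc.incr(key) is None:         # counter doesn't exist yet
            mc.add(key, 1)               # (small race here; fine for a sketch)

    record_registration("2010-05-01")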

lorn00
Certain NoSQL systems do, but if you consider cases like memcache/memcachedb, many of these systems are optimized for the exact opposite situation as well...
LorenVS
A lot of people use Hadoop to process the contents of NoSQL stores and do statistics.
BrianLy
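
For what it's worth, a typical job of that kind can be as small as a Hadoop Streaming mapper/reducer pair. A rough sketch in Python, assuming each input line is one registration event whose first field is the date:

    #!/usr/bin/env python
    # mapper.py: emit "date<TAB>1" for every registration event read from stdin.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if fields:
            print("%s\t1" % fields[0])

    #!/usr/bin/env python
    # reducer.py: sum the counts per date (Hadoop sorts the mapper output by key).
    import sys

    current_day, count = None, 0
    for line in sys.stdin:
        day, value = line.rstrip("\n").split("\t", 1)
        if day != current_day:
            if current_day is not None:
                print("%s\t%d" % (current_day, count))
            current_day, count = day, 0
        count += int(value)
    if current_day is not None:
        print("%s\t%d" % (current_day, count))

You would run the pair with the Hadoop Streaming jar (-mapper mapper.py -reducer reducer.py plus your input and output paths); the exact jar location depends on your installation.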