views:

450

answers:

5

Right now I'm developing the prototype of a web application that aggregates large number of text entries from a large number of users. This data must be frequently displayed back and often updated. At the moment I store the content inside a MySQL database and use NHibernate ORM layer to interact with the DB. I've got a table defined for users, roles, submissions, tags, notifications and etc. I like this solution because it works well and my code looks nice and sane, but I'm also worried about how MySQL will perform once the size of our database reaches a significant number. I feel that it may struggle performing join operations fast enough.

This has made me think about non-relational database system such as MongoDB, CouchDB, Cassandra or Hadoop. Unfortunately I have no experience with either. I've read some good reviews on MongoDB and it looks interesting. I'm happy to spend the time and learn if one turns out to be the way to go. I'd much appreciate any one offering points or issues to consider when going with none relational dbms?

+1  A: 

What do you think is a significant amount of data? MySQL, and basically most relational database engines, can handle rather large amount of data, with proper indexes and sane database schema.

Why don't you try how MySQL behaves with bigger data amount in your setup? Make some scripts that generate realistic data to MySQL test database and and generate some load on the system and see if it is fast enough.

Only when it is not fast enough, first start considering optimizing the database and changing to different database engine.

Be careful with NHibernate, it is easy to make a solution that is nice and easy to code with, but has bad performance with large amount of data. For example whether to use lazy or eager fetching with associations should be carefully considered. I don't mean that you shouldn't use NHibernate, but make sure that you understand how NHibernate works, for example what "n + 1 selects" -problem means.

Juha Syrjälä
Thanks for your points. I do think the same about MySql and I believe it should be good enough for some months but I really like to hear the case MongoDB users can make against MySql.On Nhibernate, I too thought the same thing, however I realized that in order to fully benefit from the goody that is NHibernate, you must always consider how each of your queries are performed.
Am
+1  A: 

Measure, don't assume.

Relational databases and NoSQL databases can both scale enormously, if the application is written right in each case, and if the system it runs on is properly tuned.

So, if you have a use case for NoSQL, code to it. Or, if you're more comfortable with relational, code to that. Then, measure how well it performs and how it scales, and if it's OK, go with it, if not, analyse why.

Only once you understand your performance problem should you go searching for exotic technology, unless you're comfortable with that technology or want to try it for some other reason.

Andrew McGregor
Andrew, correct me if I'm wrong, but I feel regardless of how well the code is written, when dealing with a large database, the first thing to give is usually RDMS when performing joins. This is one of the reasons for why Facebook and Google don't store their data in MySql.
Am
@Am, RDMS join performance may or may not become problem with your data and situation, but you will not know it if you do not measure and benchmark it. Big boys do not use MySQL, but then again they probably have several magnitudes more data than you.
Juha Syrjälä
@Am part of my responsibility is tool support for a large company, which has chosen to use Enterprise Architect with MySQL as a back end. EA has a habit of combining lots of different data in strings and putting this into a generic 'xref' table. Every important operation in the tool is CPU bound at the client, presumably in string parsing or concatenation. Being in the position of being database limited exceeds the data management capability of almost every product I've seen. Your 'regardless of how well the code is written' ignores plenty of code which is worse than you can imagine.
Pete Kirkham
@Pete Kirkham, I admit that my statement makes the assumption that as programmers we aim to write the best code we can. But I can see that (best code we can) is subjective :) So far everyone is suggesting to perform tests before you move on to Document oriented DBS, but I'm simply interested in knowing if MongoDB or another Document oriented DB system is at all the way to go. I know that I may not need it, but is it a bad decision to go with MongoDB from the start and skip MySQL?
Am
+5  A: 

So far no one has mentioned PostgreSQL as alternative to MySQL on the relational side. Be aware that MySQL libs are pure GPL, not LGPL. That might force you to release your code if you link to them, although maybe someone with more legal experience could tell you better the implications. On the other side, linking to a MySQL library is not the same that just connecting to the server and issue commands, you can do that with closed source.

PostreSQL is usually the best free replacement of Oracle and the BSD license should be more business friendly.

Since you prefer a non relational database, consider that the transition will be more dramatic. If you ever need to customize your database, you should also consider the license type factor

Francisco Garcia
+1 and if NoSQL is very compelling case, just use Postgres with NoSQL architecture http://momjian.us/main/blogs/pgblog/2010.html
Michael Buen
+9  A: 

The other answers here have focused mainly on the technical aspects, but I think there are important points to be made that focus on the startup company aspect of things:

  • Availabililty of talent. MySQL is very common and you will probably find it easier (and more importantly, cheaper) to find developers for it, compared to the more rarified database systems. This larger developer base will also mean more tutorials, a more active support community, etc.
  • Ease of development. Again, because MySQL is so common, you will find it is the db of choice for a great many systems / services. This common ground may make any external integration a little easier.
  • You are preparing for a situation that may never exist, and is manageable if it does. Very few businesses (nevermind startups) come close to MySQL's limits, and with all due respect (and I am just guessing here); the likelihood that your startup will ever hit the sort of data throughput to cripple a properly structured, well resourced MySQL db is almost zero.

Basically, don't spend your time ( == money) worrying about which db to use, as MySQL can handle a lot of data, is well proven and well supported.

Going back to the technical side of things... Something that will have a far greater impact on the speed of your app than choice of db, is how efficiently data can be cached. An effective cache can have dramatic effects on reducing db load and speeding up the general responsivness of an app. I would spend your time investigating caching solutions and making sure you are developing your app in such a way that it can make the best use of those solutions.

FYI, my caching solution of choice is memcached.

MatW
Huge +1. Just build a killer app. RDBMS or not, this is not what is going to give you a competitive advantage (and users don't give a sh!t about it).
Pascal Thivent
+1  A: 

I'd suggest you try out each db and pick the one that makes it easiest to develop your application. Go to http://try.mongodb.org to try MongoDB with a simple tutorial. Don't worry as much about speed since at the beginning developer time is more valuable than the CPU time.

I know that many MongoDB users have been able to ditch their ORM and their caching layer. Mongo's data model is much closer to the objects you work with than relational tables, so you can usually just directly store your objects as-is, even if they contain lists of nested objects, such as a blog post with comments. Also, because mongo is fast enough for most sites as-is, you can avoid dealing the complexities of caching and generally deliver a more real-time site. For example, Wordnik.com reported 250,000 reads/sec and 100,000 inserts/sec with a 1.2TB / 5 billion object DB.

There are a few ways to connect to MongoDB from .Net, but I don't have enough experience with that platform to know which is best:

Disclaimer: I work for 10gen on MongoDB so I am a bit biased.

mstearn