views: 220

answers: 9

I have to decide which database server to use for my next project, but the simple decision to use MySQL, as in almost all the projects I've done, is harder now, because I expect a very large number of records.

The database will store a user list, some other irrelevant tables, and, finally, some user-collected data. Let's say I have 6000 users responding to a quiz about each other. Simple math shows that if each of those users completes the quiz about everyone else (and in my project it is 99% sure that will happen), I'll end up with 35.99 million records (users exclude themselves, so in this particular situation the operation is 6000 × 5999). Unfortunately, 6000 may turn out to be a small number, the real one growing day by day.
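The growth here is quadratic, which a quick sketch makes concrete (the 6000 figure comes from the question above; the doubled figure is purely illustrative):

```python
def quiz_records(users: int) -> int:
    """Each user answers the quiz about every other user, excluding themselves."""
    return users * (users - 1)

print(quiz_records(6000))    # 35994000 -- the ~36 million figure from the question
# Growth is quadratic: doubling the user count roughly quadruples the rows.
print(quiz_records(12000))   # 143988000
```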

What to choose? MySQL, and maybe, if things go well and the project grows, expand it into a cluster? PostgreSQL? MSSQL? Oracle?

I've read about all of them; each one has its pros and cons, but I still don't know what to choose. The advantage of MySQL and PostgreSQL is, of course, the starting price of $0, which is pretty nice for a typical self-funded startup.

Any opinions or pieces of advice? If you've encountered this situation in your experience as developers, I'd love to hear from you.

A: 

35 million records can be easily handled by MS SQL Server (assuming proper database design, indices, etc.). You can start with the free SQL Server Express edition and later, if you need, you can upgrade to the full version which supports clustering, etc.

SQL Server Express does have some limitations - single CPU, 1 GB memory, max 4 GB database size and a few other things. I'm not sure how quickly these limitations will become a problem but you can always move to the full version when you run into them.

TLiebe
4 gig database size / 36 million rows = 119 bytes per row -- including indexes.
Frank Farmer
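Frank Farmer's back-of-the-envelope figure checks out (taking the Express cap as 4 GiB; the exact byte count of the real limit may differ):

```python
# Sanity check of the bytes-per-row comment above.
db_cap = 4 * 1024**3     # SQL Server Express's 4 GB database cap, taken as 4 GiB
rows = 6000 * 5999       # the ~36 million records from the question
print(db_cap // rows)    # 119 bytes per row, indexes included
```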
+1  A: 

Use MySQL as it's free and you have experience with it.

Besides, in my opinion it matters more how you design the tables than which database you use.

ReDAeR
Unfortunately the "design" aspect seems to have been lost in many cases, as the role of a proper specialized DBA has been minimized with the commoditization of database systems.
pst
A: 

MySQL(i) & PostgreSQL

  • $0 cost
  • Large community
  • Many tutorials
  • Well documented

MSSQL

  • You can get "money" from MS if you promote that you are using MSSQL (secret information from some companies I worked for)
  • MS tools work very well
  • Complete tool chain, from the C# IDE through the .NET libraries to Windows Server 2003

Oracle

  • Professional and commercial provider
  • Used by many large companies (I also heard about Blizzard (World of Warcraft) using Oracle)
  • Expensive

The final decision depends on the very specific requirements of your project. Make yourself a quick list of things that ARE IMPORTANT for your project (e.g. quickly performed queries) and look up which database's pros best match your requirements.

Everything is about design. SQL databases are a kind of car: you just have to know which component goes where. Make a clear design and you won't struggle with any of them.

daemonfire300
I use PHP, and usually don't trust Microsoft software.
Bogdan Constantinescu
I'm not a huge MS fan either, but lots of big projects run on MSSQL -- including SO
Frank Farmer
+3  A: 

MySQL will handle 35 million records, no problem. Worry about scalability when you get there. You can easily add RAID hard disks backing your database tables, and if you really start getting big you can get a Compellent SAN that will scream... Don't worry about the DB engine as much as the underlying hardware. MySQL rocks for us with millions of records.

Zak
Rocks with millions of records in a web type of usage? I mean, smooth, pretty fast, and not making users form a bad opinion of the website?
Bogdan Constantinescu
Adding a caching layer can reduce database load dramatically as well. Depending on your read/write ratio, you may be able to avoid hitting the database most of the time.
Frank Farmer
Also, you can scale reads via replication. You can replicate all the data in your database to an unlimited number of slave databases and spread the read load across all of them. You can ramp up the number of slaves as your traffic grows, too. My point is, cross that bridge when you get there; MySQL is capable of it.
Zak
Worry about scalability when you get there? I'm not sure I agree with this. Designing with scalability in mind isn't likely to cost you that much up front, but not doing so could cost you plenty later.
Aaron Bertrand
@Aaron: If this is your first time scaling big, you shouldn't worry about it. I can almost guarantee that you will get it wrong your first time around, and there's always the huge risk that you put in a ton of effort and it never takes off. On subsequent projects, you'll have a much better idea of where performance issues will crop up, and then your suggestion makes more sense.
Bob Aman
@Aaron - worrying about scalability when you have no performance metric, while you could be developing, IS costing you time, as opposed to the possibility that it MIGHT cost you time later. To clarify, though: if you are developing and things are "too slow", stop and fix that. 999 times out of 1000 it will be a fix in your code rather than a fix via database scalability (during initial development, I mean).
Zak
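The caching and read-replica ideas raised in these comments can be sketched together. Everything here is illustrative (the class name, and plain dicts standing in for database connections), not a real driver API:

```python
import random

class ReadSplitter:
    """Toy sketch: cache in front of the DB, reads spread across replicas."""

    def __init__(self, master, replicas):
        self.master = master        # handles all writes
        self.replicas = replicas    # read load is spread across these
        self.cache = {}             # naive in-process cache

    def read(self, key):
        if key in self.cache:       # cache hit: no database round trip at all
            return self.cache[key]
        conn = random.choice(self.replicas)
        value = conn[key]           # stand-in for a real SELECT
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.master[key] = value    # writes go to the master only
        self.cache.pop(key, None)   # invalidate the stale cache entry

# Toy usage: replication is assumed to have already copied the master's data.
master = {"user:1": "Alice"}
splitter = ReadSplitter(master, replicas=[dict(master), dict(master)])
print(splitter.read("user:1"))   # Alice
```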
+2  A: 

I've had no problems handling tables as large as 36,000,000 rows on MySQL and Oracle.

Just be sure that you index the proper columns, run EXPLAINs for your queries, and maintain proper design principles.

Tenner
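A minimal sketch of the index-then-EXPLAIN workflow Tenner describes, using SQLite's `EXPLAIN QUERY PLAN` as a stand-in (MySQL's `EXPLAIN` output looks different, but the principle is the same: index the columns you filter on, then verify the plan):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE answers (asker INTEGER, subject INTEGER, score INTEGER)")

# Without an index, filtering on `subject` forces a full-table scan.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM answers WHERE subject = ?", (42,)
).fetchall()[0][-1]
print(plan_before)    # a SCAN over the whole table

# Index the column you filter on, then re-check the plan.
conn.execute("CREATE INDEX idx_subject ON answers (subject)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM answers WHERE subject = ?", (42,)
).fetchall()[0][-1]
print(plan_after)     # a SEARCH using idx_subject
```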
A: 

Maybe you can test Firebird.

Blog post about a big Firebird database here.

The MySQL license is here (not always free).

PostgreSQL and Firebird are free.

Hugues Van Landeghem
Why did this post receive a negative vote?
Murali VP
@Murali: yes, it's true. Why?
Hugues Van Landeghem
+3  A: 

These days, free isn't something that differentiates between databases any more. Both Oracle and SQL Server have free versions, but the limitation is resources: a 4 GB database, limited RAM, and single-CPU utilization. Millions of records are not a concern; what matters is the datatypes you're using.

I saw the OP's comment about not liking MS software. That's your prerogative, but the free versions of both Oracle and SQL Server benefit from a seamless transition to the upscale versions of the respective database.

Personally, my choice would be either Oracle or SQL Server because of, IMHO, real feature considerations like hierarchical query support, subquery factoring/CTEs, packages (long before I get concerned with functions/procedures), full-text searching, XML support, etc.

OMG Ponies
Fortunately for me, the application itself isn't rocket science and it's built on Zend Framework. 80% of the data in a row will probably be a small int (1-20), but the rest will unfortunately be text (I won't search through it, though).
Bogdan Constantinescu
Actually, I would argue it's not a prerogative <http://www.merriam-webster.com/dictionary/prerogative> but it is an opinion. Good post.
pst
@pst: Thanks. I meant "prerogative" as the OP's right to choose. There's a trailing ">" on your URL, mucking up the link, btw.
OMG Ponies
A: 

First of all, don't think about performance. Premature optimization being the root of all evil and all that. You can always throw more hardware and/or tuning at it later.

All of the mentioned databases should perform nicely if tuned/maintained correctly. I'd focus on manageability and familiarity. IMHO, open source databases excel at manageability (perhaps not the best GUIs, but the CLI has been my home for a long, long time).

And if the database becomes the bottleneck, why limit yourself to those choices? How about a distributed key-value database? Or perhaps serializing data directly to disk? Storing data outside of an RDBMS, while often frowned upon, might be the correct path. Or simply take the common route of denormalization.

Always remember not to optimize prematurely.

As far as opinions go (since you specifically asked for it) I favor open source databases, specifically PostgreSQL. It's rock solid, fast and very well-featured. And even with (relatively) large datasets it has performed superbly on mediocre hardware (some tuning involved, of course, but you can't skip that step no matter which db you end up choosing).

In database design you need to think about performance up front; databases are notoriously difficult to fix when not designed for performance.
HLGEM
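The "serialize data directly to disk" route mentioned in this answer can be as simple as the standard library's `shelve` module. This toy stands in for a real key-value store (Redis, etc.), and the key and field names are made up:

```python
import os
import shelve
import tempfile

# Denormalized storage: one record per "asker:subject" key, no joins needed.
path = os.path.join(tempfile.mkdtemp(), "quiz_answers")

with shelve.open(path) as store:
    store["17:42"] = {"score": 9, "comment": "free text we never search"}

# Records persist on disk and are fetched straight back by key.
with shelve.open(path) as store:
    print(store["17:42"]["score"])   # 9
```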
A: 

Most of the truly large scale web properties use a distributed key-value store. That said, 35 million is large, but not that large. With most modern databases, your main two scaling worries should be throughput and what happens when no single box can contain your entire database anymore. And both of these problems can be solved to some degree for any database you choose to use. (Caching, replication, sharding, etc.)

Use MySQL until you can't anymore. At that point, you ought to be rolling in dough anyway, and you'll have a very desirable problem.

Bob Aman
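When no single box can hold the database anymore, the usual move is to shard by key. A minimal hash-based sketch (the shard names and count are illustrative; note that plain modulo sharding makes adding shards painful later, which is why consistent hashing is the usual refinement):

```python
import hashlib

SHARDS = ["quiz_db_0", "quiz_db_1", "quiz_db_2", "quiz_db_3"]

def shard_for(user_id: int) -> str:
    # A stable hash keeps every row for a given user on the same shard,
    # so all of that user's quiz answers can be fetched from one box.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always routes to the same shard.
print(shard_for(42))
print(shard_for(42) == shard_for(42))   # True
```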