I need to set up a MySQL environment that will support adding many unique databases over time (thousands, actually). I assume that at some point I will need to start adding MySQL servers, and I would like my environment to be prepared for that beforehand, so the transition to a 2nd, 3rd, 100th server is easy.

And just to make it interesting, it would be very convenient if the solution was modeled so that the application querying the databases sends all queries to a single address and receives a result. It should be unaware of the number and location of the servers. Each database name is unique and can be used to figure out which server holds the database.

I've done some research, and MySQL Proxy pops out as the main candidate, but I haven't been able to find anything specific about making it perform as described above.

Anyone?

+3  A: 

Great question. I know of several companies that have done this (Facebook jumps out as the biggest). None are happy, but alternatives kind of suck, too.

More things for you to consider -- what happens when some of these databases or servers fail? What happens when you need to do a cross-database query (and you will, even if you don't think so right now).

Here's the FriendFeed solution: http://bret.appspot.com/entry/how-friendfeed-uses-mysql

It's a bit "back-asswards" since they are basically using MySQL as a glorified key-value store. I am not sure why they don't just cut out the middleman and use something like BerkeleyDB for storing their objects. Connection management, maybe? Seems like the MySQL overhead would be too high a price to pay for something that could be added pretty easily (famous last words).

What you are really looking for (I think) is a distributed shared-nothing database. Several have been built on top of open-source technologies like MySQL and PostgreSQL, but none are available for free. If you are in the buying mood, check out these companies: Greenplum, AsterData, Netezza, Vertica.

There are also a number of distributed key-value storage solutions out there. For lack of a better reference, here's a starting point: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/

SquareCog
Let's say that I don't need to do cross-database queries. Each database is an island of its own; it's just that: a. the application needs to be able to query all databases from one single point, and b. I need the ability to add new databases at will. Does that change the picture?
Omer
oh, and thanks for the very quick and detailed response.
Omer
The simple solution is to write a wrapper for your current database connection module and have it maintain a pool of connectors -- when you ask for a db handle, give it a way to find the right db. This works for a set number of dbs and a static mapping; there are obvious problems with dynamic mappings. And I wasn't really asking if you need cross-db queries; I was telling you that if your project grows, you WILL need cross-db queries. Maybe not a lot of them, but you'll need them.
SquareCog
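A minimal sketch of that wrapper idea, in Python. Everything here is illustrative: `ShardedConnectionPool`, the static `db_to_host` dict, and the `connect` factory parameter are all hypothetical names, and a real version would plug in an actual driver (e.g. a DB-API `connect()`), plus the dynamic-mapping lookup that this static version deliberately lacks.

```python
class ShardedConnectionPool:
    """Wraps a connection module: routes each database name to its server
    and keeps a small pool of reusable handles per server."""

    def __init__(self, db_to_host):
        # db_to_host: static mapping of database name -> server host.
        # A dynamic setup would replace this dict with a lookup service.
        self._db_to_host = db_to_host
        self._pools = {}  # host -> list of idle connections

    def host_for(self, db_name):
        # Find the right server for this database; fail loudly for an
        # unknown database rather than guessing.
        try:
            return self._db_to_host[db_name]
        except KeyError:
            raise LookupError("no server mapped for database %r" % db_name)

    def connection(self, db_name, connect=None):
        # Return a handle for db_name, reusing an idle one if available,
        # otherwise creating one via the supplied `connect` factory
        # (e.g. your driver's connect function).
        host = self.host_for(db_name)
        pool = self._pools.setdefault(host, [])
        if pool:
            return pool.pop()
        if connect is None:
            raise RuntimeError("pool empty and no connect factory given")
        return connect(host=host, db=db_name)

    def release(self, db_name, conn):
        # Hand a connection back for reuse.
        self._pools.setdefault(self.host_for(db_name), []).append(conn)
```

The application only ever asks the pool for a handle by database name, which matches the "single point of access" requirement; adding a server is then just extending the mapping.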
+2  A: 

Your problem sounds similar to one we faced - that you are acting as a white-label, and that each client needs to have their own separate database. Assuming this concept parallels yours, what we did was leverage a "master" database that stored the hostname and database name for the client (which could be cached in the application tier). The server the client was accessing could then dynamically shift its datasource to the required database. This allowed us to scale up to thousands of client databases, scattered across servers.

Rich Kroll
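The master-database lookup described above can be sketched like this (a hypothetical illustration, not Rich Kroll's actual code): a directory maps each client to a (hostname, database name) pair, and the application tier caches the answer so the master database isn't hit on every request. `fetch_location` stands in for a real query against the master database.

```python
class MasterDirectory:
    """Resolves a client to its (hostname, database name) via a master
    lookup, with an application-tier cache in front of it."""

    def __init__(self, fetch_location):
        # fetch_location: callable client_id -> (host, db_name),
        # e.g. a SELECT against the master database.
        self._fetch = fetch_location
        self._cache = {}  # application-tier cache of resolved locations

    def locate(self, client_id):
        # Serve from cache when possible; only unknown clients
        # trigger a round trip to the master database.
        if client_id not in self._cache:
            self._cache[client_id] = self._fetch(client_id)
        return self._cache[client_id]

    def invalidate(self, client_id):
        # Call after moving a client's database to another server,
        # so the next locate() re-reads the master.
        self._cache.pop(client_id, None)
```

Once `locate()` returns, the server shifts its datasource to that host and database; adding the thousandth client database is just one more row in the master table.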