Many database libraries come set up for multiple database connections - but I've never actually known of a scripting application that needed to connect to two databases during its run. (Compiled, daemon-running languages are a different matter.)

I understand having database slaves so that you can spread the load out - but usually on startup only one of them is chosen to handle that script's needs.

So why would a PHP or Ruby application need to connect to more than one database? Or rather, why would you split your data up among several databases?

The only thing I can think of is bad design from a slowly evolving system that started out as multiple separate parts.

+3  A: 

Are you talking about different physical database servers or different databases in the "schema" sense?

Regarding physical servers, if you're using MySQL replication you might write to the master and always read from a slave. This helps split the load across the databases.
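In PHP, that split can be as simple as holding two PDO connections side by side. A minimal sketch (hostnames, database name, and credentials here are made up):

```php
<?php
// Hypothetical hosts/credentials -- adjust for your replication setup.
// Reads can be served by a slave; all writes must go to the master.
$master = new PDO('mysql:host=db-master.example.com;dbname=app', 'app', 'secret');
$slave  = new PDO('mysql:host=db-slave1.example.com;dbname=app', 'app', 'secret');

// Read from the slave...
$users = $slave->query('SELECT id, name FROM users WHERE active = 1')
               ->fetchAll(PDO::FETCH_ASSOC);

// ...write to the master.
$stmt = $master->prepare('INSERT INTO audit_log (message) VALUES (?)');
$stmt->execute(['nightly sync ran']);
```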

mmattax
duh, I forgot that you don't write to a slave! All writes need a master!
Xeoncross
+1  A: 

I have a site that connects to two databases. One powers the website content (the CMS DB); the other drives a web application that runs within the site (large amounts of non-CMS data). In fact, the latter uses replication.

I don't feel that's bad design. If one set of data has no relation to the other, then it makes sense even from a pure organizational perspective to house it in a separate DB. Otherwise, people would just put all their tables in one DB.

Jason McCreary
A: 

For added security, I always create two accounts for every database: a read-only account (good for SELECT) and a read-write account (for SELECT, UPDATE, INSERT, DELETE and whatever else I might need). On some pages, I may need to use both accounts, thus I will consume two connections for only one database.
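For instance, in PHP with PDO (account names and grants here are hypothetical; the GRANTs themselves would be set up in MySQL beforehand):

```php
<?php
// Same database, two accounts: 'app_ro' has SELECT only,
// 'app_rw' has SELECT/INSERT/UPDATE/DELETE. That's two connections.
$read  = new PDO('mysql:host=localhost;dbname=app', 'app_ro', 'ro_secret');
$write = new PDO('mysql:host=localhost;dbname=app', 'app_rw', 'rw_secret');

// Display-only pages use $read; a stray UPDATE sent through this
// connection is rejected by the database itself.
$posts = $read->query('SELECT title FROM posts ORDER BY created_at DESC')
              ->fetchAll(PDO::FETCH_ASSOC);

// Only form handlers and admin pages touch $write.
$stmt = $write->prepare('UPDATE posts SET title = ? WHERE id = ?');
$stmt->execute(['New title', 42]);
```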

Andrew
+1  A: 

I have a Ruby application that connects to multiple databases. One database contains user login credentials (which is shared between several other projects). Another database contains archived data that my application tracks and compares (that only my application accesses). Another database contains data regarding physical machine resources which my application uses to generate new data (these resources are used by several different applications). By splitting the data into multiple databases, different applications only access the data that they need to be accessing.

bta
+1. The common terms for splitting data like this are *partitioning* and, when one data set is spread horizontally across servers, *sharding*. Large-scale applications don't keep all of their data in a single database. E.g., if you had 500 terabytes of user data, how do you think you'd store it? It would take a lot of very expensive hardware to make all of that accessible from a single database.
Frank Farmer
+1  A: 

Well, reading from one and writing to another is a very common use case. It's easy and fun to write a data access layer that reads from one connection (the slave) and writes to another (the master). A single script might make multiple reads before writing -- perhaps some lookups are necessary for validation, for instance.
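A bare-bones version of such a layer might look like this (only a sketch, with made-up hostnames; real routing logic is usually smarter than a regex):

```php
<?php
// Tiny data access layer: SELECTs go to the slave, everything
// else (INSERT/UPDATE/DELETE) goes to the master.
class Db
{
    private PDO $master;
    private PDO $slave;

    public function __construct(PDO $master, PDO $slave)
    {
        $this->master = $master;
        $this->slave  = $slave;
    }

    public function query(string $sql, array $params = [])
    {
        $isRead = (bool) preg_match('/^\s*SELECT\b/i', $sql);
        $conn   = $isRead ? $this->slave : $this->master;
        $stmt   = $conn->prepare($sql);
        $stmt->execute($params);
        return $isRead ? $stmt->fetchAll(PDO::FETCH_ASSOC) : $stmt->rowCount();
    }
}

$db = new Db(
    new PDO('mysql:host=master.example.com;dbname=app', 'app', 'secret'),
    new PDO('mysql:host=slave.example.com;dbname=app', 'app', 'secret')
);

// A validating read (slave) before a write (master).
$existing = $db->query('SELECT id FROM users WHERE email = ?', ['a@example.com']);
if (!$existing) {
    $db->query('INSERT INTO users (email) VALUES (?)', ['a@example.com']);
}
```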

Scripting languages are also frequently used for integration. You might have two off-the-shelf codebases, both of which want to maintain their own database. Your integration code might want to talk to both of them.

In general, you can usually design your way out of using more than one connection, but I don't see anything fundamentally wrong with connecting to more than one database.

timdev
This solution isn't only fun (hmmm... in my experience, it was the opposite of fun); it's one of the basic steps in expanding your scalability. There are a bunch of ways you can scale, but if you started on a single database handling a bunch of different needs, the first step is to move different types of data into separate databases, isolating high read/write tables in their own databases. The next step after that is usually the master-slave scenario @timdev outlines. Both are fairly easy to set up and are definitely NOT bad design.
Andrew
+2  A: 

The simple answer is "scalability".

The ready availability of replication and clustering in a number of database products makes connecting to multiple databases something that simply must be possible. Any decent ORM should know how to connect to multiple databases as required.

But even when the main application doesn't connect to more than one database, there are often other needs that do. Report generation, whether scripted or ad hoc, often involves long-running queries. These are best run on database replicas dedicated to (and configured for) such queries so they don't disrupt the main application.

Another good use is a type of scripted processing. Many apps have a regular process that needs to rummage through a large part of the database. Whilst the updates obviously have to go to the master, the big read queries can be run against a replica.

Of course, the obvious need is simple performance. I oversaw a webapp and database that grew from surviving comfortably on one MySQL database on a 32-bit dual-core machine with 3GB of RAM to needing two 8-core 64-bit servers with 8GB. Once it reached that stage, it relied on the database handler directing traffic to both servers. There was a window of about 50 minutes a day in which it could survive on just one database.

staticsan
+2  A: 

It is all too frequently the case that some of the data you need is stored in The Wrong Database. Sometimes it's personnel records in a PeopleSoft (Oracle) database. Maybe it's Enterprise CRM data on Informix. Or some departmental database stored in MS SQL Server. Whatever it is, it's in a different database, but you still need access (hopefully read-only).

Unless your primary database is magic-based, it isn't going to be able to provide you with remote table access to every other database out there. (Most will only provide remote access to other databases of the same type, e.g. MySQL->MySQL.) When that all-too-frequent situation occurs, you'll have no option but to use multiple database connections, and you'll be glad your framework supports them.
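In PHP, for example, PDO makes this a matter of using a different driver DSN per connection. A sketch (assumes the pdo_mysql, pdo_oci, and pdo_sqlsrv extensions are installed; all hostnames, table names, and credentials are placeholders):

```php
<?php
// Three unrelated vendor databases, one script, three connections.
$mysql  = new PDO('mysql:host=crm-db.example.com;dbname=crm', 'reader', 'secret');
$oracle = new PDO('oci:dbname=//hr-db.example.com:1521/PSOFT', 'reader', 'secret');
$mssql  = new PDO('sqlsrv:Server=dept-db.example.com;Database=dept', 'reader', 'secret');

// Read-only lookups against each system; any "joins" across
// systems have to happen in application code.
$people = $oracle->query('SELECT emplid, name FROM personnel')
                 ->fetchAll(PDO::FETCH_ASSOC);
```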

Craig Trader
A: 

There are other reasons to have multiple databases. We have one application that everyone can access. We also have client databases that are very different from client to client. It is easier to maintain the application that all clients use (and which is maintained by a different team) if the client-specific data is separated out into its own databases. It is also easier to move a client to its own server when it becomes a large enterprise client, rather than leaving it on a server shared with many smaller clients.

Further, some types of data are transactional and need to be in databases set to full recovery mode with full transaction logging. Other data is only populated from imports and does not need that logging, which could slow down the system as the log grows large enough to handle a 10,000,000-record import. This data is often split out into a separate database so it can run in simple recovery mode; it is not necessary to recover it from the transaction log if there is a problem, because it can easily be recovered by re-running the import.

Then data is split out into data warehouses, which are optimized for reporting rather than transactions. Again, these reporting databases are usually separate databases (often on separate servers).

Then you have the databases for multiple different COTS applications (we have accounting databases, credit card transaction processing databases, HR databases, and our project management database). A particular website might need to access more than one of these or transfer information from one to the other. Believe me, vendors won't let you copy their database structure into one database to rule them all.

We have several hundred databases here on many different servers.

HLGEM