views:

1748

answers:

10

For a project we having a bunch of data that always have the same structure and is not linked together. There are two approaches to save the data:

  • Creating a new database for every pool (about 15-25 tables)
  • Creating all the tables in one database and differ the pools by table names.

Which one is easier and faster to handle for MySQL?

EDIT: I am not interessed in issues of database design, I am just interessed in which of the two possibilities is faster.

EDIT 2: I will try to make it more clear. As said we will have data, where some of the date rarely belongs together in different pools. Putting all the data of one type in one table and linking it with a pool id is not a good idea:

  • It is hard to backup/delete a specific pool (and we expect that we are running out primary keys after a while (even when use big int))

So the idea is to make a database for every pool or create a lot of tables in one database. 50% of the queries against the database will be simple inserts. 49% will be some simple selects on a primary key.

The question is, what is faster to handle for MySQL? Many tables or many databases?

+15  A: 

Why not create a single table to keep track of your pools (with a PoolID and PoolName as you columns, and whatever else you want to track) and then on your 15-25 tables you would add a column on all of them which would be a foreign key back to you pool table so you know which pool that particular record belongs to.

If you don't want to mix the data like that, I would suggest making multiple databases. Creating multiple tables all for the same functionality makes my spider sense tingle.

TheTXI
Seconded. It may be the data design is improper.
Blank Xavier
+1 multiple tables that do the same thing are usually a sign of a design that hasn't been thought through.
MatthieuF
You are right, but this is not the answer for my question. I asked for performance and not for database design.
Hippo
@MatthieuF - I think he's talking about "sharding" his db. Flickr, YouTube et al all do this. There comes a time when a single, fully normalized db simply can't handle the load placed upon it.
Paul Suart
A: 

Can't you partition your 15-25 tables?

tuinstoel
No. There are reason why we want to seperate the data.
Hippo
What reason(s) do you have?
tuinstoel
+9  A: 

If you don't want one set of tables with poolID poolname as TheTXI suggested, use separate databases rather than multiple tables that all do the same thing.

That way, you restrict the variation between the accessing of different pools to the initial "use database" statement, you won't have to recode your SELECTs each time, or have dynamic sql.

The other advantages of this approach are:

  • Easy backup/restore
  • Easy start/stop of a database instance.

Disadvantages are:

  • a little bit more admin work, but not much.

I don't know what your application is, but really really think carefully before creating all of the tables in one database. That way madness lies.

Edit: If performance is the only thing that concerns you, you need to measure it. Take a representative set of queries and measure their performance.

Edit 2: The difference in performance for a single query between the many tables/many databases model will be neglible. If you have one database, you can tune the hell out of it. If you have many databases, you can tune the hell out of all of them.

My (our? - can't speak for anyone else) point is that, for well tuned database(s), there will be practically no difference in performance between the three options (poolid in table, multiple tables, multiple databases), so you can pick the option which is easiest for you, in the short AND long term.

For me, the best option is still one database with poolId, as TheTXI suggested, then multiple databases, depending upon your (mostly administration) needs. If you need to know exactly what the difference in performance is between two options, we can't give you that answer. You need to set it up and test it.

With multiple databases, it becomes easy to throw hardware at it to improve performance.

MatthieuF
+4  A: 

I'm not too sure I completely understand your scenario. Do you want to have all the pools using the same tables, but just differing by a distinguishing key? Or do you want separate pools of tables within the one database, with a suffix on each table to distinguish the pools?

Either way though, you should have multiple databases for two major reasons. The first being if you have to change the schema on one pool, it won't affect the others.

The second, if your load goes up (or for any other reason), you may want to move the pools onto separate physical machines with new database servers.

Also, security access to a database server can be more tightly locked down.

All of these things can still be accomplished without requiring separate databases - but the separation will make all of this easier and reduce the complexity of having to mentally track which tables you want to operate on.

Josh Smeaton
+2  A: 

I don't know mysql very well, but I think I'll have to give the standard performance answer -- "It depends".

Some thoughts (dealing only with performance/maintenance, not database design):

  • Creating a new database means a separate file (or files) in the file system. These files could then be put on different filesystems if performance of one needs to be separate from the others, etc.
  • A new database will probably handle caching differently; eg. All tables in one DB is going to mean a shared cache for the DB, whereas splitting the tables into separate databases means each database can have a separate cache [obviously all databases will share the same physical memory for cache, but there may be a limit per database, etc].
  • Related to the separate files, this means that if one of your datasets becomes more important than the others, it can easily be pulled off to a new server.
  • Separating the databases has an added benefit of allowing you to deploy updates one-at-a-time more easily than with the single database.

However, to contrast, having multiple databases means the server will probably be using more memory (since it has multiple caches). I'm sure there are more "cons" for the multi-database approach, but I am drawing a blank now.

So I suppose I would recommend the multi-database approach. Obviously this is only with the understanding that there may very well be a better "database-designy" way of handling whatever you are actually doing.

Chris Shaffer
+2  A: 

Given the restrictions you've placed on it, I'd rather spin up more tables in the existing database, rather than having to connect to multiple databases. Managing connection strings TEND to be harder, in addition to managing the different database optimizations you may have.

aronchick
+2  A: 

FTR, in normal circumstances I'd take the approach described by TheTXI.

In answer to your specific question though, I have found it to be dependant on usage. (Cop out I know, but hear me out.)

A single database is probably easier. You'll have to worry about just one connection and would still have to specify tables. Multiple databases could, under certain conditions, be faster though.

If I were you I'd try both. There's no way we'll be able to give you a useful answer.

Tom Wright
+3  A: 

Differing the pools by table name or putting them in separate databases is about the same thing. However, if you have lots of tables in one database, MySQL has to load the table information and do a security check on all those tables when logging in/connecting.

As others mentioned, separate databases will allow you to shift things around and create optimizations specific to a certain pool (i.e. compressed tables). It is extra admin overhead, but there is considerably more flexibility.

Additionally, you can always "pool" the tables that are in separate databases by using federated or merge tables to simplify querying if needed.

As for running out of primary keys, you could always use a compound primary key if you are using MyISAM tables. For example, if you have a field called groupCode (any type) and another called sequenceId (auto increment) and create your primary key as groupCode+sequenceId. The sequenceId will increment based on the next unique ID within the group code set. For example: AAA 1 AAA 2 BBB 1 AAA 3 CCC 1 AAA 4 BBB 2 ...

Although with large tables you have to be careful about caching and make sure the file system you are using handles large files.

Brent Baisley
+5  A: 

In the situation you describe, experience has led me to believe that you'll find the separate databases to be faster when you have a large number of pools.

There's a really important general principle to observe here, though: Don't think about how fast it'll be, profile it.

chaos
Agreed. Make it work, then improve the performance.
Terry Lorber
+2  A: 

There should be no significant performance difference between multiple tables in a single database versus multiple tables in separate databases.

In MySQL, databases (standard SQL uses the term "schema" for this) serve chiefly as a namespace for tables. A database has only a few attributes, e.g. the default character set and collation. And that usage of GRANT makes it convenient to control access privileges per database, but that has nothing to do with performance.

You can access tables in any database from a single connection (provided they are managed by the same instance of MySQL Server). You just have to qualify the table name:

SELECT * FROM database17.accounts_table;

This is purely a syntactical difference. It should have no effect on performance.

Regarding storage, you can't organize tables into a file-per-database as @Chris speculates. With the MyISAM storage engine, you always have a file per table. With the InnoDB storage engine, you either have a single set of storage files that amalgamate all tables, or else you have a file per table (this is configured for the whole MySQL server, not per database). In either case, there's no performance advantage or disadvantage to creating the tables in a single database versus many databases.

There aren't many MySQL configuration parameters that work per database. Most parameters that affect server performance are server-wide in scope.

Regarding backups, you can specify a subset of tables as arguments to the mysqldump command. It may be more convenient to back up logical sets of tables per database, without having to name all the tables on the command-line. But it should make no difference to performance, only convenience for you as you enter the backup command.

Bill Karwin