views: 180
answers: 5

I have 5 databases which represent different regions of the country. In each database, there are a few hundred tables, each with 10,000-2,000,000 transaction records. Each table is a representation of a customer in the respective region. Each of these tables has the same schema.

I want to query all tables as if they were one table. The only way I can think of doing it is creating a view that unions all tables, and then just running my queries against that. However, the customer tables will change all the time (as we gain and lose customers), so I'd have to change the query for my view to include new tables (or remove ones that are no longer used).
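For concreteness, the kind of view I mean would look something like this (the customer table names here are made up; there would be one SELECT per customer table, edited every time a customer is added):

```sql
CREATE VIEW AllTransactions AS
    SELECT * FROM CustomerA_Transactions
    UNION ALL
    SELECT * FROM CustomerB_Transactions
    UNION ALL
    SELECT * FROM CustomerC_Transactions;
    -- ...and so on, for a few hundred tables
```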

Is there a better way?

EDIT

In response to the comments (I also posted this as a response to an answer):

In most cases, I won't be removing any tables; they will remain for historic purposes. As I posted in a comment on one response, the idea was to reduce the time it takes smaller customers (those with only 10,000 records) to query their own history. There are about 1,000 customers with an average of 1,000,000 rows (and growing) apiece. If I were to add all records to one table, I'd have nearly a billion records in that table. I also thought I was planning for the future, in that when we get, say, 5,000 customers, we don't have one giant table holding all transaction records (this may be an error in my thinking). So then, is it better not to divide the records as I have done? Should I mash it all into one table? Will indexing on customer IDs prevent delays in querying data for smaller customers?

+7  A: 

I think your design may be broken. Why not use one single table with a region column and a customer column?

If I were you, I would consider refactoring to one single table, and if necessary (for backward compatibility, for example), I would use views to provide the same info as the previous tables.
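As a sketch of what that could look like (table and column names here are illustrative, not taken from the question):

```sql
-- One normalised table instead of hundreds of per-customer tables
CREATE TABLE Transactions (
    transaction_id   BIGINT        NOT NULL PRIMARY KEY,
    region_id        INT           NOT NULL,
    customer_id      INT           NOT NULL,
    transaction_date DATETIME      NOT NULL,
    amount           DECIMAL(18,2) NOT NULL
    -- ...plus the rest of the shared schema
);

-- Backward-compatible view that mimics one of the old per-customer tables
CREATE VIEW CustomerA_Transactions AS
    SELECT transaction_id, transaction_date, amount
    FROM Transactions
    WHERE customer_id = 42;  -- CustomerA's id (hypothetical)
```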


Edit to answer OP comments to this post :

One table with 10,000,000,000 rows in it will do just fine, provided you use proper indexing. Database servers are built to cope with this kind of volume.

Performance is definitely not a valid reason to split one such table into thousands of smaller ones!
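"Proper indexing" here means, at minimum, an index on the column every query filters by. Assuming a combined Transactions table with a customer_id column (names are illustrative):

```sql
-- With this index, a small customer's query touches only their own rows,
-- no matter how many rows the other customers have.
CREATE INDEX IX_Transactions_Customer
    ON Transactions (customer_id, transaction_date);
```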

Brann
The idea was that when customers query their own history (say, one with only 10,000 records), they don't take the pain of querying 1,000,000,000 rows. If I were to add all records to one table, it would approach 1,000,000,000 records; that just seems unmanageable.
scottm
No, it's not. With proper indexing, this will work just fine. And if the clients need direct access to the database (but not to the other clients' data, of course), you can handle that with views.
Brann
+1 This system is broken. Don't write another line of code until you scrap this heap and have a "Come to Jesus" with your manager on why this sucks. Me? I'd pack up my desk and leave immediately.
Chris Ballance
Easy test to see if the design is good: do you change data or the schema when a customer is added/deleted? If you change the schema, it is bad; if you change data, it is good. +1
KM
@mike, the schema is always the same. Currently, a new table (with the same schema as the others) is added when a new customer is added. The new customer's transactions go into the new table. The other tables stay the same.
scottm
@scotty: adding a new table = modifying the *database* schema
Brann
@Brann, understood. Thanks
scottm
Is there going to be any effect when 1000s of customers are querying the same giant table, or is SQL just that good? I understand now that the indexes will prevent single queries (for smaller customers) from running slower.
scottm
Indeed, SQL is really that good. In one of my jobs, we had 1000s of customers, all using the same set of giant tables (using some variation on "WHERE client_id=?"). With proper indexing, this is much faster (not to mention ease-of-use and ease-of-maintenance) than UNIONing per-client tables.
Piskvor
+1. Using views to simulate the existing tables while you migrate the rest of your code across is a good idea.
j_random_hacker
@scotty: As a rough guide, "SELECT * FROM t WHERE some_indexed_field = 'blah'" takes logarithmic time -- if it takes 1ms to find the row(s) in a table with 1000 rows, it will take around 2ms for 2000 rows, 3ms for 4000, ..., ~20ms to find them in a table with 1 billion rows.
j_random_hacker
+2  A: 

Agree with Brann,

That's an insane DB schema design. Why didn't you go with (or is it an option to change to) a single normalised structure, with columns to filter by region and by whatever condition separates each table within a region's database?

With your current structure, you're stuck with a horribly large (~500-table) unioned view that you'd have to dynamically regenerate as regularly as new tables appear in the system.

Eoin Campbell
There definitely is that option. This is still all in the planning phases; customers can't actually query the data yet.
scottm
+2  A: 

The architecture of this system smells like it needs a vastly different approach if there are a few hundred tables, each with the same schema.

Why are you adding or removing tables at all? This should not be happening under any normal circumstances.

Chris Ballance
A: 

2 solutions:

1. Write a stored procedure that builds the view for you, by parsing all the table names in the 5 databases and building the view with unions as you would by hand.

2. Create a new database with one table, and import all the records from all the tables into it each night, for example.
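A rough SQL Server sketch of option 1, using INFORMATION_SCHEMA to enumerate the tables. The 'Customer%' naming pattern and the view name are assumptions; CREATE OR ALTER needs a recent SQL Server, and older versions would DROP and re-CREATE the view instead:

```sql
-- Rebuild the union view from whatever customer tables currently exist.
CREATE PROCEDURE RebuildAllTransactionsView AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'';

    SELECT @sql = @sql +
        CASE WHEN @sql = N'' THEN N'' ELSE N' UNION ALL ' END +
        N'SELECT * FROM ' + QUOTENAME(TABLE_NAME)
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE'
      AND TABLE_NAME LIKE 'Customer%';  -- assumed naming convention

    -- Tables in the other regional databases would need three-part names.
    SET @sql = N'CREATE OR ALTER VIEW AllTransactions AS ' + @sql;
    EXEC sp_executesql @sql;
END;
```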
A: 

Sounds like you're stuck somewhere between a multi-tenant and a single-tenant database schema. Specifically, you're storing it as "light" multi-tenant (separate tables vs. separate databases) but querying it as single-tenant: one query to rule them all.

In the short term, have your data access layer dynamically pick the table to query, rather than unioning everything together into one uber-query.

In the long term, pick one approach and stick to it: one database with one table, or many databases.

Here are some posts on the subject.

http://stackoverflow.com/questions/13348/what-are-the-advantages-of-using-a-single-database-for-each-client

http://msdn.microsoft.com/en-us/library/aa479086.aspx

jms
Thanks for the comment, I'll check it out. At this point, I think I'm going to switch to a single table. I just couldn't wrap my head around the idea of querying billions of records.
scottm
Billions of records can be a problem as well; you just have to pick your poison. Good luck :)
jms