views:

257

answers:

9

I have a bit of an architecture problem here. Say I have two tables, Teacher and Student, both of them on separate servers. Since this tables share a lot of data and functionality, I would like to use this inheritance scheme and create a People table; however, I would need tho keep the Teacher table and the People records relating Teacher in one server, and the Student table and the People records relating Student in another server. This was a requirement made by the lead developer, since we have too many (and I mean too many) records for Teacher and Student, and a single database containing all of the People would collapse. Moreover, the clients NEED to have them on separate servers (sigh*).

I would really like to implement the inheritance scheme, since a lot of the funcionality could be shared among the databases. Is there any possible way to do this? any other architecture that may suit this type of problem? I'm I just crazy?

--- EDIT ---

Ok, I don't really have Teachers and Students per se, I just used those names to simplify my explanation. Truth is, there are about 9 sub-tables that would inherit the super table, all of them in separate servers for separate applications, and no, I don't have this type of database, but we have pretty low end servers for the amount of transactions we have ;). You're right, my statements are a bit exagerated and I apologize for that, it was just to make you guys answer faster (sorry :P). The different servers are more of a business restriction than anything else (although the lead developer DID say that a common database to store the SuperTable would collapse under it's own weight -his words, not mine :S). Our clients don't like their information mixed with other clients information, so we must have their information on different servers -pretty stupid, but the decision-makers have spoken :(.

+1  A: 

I don't know for sure if this is possible with SQL Server specifically, but it smells like something that could be solved with clustering and tablespace partitioning.

What I wonder about is whether this is really a good requirement; it introduces a lot of technical complexity based on a pretty simple assertion that there's just too much data. Have you attempted to verify this? A simple test would be to create a simple schema and populate it with dummy data for the number of rows you expect in production. It would probably be in your best interest to perform this test before you go too far down the road to implement this 'requirement'.

By the way, the type of schema you linked to is an example of the class table inheritance pattern.

It would be possible for you to implement a domain model for this project where the common attributes of Teacher and Student are described by a Person interface or base class which the common operations are written against. If you plan to use stored procedures extensively, this might not be a useful option, but it's something to consider.

Paul Morie
A: 

I think Paul is correct - perhaps look at your hardware infrastructure rather than your DB schema.

Using clustering, proper indexing and possibly a data archive scheme should solve any performance problems. The inheritance scheme seems to be the best data model.

It is possible to split the data over multiple servers and keep the scheme, but I think you'd definitely have more performance problems than if you looked at clustering/proper indexing. By setting up linked servers, you can do cross-server queries.

e.g. Students query

SELECT * FROM SERVER_A.People.dbo.Persons P INNER JOIN SERVER_B.People.dbo.Students S ON P.PersonID = S.PersonID

--EDIT-- As Paul said, you could perform your database separation in your abstraction layer.

E.g. have your Student class extend your Person class. In your Person class constructor, have it connect to Server A to populate whichever fields are available. In your student class constructor, have it connect to Server B (the Person attributes will already be populated by the Person constructor).

Dane
+3  A: 

Under what assumption did you determine that you have too much data? I'm pretty sure you could list every teacher and student in the world, and not cause SQL Server any grief.

This seems like an arbitrary decision that is going to have significant impact on the complexity of any solution you design.

Take a look here - I'm sure you don't measure your database in anything close to the scale represented on this page, and many of these db's are running on SQL Server.

Aaron Alton
What, you're saying he doesn't have a 20-terrabyte database with 67 billion records and 11,000 transactions per minute? ;)
Rex M
if the hardware is low end, it doesn't matter how great the software is
KM
A: 

I'm with Aaron here (sup Aaron). Move the tables into a single database. SQL Server can easily handle billions of rows per table (I've done it on SQL 2000 6-7 years ago, so modern versions and modern hardware are no problem). As long as your tables are indexed correctly There probably haven't been enough students in all of time at every school in the world to overload SQL Server much less at a single school.

In this case your best practice would be to put the tables in the same database, on the same server and index them for better performance.

mrdenny
A: 

Too many records cause 'database collapse'? What kind of pot is that lead developer smoking? Potent stuff!

I would recommend you guys study partitioned tables first. Making an application distributed (which really the two server approach implies) is much much harder than you think and it does not provide scalability.

Remus Rusanu
Do you have more 'students' than stars in the sky? Cause the Sloan Digital Sky Survey data is stored on SQL Server: http://cas.sdss.org/dr7/en/
Remus Rusanu
A: 

Yep, I'd have to agree with the others here, and single database, single server is just fine. It is far easier and cheaper to scale up your hardware currently to support the workload than it will be to scale out to federated servers. I only know of one place that does federated servers and their workload is phenomenal.

Jonathan Kehayias
A: 

link the servers and create a view

SELECT
  FirstName
    ,LastName
    ....
  FROM server.database.owner.Teachers
UNION
  FirstName
    ,LastName
    ....
  FROM server.database.owner.Students
racer x
A: 

What kind of client are you using? If you're using a Java client, and are using ORM, you may want to look into Hibernate Shards.

Jack Leow
A: 

Besides all the good answers here that the assumptions behind the question are highly questionable, if I needed to do this seriously (and if I take the assumptions as true) I would compare what Oracle had to offer, because it is in this type of scenario that it shows a benefit (I say this from experience).

But on the core question, assuming that the assumptions you outline are true, I would not try to have a combined table. If teachers and students can't be in the same database, it is unlikely that their identifying information can, and if the amount of data is overwhelming, then putting it all in one table is worse.

What I suspect is that if the underlying assumptions are true it is because there is an anticipation of a lot of contention on the tables and a lot of connections and activity on the tables, causing a lot of locks. In that case, adding a Person table will make things worse.

All that being said, if you still really wanted to do it, then you can reference one database from another in queries, via linked databases.

But if the real issues is number of connections and contention and deadlocks around the tables, such a solution would make things worse.

EDIT: In response to those who question what advantage Oracle would bring to such a situation, one would be in the federated database area, where it is much more mature. Another would be in tables where you have a high amount of contention, it makes copies of the data in certain situations, and in general its model is more sophisticated when it comes to handling contention. For example scenarios where tables are read in longer running queries, causing a lot of potential read locks. Oracle helps you keep transactional integrity without having to lock on read. In MS-SQL, you have to resort to dirty reads.

MS-SQL is a fine database, but it has its limits (raw amounts of data without any particular parameters about volume of reads and writes is not really one of them, though, which makes the question strange). And given the stiff competition, the non-Enterprise version of Oracle is really close enough in price to be worth a look. It could end up costing you a lot later.

Of course, if you already purchased an MS-SQL license, the cost factor is larger for Oracle, so the benifits have to be more obvious.

Yishai
Sound comment, but can you be a little more specific than "I would compare what Oracle had to offer"? That kind of statement is far too vague to be useful, and I would question the cost/benefit ratio of an Oracle solution over a SQL Server solution in almost any respect.
Aaron Alton
Especially when the SQL Server License has already been purchased.
mrdenny