Parallel query execution on multiple database servers (running Microsoft SQL Server)

views:

1396

answers:

+1 Q:

Parallel query execution on multiple database servers (running Microsoft SQL Server)

Is it possible to configure multiple database servers (all hosting the same database) to execute a single query simultaneously?

I'm not asking about executing queries using multiple CPUs simultaneously - I know this it possible.

UPDATE

What I mean is something like this:

There are two 2 servers: Server1 and Server2
Both server host database Foo and both instances of Foo are identical
I connect to Server1 and submit a complicated (lots of joins, many calculations) query
Server1 decides that some calculations should be made on Server2 and some data should be read from that server, too - appropriate parts of the query are sent to Server2
Both servers read data and perform necessary calculations
Finally, results from Server1 and Server2 are merged and returned to the client

All this should happen automatically, without need to explicitly reference Server1 or Server2. I mean such parallel query execution - is it possible?

UPDATE 2

Thanks for the tips, John and wuputah.

I am researching alternatives of increasing both availability and capacity of MOSS database backend. So what I'm looking for is some kind out-of-the-box SQL Server load balancing solution that would be transparent to the application, because I cannot modify the application in any way. I guess SQL Server has no such feature (and Oracle, as far as I understand it, does - it is RAC mentioned by wuputah).

UPDATE 3

A quote from the Top Tips for SQL Server Clustering article:

Let's start by debunking a common misconception. You use MSCS clustering for high availability, not for load balancing. Also, SQL Server does not have any built-in, automatic load-balancing capability. You have to load balance through your application's physical design.

+1 A:

Hi,

Yes I believe it is possible, well sort of, let me explain.

You need to look into and research the use of Distributed Queries. A distributed query runs across multiple servers and is typically used to reference data that is not stored locally.

http://msdn.microsoft.com/en-us/library/ms191440.aspx

For example, Server A may hold my Customers table and Server B holds my Orders table. It is possible using distributed queries to run a query that references both Server A and Server B, with each server managing the processing of its local data (which could incorporate the use of parallelism).

Now in theory you could store the exact same data on each server and design your queries specifically so that only certain table were referenced on certain servers, thereby distributing the query load. This is not true parallel processing however, in terms of CPU.

If your intended goal is to distribute the processing load of your application then the typical approach with SQL Server is to use Replication to distribute data processing across multiple servers. This method is also not to be confused with parallel processing.

http://databases.about.com/cs/sqlserver/a/aa041303a.htm

I hope this helps but of course please feel free to pose any questions you may have.

John Sansom 2009-02-16 09:03:33

+1 A:

What you're really talking about is a clustering solution. It looks like SQL Server and Oracle have solutions to this, but I don't know anything about them. I can guess they would be very costly to buy and implement.

Possible alternate suggestions would be as follows:

Use master-slave replication, and do your complex read queries from the slave. All writes must go to the master, which are then sent to the slave, so things stay in sync. This helps things go faster because the slave only has to worry about the writes coming from the master, which are already predetermined on behalf of the slave (no deadlocks etc). If you're looking to utilize multiple servers, this is the first place I would start.
Use master-master replication. This means that all writes from both servers go to each other, so they stay in sync (at least theoretically). This has some of the benefits as master-slave but you don't have to worry about writes going to one server instead of the other. The more common use of master-master replication is for failover support; master-slave is really better suited to performance.
Use the feature John Sansom talked about. I don't know much about it, but it seems its basis is splitting your database into tables on different servers, which will have some benefits as well as drawbacks. The big issue is that since the two systems can't share memory, they will have to share a lot of data over the network to compute complex joins.

Hope this helps!

RE Update 1:

If you can't modify the application, there is hope, but it might be a bit complicated. If you were to set up master-slave replication, you can then set up a proxy to send read queries to the slave(s) and write queries to the master(s). I've seen this done with MySQL, but not SQLServer. That's a bit of a problem unless you want to write the proxy yourself.

This has been discussed on SO previously, so you can find more information there.

RE Update 2:

Microsoft's clustering might not be designed for performance, but that's Microsoft fault. That's still the level of complexity you're talking about here. If they say it won't help, then your options are limited to those above and by what you do with your application (like sharding, splitting into multiple databases, etc).

wuputah 2009-02-16 11:48:24

+1 A:

Interesting question, but I'm struggling to get my head around this being beneficial for a multi-user system.

If I'm the only user having half my query done on Server1 and the other half on Server2 sounds cool :)

If there are two concurrent users (lets say with queries of identical difficulty) then I'm struggling to see that this helps :(

I could have identical data on both servers and load balancing - so I get Server1, my mate gets Server2 - or I could have half the data on Server1 and the other half on Server2, and each will be optimised, and cache, just their own data - spreading the load. But whenever you have to do a merge to complete a query the limiting factor becomes the pipe-size between them.

Which is basically Federated Database Servers. Instead of having all my Customers on one server and all my Orders on the other I could, say, have my USA customers and their orders on one, and my European customers/orders on the other, and only if my query spans both is there any need for a merge step.

Kristen 2009-02-16 12:28:06

ansaurus

tags:

views:

answers:

Parallel query execution on multiple database servers (running Microsoft SQL Server)

related questions