views:

1542

answers:

5

Hello everyone,

I am using SQL Server 2008 Enterprise. And I am using Linked Server technologies to link another SQL Server 2008 Enterprise instance from another server. I write TSQL to manipulate objects (e.g. tables) from both server instances.

My question is, for linked server, is there a big performance issue? If yes, what is the key performance bottleneck and best practice we should follow?

thanks in advance, George

+1  A: 

You're going to take a bit of a hit to transfer the results across the wire twice (linked server to SQL Server to your machine). Secondly, it's got to resolve the name and log in, which isn't much of a hit, but it's a hit nonetheless.

Anyway, I've found the only major bottleneck is jumping servers, since it has to transmit the information twice.

Eric
Thanks Eric, 1. but confused about why " transfer the results across the wire twice" -- why transfer twice? For example, if I select from a table in a linked server, I think just need to transfer records from linked server to my server. So, only once, not twice. 2. "it's got to resolve the name and log in" -- only once or each time we communicate with linked server?
George2
@George: Assuming you're not doing queries on the server, you've got to transfer the results from the linked server to your server (one!), then from your server to the machine (two!). Any way you look at it, it's an extra hop. As for the connection, yes, it does stick, but sometimes folks have many people hitting a linked server, which would cause it to degrade performance. That's why I mentioned it, but only as a small hit.
Eric
Thanks Eric, 1. "to your server" -- server mean the database server which links another database server? "server to the machine" -- machine means the client who invokes services from database server? 2. "As for the connection, yes, it does stick" -- can you speak in some other words please? stick means? Sorry English is not my native language. :-)
George2
@George: 1: You're correct. 2: Persist, remain, stays constant. The session is the same for each user, so that they only have to connect once.
Eric
Thanks Eric, If I just do some queries/inserts from/to remote linked database server, could I understand in this simple way -- the only difference between local object and object on linked server is, we need network communication other than access local memory to access objects, for other things, almost the same?
George2
+5  A: 

It depends on what you are doing.

If you are running queries that join between tables in the two server instances, and transferring large amounts of data, then you have a bottleneck that you need to be aware of.

If the servers are on their own subnet with a 1GB link, then you should not have to worry a great deal. I would be concerned if the two servers are connected by a shared, slow link.

Mitch Wheat
Thanks Mitch, can I understand in this way, the only difference between local object and object on linked server is, we need network communication other than access local memory to access objects, for other things, almost the same?
George2
BTW: how about your answer to my question here, Mitch guru? :-)http://stackoverflow.com/questions/1248450/how-to-see-sql-server-cache-hit-rateI have posted my last question in the comments of your reply.
George2
+7  A: 

My question is, for linked server, is there a big performance issue? If yes, what is the key performance bottleneck and best practice we should follow?

Compared to what? For what queries? of course it all depends on what you are doing. For some queries the performance hit will be negligible for others massive.

There are a bunch of concerns you should keep in mind:

  • If you will be joining 2 tables from DB1 to 2 tables from DB2, and the tables are big, stuff can get ugly. At the end of the day, the queries will execute somewhere. The db will have to pull all the results in to the main DB and maintain transactional integrity on the main db. This can be really expensive.
  • If you start running distributed transactions stuff can get ugly, fast.
  • When joining stuff across servers your indexes on the remote server can be rendered useless. The data all has to move somewhere for the joins.
  • Linked server links can go down at unexpected times and lead to hard to diagnose bugs.

In the past I have found situations where it was a few orders of magnitude faster to move the remote data locally, and index it before joining into it.

Sam Saffron
1. "The db will have to pull all the results in to the main DB and maintain transactional integrity on the main db" -- I think SQL Server should maintain data integrity for both servers, why you just mention main db server here? 2. If I just do some queries/inserts from/to remote linked database server, could I understand in this simple way -- the only difference between local object and object on linked server is, we need network communication other than access local memory to access objects, for other things, almost the same?
George2
A: 

I use linked servers frequently to synchronise data between environments, mainly because I found it to be the easiest solution to code and manage.

One tip I found, but may not be an option for others, was to run any procedures on the server that has the most data or is doing the most updating/inserting. For example I have a procedure that compares two tables and inserts/updates from A to B. If I ran this on server A it would take many times longer than running the procedure on B. If you don't have a choice where to run our code, and you are stuck on, say, server A, then this advice may not help.

Another tip is to reduce the data returned to the minimum necessary. Whereas you might normally have data returned almost instantly on a local server, if a linked server is some distance away then the latency can be very painful. Be stricter than normal in accessing only those columns you need.

Raspin
A: 

I've found that if you're doing outer joins (left/right) the performance degrades fast. It's sometimes faster to select the data from the remote server into a temp table and index it rather than joining across the network. Mostly, the best strategy is to write the query the way it makes sense and then only tune it if performance is a real problem.

mattmc3