ansaurus

Question

Trying to fix SQL query with two tables.

Answer 1

+1 A:

What's going wrong is that your second query matches only on InHour, without referring to EntryID. Also, the conditions in your first query are completely independent of each other. That may not be a problem if the constraints on your Hour table are correct (the first column can never be null when the second is not null), but it's worth a look.

In relational databases, it's best to get in the habit of thinking in terms of JOINs rather than IN(). IN() often returns the same results as a JOIN (with some differences in NULL handling) and often even gets the same execution plan. But (1) it is a "relaxed" way of thinking about the problem that doesn't lend itself to the mental space needed for writing complex queries, and (2) it can only compare a single value, not several values at once (at least in SQL Server; some other DBMSes can do this).
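To make the multi-value point concrete, here is a minimal sketch. It assumes the Hour table has both an EntryID and an InHour column, as the queries in this thread suggest, and it uses the same placeholder linked server name.

```sql
-- Not valid in SQL Server: IN() cannot compare a pair of columns.
-- WHERE (H.EntryID, H.InHour) IN (SELECT L.EntryID, L.InHour FROM ...)

-- A JOIN can match on both columns at once.
SELECT H.EntryID, H.InHour
FROM
   dbo.Hour H
   INNER JOIN LINKEDSERVER.MYDATABASE.dbo.Hour L
      ON H.EntryID = L.EntryID
      AND H.InHour = L.InHour

-- EXISTS does the same without multiplying rows
-- when several remote rows match one local row.
SELECT H.EntryID, H.InHour
FROM dbo.Hour H
WHERE EXISTS (
   SELECT 1
   FROM LINKEDSERVER.MYDATABASE.dbo.Hour L
   WHERE L.EntryID = H.EntryID
      AND L.InHour = H.InHour
)
```

Running these as SELECTs first is a cheap way to see exactly which rows a DELETE with the same join would hit.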

Let me rewrite your queries as JOINs and maybe it will help you see what's wrong.

DELETE E
FROM
   dbo.Entry E
   INNER JOIN LINKEDSERVER.MYDATABASE.dbo.Entry L ON E.EntryID = L.EntryID
   INNER JOIN Hour H ON E.EntryID = H.EntryID
WHERE
   H.OutHour IS NOT NULL

DELETE H
FROM
   dbo.Hour H
   INNER JOIN LINKEDSERVER.MYDATABASE.dbo.Hour L ON H.InHour = L.InHour
WHERE
   H.OutHour IS NOT NULL

I recommend you put a cascade-delete foreign key constraint on the Hour table so that when you delete from the Entry table, the child Hour rows disappear with it. There are still problems here: you could have many Hour rows per EntryID, so semantically you can end up trying to delete the same row over the linked server multiple times.
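A cascading constraint of that kind might look like the following sketch. The constraint name FK_Hour_Entry is made up for illustration, and the statement assumes EntryID is the primary key (or at least unique) in dbo.Entry and that no orphaned Hour rows exist yet.

```sql
-- Hypothetical constraint name; assumes Entry.EntryID is a primary or unique key.
ALTER TABLE dbo.Hour
ADD CONSTRAINT FK_Hour_Entry
   FOREIGN KEY (EntryID)
   REFERENCES dbo.Entry (EntryID)
   ON DELETE CASCADE
```

With this in place, a DELETE against dbo.Entry removes the matching dbo.Hour rows in the same statement, so the second DELETE is no longer needed for those entries.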

Also, be aware that huge joins over linked servers can experience very poor performance because sometimes the query engine decides to pull huge rowsets over the link, even entire tables. You can mitigate this by doing things in batches, perhaps by first doing a select into a temp table based on a JOIN across the link, then deleting corresponding rows in small batches of 100 or 1000 or 5000 (testing is in order to find the right size).
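The batching idea could be sketched roughly as below. The temp table name #ToDelete and the batch size of 1000 are arbitrary choices for illustration, and the sketch assumes the child Hour rows are either removed by a cascading foreign key or deleted beforehand.

```sql
-- Cross the link once and keep only the keys that qualify.
SELECT DISTINCT E.EntryID
INTO #ToDelete
FROM
   dbo.Entry E
   INNER JOIN LINKEDSERVER.MYDATABASE.dbo.Entry L ON E.EntryID = L.EntryID
   INNER JOIN dbo.Hour H ON E.EntryID = H.EntryID
WHERE
   H.OutHour IS NOT NULL

DECLARE @BatchSize int = 1000  -- tune by testing
DECLARE @Rows int = 1

-- Delete locally in small batches until nothing is left to delete.
WHILE @Rows > 0
BEGIN
   DELETE TOP (@BatchSize) E
   FROM
      dbo.Entry E
      INNER JOIN #ToDelete D ON D.EntryID = E.EntryID

   SET @Rows = @@ROWCOUNT
END

DROP TABLE #ToDelete
```

The loop itself never touches the linked server, so each batch is a purely local operation.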

Last, if you do find that your queries are unnecessarily pulling huge sets of data over the link (determine this by running SQL Server Profiler on the remote machine to see what queries are actually being submitted), then strategic use of CROSS APPLY can help by forcing row-by-row processing. In the case of linked servers this can be an enormous performance improvement, however counter-intuitive that is given the standard and strong recommendation never to do row-by-row processing in relational databases. Think of it as forcing a "stretch bookmark lookup" rather than a "stretch table scan" and you'll get an inkling of why this can be such a big help.
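As a rough sketch of the shape such a query can take (whether the optimizer really issues one small remote lookup per local row depends on the plan, so check it with Profiler):

```sql
-- For each local row, probe the remote table for a single matching key.
SELECT E.EntryID
FROM
   dbo.Entry E
   CROSS APPLY (
      SELECT TOP (1) L.EntryID
      FROM LINKEDSERVER.MYDATABASE.dbo.Entry L
      WHERE L.EntryID = E.EntryID
   ) X
```

Once the SELECT returns the expected rows and the remote traffic looks reasonable, the same FROM clause can be used under DELETE E.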

Emtucifor 2010-09-22 16:27:39

Answer 2

A:

My very first suggestion is to put a foreign key relationship between the two tables on EntryID. This will prevent any deletion from the Entry table until all matching rows have been removed from the Hour table.

Secondly, with a foreign key in place you have to delete from the child to the parent (that is, start at the bottom of the hierarchy). This means I would run these, in this order:

delete from dbo.Hour where OutHour is not null
delete e
from dbo.Entry e
left outer join dbo.Hour h
on e.entryid=h.entryid
where h.entryid is null

DForck42 2010-09-22 17:55:27
