views:

39

answers:

1

I'm trying to optimise a query for a SQLServerCE database running on a Windows Mobile device. The query is used to identify rows to delete in one table based on no longer being referenced from a different table. The original query used a DISTINCT to identify matching rows:

SELECT TestGroupId, TestNameId, ServiceTypeId
FROM ServiceTypeInspection
WHERE ServiceTypeId NOT IN
    (SELECT DISTINCT ServiceTypeId
     FROM PurchaseOrder)

A quick google suggested using EXISTS instead via:

SELECT TestGroupId, TestNameId, ServiceTypeId
FROM ServiceTypeInspection AS STI
WHERE NOT EXISTS
    (SELECT PurchaseOrderId
     FROM PurchaseOrder AS PO
     WHERE (ServiceTypeId = STI.ServiceTypeId))

But then I also found suggestions on SO and MSDN to replace both options with a LEFT JOIN that sounded promising:

SELECT TestGroupId, TestNameId, STI.ServiceTypeId ServiceTypeId
FROM ServiceTypeInspection STI
LEFT JOIN PurchaseOrder PO ON STI.ServiceTypeId = PO.ServiceTypeId
WHERE PO.ServiceTypeId IS NULL

When I run these queries as a batch showing the execution plan, the cost relative to batch of the original query is 21%, the NOT EXISTS version is 11%, with the LEFT JOIN responsible for the balance of 68%. SSMS doesn't indicate there are any missing indexes but the fastest, NOT EXISTS query has 25% cost in a table scan. I have non-unique indexes on ServiceTypeId in both tables. Am I reading the output of the execution plan correctly that in this case the NOT EXISTS version is the best approach? Is the table scan a potential bottleneck or is this normal for a non-unique index?

Table definition is:

CREATE TABLE [PurchaseOrder](
[PurchaseOrderId] [uniqueidentifier] NOT NULL,
[ServiceTypeId] [nvarchar](8) NOT NULL,
CONSTRAINT [PK_PurchaseOrder] PRIMARY KEY
(
  [PurchaseOrderId]
)
);

CREATE TABLE [ServiceTypeInspection](
[TestGroupId] [int] NOT NULL,
[TestNameId] [nvarchar](10) NOT NULL,
[ServiceTypeId] [nvarchar](8) NOT NULL,
CONSTRAINT [PK_ServiceTypeInspection] PRIMARY KEY
(
  [TestGroupId],[TestNameId]
)
);

CREATE INDEX IX_PurchaseOrder_ServiceTypeId ON [PurchaseOrder] (ServiceTypeId);
CREATE INDEX IX_ServiceTypeInspection_ServiceTypeId ON [ServiceTypeInspection] (ServiceTypeId);
+1  A: 

My guess is that the size of your NC indexes are relatively large compared to table size (large Primary key, which I'm assuming is the also default clustered index, meaning that you have large NC indexes), so the optimiser is deciding to table scan.

Also, if the indexes IX_PurchaseOrder_ServiceTypeId and IX_ServiceTypeInspection_ServiceTypeId have less than approximately 10% selectivity, the optimiser will probably not use them.

Mitch Wheat
Thanks Mitch, as usual it's about the context right? Would that explain why the MS recommended optimisation of using a JOIN actually has a greater cost than the other options?
David Clarke