views:

47

answers:

2

I have a question on the best method to get back to a piece of data that is in a related table on the other side of a many-to-many relationship table.

My first method uses joins to get back to the data, but because there are multiple matching rows in the relationship table, I had to use a TOP 1 to get a single row result.

My second method uses a subquery to get the data but this just doesn't feel right.

So, my question is, which is the preferred method, or is there a better method?

The script needed to create the test tables, insert data, and run the two queries is below.

Thanks for your advice!

Darvis

Create Tables

DECLARE @TableA TABLE (
[A_ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [varchar](50) NULL)

DECLARE @TableB TABLE (
[B_ID] [int] IDENTITY(1,1) NOT NULL,
[A_ID] [int] NOT NULL,
[Description] [varchar](50) NOT NULL)

DECLARE @TableC TABLE (
[C_ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [varchar](50) NOT NULL)

DECLARE @TableB_C TABLE (
[B_ID] [int] NOT NULL,
[C_ID] [int] NOT NULL)

Insert Test Data

INSERT INTO @TableA VALUES('A-One')
INSERT INTO @TableA VALUES('A-Two')
INSERT INTO @TableA VALUES('A-Three')

INSERT INTO @TableB (A_ID, Description) VALUES(1,'B-One')
INSERT INTO @TableB (A_ID, Description) VALUES(1,'B-Two')
INSERT INTO @TableB (A_ID, Description) VALUES(1,'B-Three')
INSERT INTO @TableB (A_ID, Description) VALUES(2,'B-Four')
INSERT INTO @TableB (A_ID, Description) VALUES(2,'B-Five')
INSERT INTO @TableB (A_ID, Description) VALUES(3,'B-Six')

INSERT INTO @TableC VALUES('C-One')
INSERT INTO @TableC VALUES('C-Two')
INSERT INTO @TableC VALUES('C-Three')

INSERT INTO @TableB_C (B_ID, C_ID) VALUES(1, 1)
INSERT INTO @TableB_C (B_ID, C_ID) VALUES(2, 1)
INSERT INTO @TableB_C (B_ID, C_ID) VALUES(3, 1)

Get result - method 1

SELECT TOP 1 C.*, A.Description
FROM @TableC C
JOIN @TableB_C BC ON BC.C_ID = C.C_ID
JOIN @TableB B ON B.B_ID = BC.B_ID
JOIN @TableA A ON B.A_ID = A.A_ID
WHERE C.C_ID = 1

Get result - method 2

SELECT C.*,               
(SELECT A.Description
FROM @TableA A
WHERE EXISTS (SELECT * 
      FROM @TableB_C BC
      JOIN @TableB B ON B.B_ID = BC.B_ID
      WHERE BC.C_ID = C.C_ID AND B.A_ID = A.A_ID))
FROM @TableC C
WHERE C.C_ID = 1
A: 

You don't list any index or PK info on your tables, so that is the first optimization to do, if they already don't have those.

It is impossible to tell you based on the limited test data, the optimizer will use statistics based on your actual data to determine index usage and the best query execution plan.

You can use SET SHOWPLAN_ALL ON within a SQL Server Management Studio query window to see the execution plan for each query and then compare them to see which is better.

For a quick check try this: run SET SHOWPLAN_ALL ON then the method 1 query, look at the TotalSubtreeCost for the first row. Then, run the method 2 query and again look at the TotalSubtreeCost for the first row. Which query had a lower value? If you want to make a more detailed assesment, you can look at the StmtText, EstimateIO, and EstimateCPU, as well as any of the other columns.

KM
Thanks for the response. I understand that it's difficult to say which will perform faster without looking at the real tables and data which these test tables were modeled after. I suppose I was more curious if either of the query methods raised a red flag or if someone had a better way to do the query. From your response it doesn't sound like either jumped out at you as ridiculous. Thanks again.
Darvis Lombardo
A: 

If you want to get a single row in the case where a row in one table may relate to multiple rows in another table then you need to define which of those rows to use. Once you have that defined then you can write a query to get that row.

In your case all the C row happens to point to all the same A row no matter which way you go through @TableB and @TableB_C, but that is certainly not enforced through the schema that you have.

Tom H.