views:

60

answers:

4

Given the following (heavily simplified) tables:

create table Tags (
   TagId int Primary Key
)

create table OrderLines (
   Branch int,
   Station int,
   TransNo int,
   TagId int foreign key references Tags,
   primary key (Branch, Station, TransNo)
)

I need a list of Tags along with an OrderLine which references each Tag. I am expecting zero or one OrderLines to reference each Tag, but there is nothing in the database constraints to enforce this.

Given input like this:

 OrderLines                                     Tags
 Branch Station TransNo TagId                   TagId
 1      100     2345    1                       1
 1      100     2346    1                       2
 1      101     5223    2                       3
 3      100     6677    4                       4

I want to get output like this:

  TagId     Branch    Station    TransNo  
  1         1         100        2345     <-- it could list 2346 here, don't care
  2         1         101        5223     
  3         null      null       null
  4         3         100        6677

Note that although TagId 1 is referenced twice, my output only contains one of it. It doesn't matter which OrderLine is listed alongside it, but there must be only one instance of each tag in the output.

What is the most efficient way to do this?

I'm not able to alter the database schema.

A: 
SELECT Tags.TagID, Branch, Station, TransNo
    FROM Tags
    LEFT JOIN OrderLines ON Tags.TagID = OrderLines.TagID
    ORDER BY Tags.TagID

The left join will make sure that all tags get listed, even those that don't have any order line attached to them. The only flaw is that if more than one order line references a tag, the tag will be listed once for each order line. Some DBMS's (e.g. MySQL) allow you to solve this using a GROUP BY and still select columns you don't group by, but it's nonstandard and there is no guarantee which OrderLine you'll get. If you want this, you'll have to resort to subqueries, unions, temporary tables, or views (but since you can't change the schema, that last option is out).

tdammers
Yeah, that flaw is the killer. I'm also definitely stuck with MSSQL :P
Blorgbeard
A: 
   select
        T.tagid,
        O.branch,
        O.station,
        O.transno
    from orderlines O
    right join tags T on (t.tagid=O.tagid)
    group by t.tagid

Result:

  tagid branch  station transno
1   1   100 2345
2   1   101 5223
3   NULL    NULL    NULL
4   3   100 6677
vulkanino
Can't do that in sql server - branch, station, transno must be either in an aggregate or in the group by clause.
Blorgbeard
+2  A: 

You'd have to look at the execution plan to gauge efficiency

;WITH O AS
(
SELECT Branch, Station, TransNo, TagId,
ROW_NUMBER() OVER (PARTITION BY TagId ORDER BY TagId ) AS RN
FROM OrderLines
)
    SELECT T.TagID, O.Branch, O.Station, O.TransNo
        FROM Tags T
        LEFT JOIN O ON T.TagID = O.TagID and RN=1
Martin Smith
Thanks, this works well for me. Is there any reason you used a CTE over a subquery? Both appear to perform the same for me.
Blorgbeard
@Blorgbeard - They both have the same execution plan as well so just personal preference.
Martin Smith
+2  A: 
select t.TagId, t.Station, t.Branch, t.TransNo
from (
SELECT Station, Branch, TransNo, TagId, ROW_NUMBER() OVER (partition by TagId order by TagId) r
From OrderLines) t
WHERE r = 1
UNION ALL
SELECT TagId, NULL, NULL, NULL
from Tags
WHERE NOT EXISTS (Select 1 from OrderLines ol Where ol.TagId = Tags.Id)
Michael Pakhantsov
Looks interesting, but I think you have a couple of syntax errors.. I added an `over` after `rank()`, but still get "The ranking function "Rank" must have an ORDER BY clause".. I'm not very proficient with ranking functions, so not sure how to fix it.
Blorgbeard
I think you would need to change this to `Rank() OVER (partition by TagId ORDER BY TagId )` I did this and got the resulting execution plans when comparing with my answer http://img148.imageshack.us/img148/4105/executionplan.png
Martin Smith
@Blorgbeard. Yes, OVER was missing and 'order by' clause is required (Query changed). If you have any preferences which order lines should be selected for tag - change 'order by' list.also look to Martin Smith solution - it also works is same way. Do not know that is will be faster 'left join' or 'union' with 'not exists'
Michael Pakhantsov
@Michael - See link in my comment above for that. I did run them against an empty table though. It would be worth testing against real data to see how they got on. Also there is the same error in both my suggested solution and the implemented fix. You need to order by something unique `RANK()` will always return 1 at present as it takes account of ties (`ROW_NUMBER()` doesn't).
Martin Smith
@Martin Smith. You are absolutely right!! Rank and DENSE_RANK allow only group items. so here can be used only ROW_NUMBER() for selecting first order line. So you win :)
Michael Pakhantsov
Thanks for the effort you put in anyway Michael :)
Blorgbeard
@Blorgbeard, Thanks. but scary that mssql does not have function first_value (its exists in oracle, select first value in the "window"), then query could be shorter :) )
Michael Pakhantsov