views:

37

answers:

2

Hi. I have a problem in T-SQL that I find difficult to solve.

I have a table with groups of records, grouped by key1 and key2. I order each group chronologically by date. For each record, I want to see if there existed a record before (within the group and with lower date) for which the field "datafield" forms an allowed combination with the current record's "datafield". For the allowed combinations, I have a table called AllowedCombinationsTable.

I wrote following code to achieve it:

WITH Source AS (
    SELECT key1, key2, datafield, date1,
        ROW_NUMBER() OVER(PARTITION BY key1, key2 ORDER BY date1 ASC) AS dateorder
        FROM table
)
SELECT L.key1, L.key2, L.datafield, DC.datafield2
FROM Source AS L
LEFT JOIN AllowedDataCombinationsTable DC
    ON D.datafield1 = L.datafield
LEFT JOIN Source AS R
    ON R.Key1 = L.Key1
    AND R.Key2 = L.Key2
    AND R.dateorder < L.dateorder
    AND DC.datafield2 = L.datafield
    -- AND "pick the one record with lowest dateorder"

Now for each of these possible combination records, I want to pick the first one (see placeholder in code). How can I do it most efficiently?


EDIT: OK let's say for the source, only showing group (1, 1):

**Key1 Key2 Datafield Date DateOrder**
1 1 "Horse" 1-Jan-2010 1
1 1 "Horse" 2-Jan-2010 2
1 1 "Sheep" 3-Jan-2010 3
1 1 "Dog" 4-Jan-2010 4
1 1 "Cat" 5-Jan-2010 5

AllowedCombinationsTable:

**Datafield1 Datafield**
Cat Sheep (and Sheep Cat)
Cat Horse (and Horse Cat)
Dog Horse (and Horse Dog)

After my join I have now:

**Key1 Key2 Datafield Date DateOrder JoinedCombination JoinedCombinationDateOrder**
1 1 "Horse" 1-Jan-2010 1 NULL NULL
1 1 "Horse" 2-Jan-2010 2 NULL NULL
1 1 "Sheep" 3-Jan-2010 3 NULL NULL
1 1 "Dog" 4-Jan-2010 4 "Horse" 1
1 1 "Dog" 4-Jan-2010 4 "Horse" 2
1 1 "Cat" 5-Jan-2010 5 "Horse" 1
1 1 "Cat" 5-Jan-2010 5 "Horse" 2
1 1 "Cat" 5-Jan-2010 5 "Sheep" 3

I want to display only the first "Horse" for record 4 "Dog", and also only the first "Horse" for record 5 "Cat".

Get it? ;)

A: 

I think this may do it--don't have data set up to test the query with. Check the comments for rationale.

WITH Source AS ( 
    SELECT key1, key2, datafield, date1, 
        ROW_NUMBER() OVER(PARTITION BY key1, key2 ORDER BY date1 ASC) AS dateorder 
        FROM table 
) 
SELECT L.key1, L.key2, L.datafield, DC.datafield2 
FROM Source AS L 
LEFT JOIN AllowedDataCombinationsTable DC 
    ON DC.datafield1 = L.datafield   --  DC Alias
LEFT JOIN Source AS R 
    ON R.Key1 = L.Key1 
    AND R.Key2 = L.Key2 
    AND DC.datafield2 = R.datafield   --  Changed alias from L to R
    AND R.dateorder = 1               --  Pick out lowest one
    AND R.dateorder < L.dateorder     --  Make sure it's not the same one
Philip Kelley
The L was a typo indeed. But this doesn't do it because R.dateorder = 1 doesn't necessarily exist, it might be filtered out by the criterium DC.datafield2 = R.datafield.
littlegreen
True. Then the fix to that is to move that filtering criteria (along with requisite table joins) up into the WITH clause, so that only valid "candidate rows" are selected and ordered, and filter with RowNumber = 1 in the SELECT clause. Very hard to code right without data to test against, so I'm leaving this as a comment.
Philip Kelley
Everybody is asking for data, but what would you suggest as a handy medium to transfer my data?
littlegreen
CREATE TABLE statements (dummy table and column names are fine), and INSERT statements to populate just enough data to provide a valid test. (What you have listed looks fine.)
Philip Kelley
Thanks, I'll keep it in mind for next time. When I have time I'll polish up this one. As I said at the other answer I solved my problem now using a nesting around my original code and choosing row_number = 1 from that, which is similar to what you are suggesting. It works (300000 rows in 9 seconds on our server) but I'm still thinking there might be an even faster way. Can't think of it though.
littlegreen
A: 

Well, I don't use WITH or OVER, so this is a different approach.. I might be over-simplifying something, but without having the data in front of me this is what I came up with:

SELECT distinct a.Key1, a.Key2, a.Datafield, 
       ISNULL(b.Datafield,'') as Datafield1, 
       ISNULL(b.Date,a.Date) as `Date`, 
       MIN(a.DateOrder) as DateOrder
FROM Source a
LEFT JOIN Source b 
     ON a.Key1 = b.Key1
     AND a.Key2 = b.Key2
     AND a.Dateorder <> b.Dateorder
LEFT JOIN AllowedDataCombinationsTable c
     ON a.Datafield = c.Datafield
     AND b.Datafield = c.Datafield1
GROUP BY a.Key1, a.Key2, a.Datafield, ISNULL(b.Datafield,''), ISNULL(b.Date,a.Date)
Fosco
Hi, thanks. This is nice but I think it won't work in my case either, since I have a bunch of other datafields that I need to join as well. I'll test it and get back to you. Meanwhile I solved it using an extra nesting with row_number() around my original code in order to select the highest-ranked joined row per group, which gets the job done, but looks inefficient to me.
littlegreen