views:

49

answers:

3

I am looking to return the following information for each contact contained within a set of tables:

Contact_id    |    first_Event_date    |   first_Event_id    | event_type
id123456      |    01/12/2007          |   eveid123456       | table1
id456455      |    05/06/1999          |   eveid456585       | table4

Where the data reflects the first event that each contact has ever been involved with (which could be contained in any of up to 8 tables), and the event_type tells you from which table the event is from.

I have the following query script as a starting point, and it works fine when trying to pull just the contact_id and event_date but when I try to also include the event_id it seems to arbitrarily pull an ID from somewhere which is not correct:

SELECT
table1.contact_id               AS contact_id
MIN(table1.date_received)       AS event_date
table1.event_id                 AS event_id
FROM table1
GROUP BY table1.contact_id
UNION
SELECT
table2.contact_id               
MIN(table2.date_received)       
table2.event_id                 
FROM table2
GROUP BY table2.contact_id

And this is repeated for tables 3-6. I know that I need to also include the table1.event_id etc in the GROUP BY clause, but when I do it returns all mentions of each event for each contact (for each table), so one contact has multiple rows returned for the table1 subquery when there should be at most 1 row returned.

Additionally, in case in helps not all contacts will appear in all of the tables (but will appear at least once across all of the table) and I'm using sql server 2005.

Thanks in advance :)

A: 

Try merging the tables using UNION ALL to return the first result for each source table, then running an outer query to get the earliest across them all.

In this example, the first step is selected into a temp table, and the second step against that temp table. It would be possible to carry this operation out as a single nested query, but more confusing to read:

step 1 - get the earliest row for each contact from each table

SELECT  contact_id ,
        event_id ,
        date_received
INTO #firstEventsAllTables      
FROM    (   SELECT  contact_id ,
                    event_id ,
                    date_received,
                    ROW_NUMBER() OVER (PARTITION BY contact_id
                                       ORDER BY date_received
                                      ) AS rn
            FROM table1      
        ) AS t1
WHERE rn = 1

UNION ALL       


SELECT  contact_id ,
        event_id ,
        date_received
FROM    (   SELECT  contact_id ,
                    event_id ,
                    date_received,
                    ROW_NUMBER() OVER (PARTITION BY contact_id
                                       ORDER BY date_received
                                      ) AS rn
            FROM table2      
        ) AS t2
WHERE rn = 1

UNION ALL

etc...

step 2 - find the earliest row per contact across all tables

SELECT  contact_id ,
        event_id ,
        date_received
FROM    (   SELECT  contact_id ,
                    event_id ,
                    date_received,
                    ROW_NUMBER() OVER (PARTITION BY contact_id
                                       ORDER BY date_received
                                      ) AS rn
            FROM #firstEventsAllTables       
        ) AS f
WHERE rn = 1

(untested)

Ed Harper
A: 

First, I am surprised that your query runs. Every field in the select clause either needs to exist in the Group By clause or be wrapped in some sort of aggregate function. Your event_id field isn't, so you should be getting an error.

Second, to get the "other fields" associated with the record that contains the minimum, the OVER keyword (added for SQL2005) is the way to go. The following query adds each contact's minimum event date to every row of the result set.

SELECT
  contact_id AS contact_id
  date_received AS event_date, 
  MIN(date_received) OVER (PARTITION BY contact_id) AS min_event_date
  event_id AS event_id
FROM table1

You can't put the OVER bit into the where clause, so you have to wrap it up in a subquery to find the record you want.

SELECT contact_id, event_date, event_id
FROM (
  SELECT
    contact_id AS contact_id
    date_received AS event_date, 
    MIN(date_received) OVER (PARTITION BY contact_id) AS min_event_date
    event_id AS event_id
  FROM table1)
WHERE event_date = min_event_date

Final solution involves a double layer of subquery, I think, with the UNION as the deepest:

SELECT contact_id, event_date, event_id
FROM (
  SELECT
    contact_id
    event_date, 
    MIN(event_date) OVER (PARTITION BY contact_id) AS min_event_date
    event_date
  FROM (
    SELECT
      table1.contact_id AS contact_id
      table1.date_received AS event_date
      table1.event_id AS event_id
    FROM table1
    UNION
    SELECT
      table2.contact_id AS contact_id
      table2.date_received AS event_date
      table2.event_id AS event_id
    FROM table2)
WHERE event_date = min_event_date
Bill
A: 

Ed Harper's method is probably the best one.

Just taking a fun stab in the dark at a crazy version, try this:

WITH all AS (
   SELECT tbl = 'table1', contact_id, date_received, event_id FROM table1
   UNION ALL SELECT 'table2', contact_id, date_received, event_id FROM table2
   UNION ALL SELECT 'table3', contact_id, date_received, event_id FROM table3
   UNION ALL SELECT 'table4', contact_id, date_received, event_id FROM table4
) ranked AS (
   SELECT
      *, flag = row_number() OVER (PARTITION BY contact_id, ORDER BY date_received),
   FROM All
)
SELECT *
FROM ranked
WHERE flag = 1

It looks simple, but it will probably perform badly. Please try it and let us know how it does. :)

Emtucifor