views:

60

answers:

3

HI all, I'm having some difficulty understanding the rationale behind group by aggregation in sql server 2005.

I have the following query which works fine and returns one row for each contact.id and the 1st occurence of event

SELECT
contact.id
,MIN(eve.date_created)
FROM _contact contact WITH(nolock)
INNER JOIN table2 tb2 WITH (nolock) ON contact.id = tb2.id1
INNER JOIN _event eve WITH (nolock) ON tb2.id2 = eve.id
INNER JOIN _cashtable cash WITH (nolock) ON cash.contact_id = contact.id
GROUP BY contact.id

However what I'm looking for is to have the following query and still only pull out one row per contact.id

SELECT
contact.id
,MIN(eve.date_created)
,cash.id2    -- the cash linked to the first event
,eve.id      -- the first event linked to the contact    
FROM _contact contact with (nolock)
INNER JOIN _table2 tb2 WITH (nolock) ON contact.id = tb2.id1
INNER JOIN _event eve WITH (nolock) ON tb2.id2 = eve.id
INNER JOIN _cashtable cash WITH (nolock) ON cash.contact_id = contact.id
GROUP BY contact.id

I get the standard error message saying that I need to add in cash.id2 and eve.id into the group by clause, which returns results I don't want.

I know that there are the potential options of using rank() / partitioning or even including the

select(MIN(eve.date_created) 

in the FROM clause, but I am not sure which would be best to put this and am still quite confused as to why SQL requires everything to be included in the group by statement and so any advice would be great :)

Thanks in advance!

A: 

You can select the Min ID for both the event and cash, which should be the oldest of each item, assuming there aren't any odd variations to the insert and linking logic for the entries.

The other alternative is adding clauses to the joins to restrict it to the first or MIN ID, then you won't need the grouping.

Benjamin Anderson
The problem with this is that (apologies for not being more specific), the cash.id2 is the text reference for the actual cash value and so applying a MIN() to it returns an error
Davin
Is the ID still a purely numeric value? If it is, the conversion to an int wouldn't add that much overhead in the scheme of things, and then you can select the MIN value.Not the most ideal situation, but it would still work. Doing the conversion outside of the join though, meaning for the aggregation functions, would be a lot less expensive than doing the conversion within the join. So, do the conversion on the ID inside the selection. MIN(CONVERT(INT, cash.id2))
Benjamin Anderson
No, all of the ID fields are nvarchar(s) containing a mixture of numerics and letters and so are presumable unable to effectively be converted into integers for ranking or MIN / MAX functions?
Davin
Then you'll have to go with the solution Beth provided. Grouping won't work for what you're trying to do.
Benjamin Anderson
A: 

You shouldn't need to add them into the group-by clause, but you do need to aggregate them somehow. The point of the group-by clause is that you want a grouping of statistics. That is, you want everything for that same contact id, not just the first record.

I believe if you do something like

SELECT Contact.Id,
  MIN(eve.date_created),
  MAX(eve.date_created), //this will get you the range of events
  SUM(cash.id2), //total cash for all events
  MIN(eve.Id), //First event id
  MAX(eve.Id) //Latest event id
FROM _contact Contact //Etc. etc.

your query will work for grouping. What it seems like you want, however, is more like:

[Select your columns]
FROM _contact Contact 
INNER JOIN _Table2 tb2 on contact.id = tb2.id1 
  and tb2.id1 = (Select MIN(id2) from tb2 where tb2.id1 = contact.id)
INNER JOIN _event eve on tb2.Id2 = eve.id
INNER JOIN _cashtable cahs on cash.contact_id = contact.id
  and cash.id = (select MIN(id) from cash where cash.contact_id = contact.id)

YMMV, and I your table structure will partially dictate how fast that runs (that is, you may want to look at some optimizations). Also, I did that completely from memory, so you may need to play with those INNER JOINs to get them to work. The point here is that you're not trying to grab the group, your trying to grab the first one for each contact.id.

AllenG
Thanks for this, unfortunately the cash.id2 field is a text field and so applying a MIN() constraint throws up an error. For this table I will need to pull the cash.id2 which is directly related to the MIN(eve.date_created) value - is there a way of simply retrieving the aasociated value without having to add it into the group clause? Thanks :)
Davin
+2  A: 

I think you need a subquery joining the event table with an aggregation of itself returning only the min(event dates)

SELECT
contact.id,min_created,cash.id2    -- the cash linked to the first event
,eve.id      -- the first event linked to the contact    
FROM _contact contact 
INNER JOIN _table2 tb2  ON contact.id = tb2.id1
INNER JOIN _event eve  ON tb2.id2 = eve.id
inner join (select id, date_created as min_created from _event group by id) eve_min
on eve_min.id = eve.id and eve_min.min_created = eve.date_created 
INNER JOIN _cashtable cash  ON cash.contact_id = contact.id
Beth
Thanks for this, I'm assuming that the inner join (select id, date_created is actually select MIN(date_created)? WHen using this it returns many rows per contact_id (around 50) instead of just one, is there any reason for this? (in case it helps using a select distinct halves the number of rows returned to around 25 per contact)
Davin
yes, you're right, I should have applied the min() function to date_created.If it's returning more than one row, then it's getting the remaining rows from event, if there's more than one for the same min_created, or cash, if there's more than one row per ID.
Beth