ansaurus

Question

SQL group by clause understanding (simple question / SQL server 2005)

Answer 1

A:

You can select the Min ID for both the event and cash, which should be the oldest of each item, assuming there aren't any odd variations to the insert and linking logic for the entries.

The other alternative is adding clauses to the joins to restrict it to the first or MIN ID, then you won't need the grouping.

Benjamin Anderson 2010-08-16 15:00:17

The problem with this is that (apologies for not being more specific), the cash.id2 is the text reference for the actual cash value and so applying a MIN() to it returns an error

Davin 2010-08-16 15:13:46

Is the ID still a purely numeric value? If it is, the conversion to an int wouldn't add that much overhead in the scheme of things, and then you can select the MIN value.Not the most ideal situation, but it would still work. Doing the conversion outside of the join though, meaning for the aggregation functions, would be a lot less expensive than doing the conversion within the join. So, do the conversion on the ID inside the selection. MIN(CONVERT(INT, cash.id2))

Benjamin Anderson 2010-08-16 15:26:36

No, all of the ID fields are nvarchar(s) containing a mixture of numerics and letters and so are presumable unable to effectively be converted into integers for ranking or MIN / MAX functions?

Davin 2010-08-16 15:31:35

Then you'll have to go with the solution Beth provided. Grouping won't work for what you're trying to do.

Benjamin Anderson 2010-08-16 15:36:45

Answer 2

A:

You shouldn't need to add them into the group-by clause, but you do need to aggregate them somehow. The point of the group-by clause is that you want a grouping of statistics. That is, you want everything for that same contact id, not just the first record.

I believe if you do something like

SELECT Contact.Id,
  MIN(eve.date_created),
  MAX(eve.date_created), //this will get you the range of events
  SUM(cash.id2), //total cash for all events
  MIN(eve.Id), //First event id
  MAX(eve.Id) //Latest event id
FROM _contact Contact //Etc. etc.

your query will work for grouping. What it seems like you want, however, is more like:

[Select your columns]
FROM _contact Contact 
INNER JOIN _Table2 tb2 on contact.id = tb2.id1 
  and tb2.id1 = (Select MIN(id2) from tb2 where tb2.id1 = contact.id)
INNER JOIN _event eve on tb2.Id2 = eve.id
INNER JOIN _cashtable cahs on cash.contact_id = contact.id
  and cash.id = (select MIN(id) from cash where cash.contact_id = contact.id)

YMMV, and I your table structure will partially dictate how fast that runs (that is, you may want to look at some optimizations). Also, I did that completely from memory, so you may need to play with those INNER JOINs to get them to work. The point here is that you're not trying to grab the group, your trying to grab the first one for each contact.id.

AllenG 2010-08-16 15:02:28

Thanks for this, unfortunately the cash.id2 field is a text field and so applying a MIN() constraint throws up an error. For this table I will need to pull the cash.id2 which is directly related to the MIN(eve.date_created) value - is there a way of simply retrieving the aasociated value without having to add it into the group clause? Thanks :)

Davin 2010-08-16 15:16:50

Answer 3

+2 A:

I think you need a subquery joining the event table with an aggregation of itself returning only the min(event dates)

SELECT
contact.id,min_created,cash.id2    -- the cash linked to the first event
,eve.id      -- the first event linked to the contact    
FROM _contact contact 
INNER JOIN _table2 tb2  ON contact.id = tb2.id1
INNER JOIN _event eve  ON tb2.id2 = eve.id
inner join (select id, date_created as min_created from _event group by id) eve_min
on eve_min.id = eve.id and eve_min.min_created = eve.date_created 
INNER JOIN _cashtable cash  ON cash.contact_id = contact.id

Beth 2010-08-16 15:31:14

Thanks for this, I'm assuming that the inner join (select id, date_created is actually select MIN(date_created)? WHen using this it returns many rows per contact_id (around 50) instead of just one, is there any reason for this? (in case it helps using a select distinct halves the number of rows returned to around 25 per contact)

Davin 2010-08-17 08:59:14

yes, you're right, I should have applied the min() function to date_created.If it's returning more than one row, then it's getting the remaining rows from event, if there's more than one for the same min_created, or cash, if there's more than one row per ID.

Beth 2010-08-17 14:57:17

ansaurus

tags:

views:

answers:

SQL group by clause understanding (simple question / SQL server 2005)

related questions