views:

1552

answers:

3

Hi all,

Could someone double check my SQL statement for proper operation and general commonsense approach?

Here's what's going on: I have a parent and child table with a one-to-many relationship, joined on a column named AccountNumberKey. The child table has numeric columns in it I need to sum up.

The data is such that all child records with a given AccountNumberKey value will always have the same values in their two numeric columns. I want to join these tables and for each AccountNumberKey put the sum of those two columns into a temp table. I only need to sum the columns from a single child record given each AccountNumberKey.

Some sample data below will (I hope) make this clearer:

Parent Table Columns


ParentID    InstitutionID    AccountNumberKey  

1       LocalHost            1873283  
2       Acme Brokers         3627389    
3       Dewey, Cheatem       1392876
4       NCC1701              8837273
5       Peyton Place         9981273


Child Table Columns


ChildID     AccountNumberKey    Value1      Value2       ProposalNumber
1               1873283         1000        100         58
2               1873283         1000        100         59
3               1873283         1000        100         60
4               1873283         1000        100         61

Here's my SQL Statement:

    SELECT   DISTINCT  Parent.InstitutionID, AccountNumberKey, SUM(Child.Value1 + Child.Value2) as total
        INTO   #TempTable
        FROM         Parent 
            INNER JOIN
              Child ON Parent.AccountNumberKey = Child.AccountNumberKey 

        GROUP BY Parent.InstitutionID, Parent.AccountNumberKey, Child.ProposalNumber

The goal is to link the tables and put data into a temp table so it looks like so:

TempTable columns


InstitutionID       AccountNumberKey        Total
LocalHost           1873283             1100

Does my SQL query pass muster? I'm no genius when it comes to groupings and wondered if this is A) correct and B) an ok way to go or if there are better joins to try.

Thanks!

+1  A: 

First of all, if it is true thjat "child records with a given AccountNumberKey value will always have the same values in their two numeric columns", then your table schema is not in proper third normal form, (3NF). There should be another table with one row per AccountNumberKey, with AccountNumberKey as key, and Value1 and Value2 as data fields, and your queries should be joining to this table (using AccountNumberKey), to retrieve Value1 and Value2.

Secondly, in this situation, you should not join a child table to a parent table on a parent table column that is not a Key. This will cause a cartesian product (where the ouput will include multiple rows for each row in both sides of the join, in efect double or triple counting those records...) Is AccountNumberKey a key for the Parent table?

If it isn't, then the only column in the parent table that should be used as a FK in the child table is the key column ParentID.

If it is (if AccountNumberKey is unique in Parent Table), then the Value1 and Value2 columns shhould be in the Parent Table, not in the child table.

Charles Bretana
Charles, you can join on any column you want, regardless of key status. Though maybe you were saying that it's not advisable? The server allows you to join on anything you like.
Emtucifor
@Emtucifor, Yes, you are right I will edit to make this point clear.. joining on other than PKs results in cartesian products... but sometimes (not here) that's what you want...
Charles Bretana
@Charles, you can join any table to any table in any way you like and generally i don't go past 2NF without good reason, it's rarely worth it.
Paul Creasey
@Charles, joining on something other than a PK does not yield a cartesian product. Joining without a condition yields a cartesian product no matter whether the column is a PK or not.
Paul Creasey
@Paul Creasey: that you "rarely" go past 2NF means I hope I never work with you professionally.
Emtucifor
@Charles Bretana: A column can be a candidate/business key without that being reflected in the database. Though such keys *ought* to be formalized with unique constraints or unique indexes, that isn't a requirement.
Emtucifor
@Emtucifor, Yes you are right, and whether a key is formalized or not is not at all my point. The formalization (the Unique Index or Constraint) just prevents inserts/updates that would generate inconsistent data. I am speaking logically, not physically, what I mean by "a key" is a column (or columns) that "logically" represents a unique attribute(s), whether or not it has been formalized in the physical model.
Charles Bretana
Hi everyone- I really appreciate the discussion here and all your thoughts and opinions. To answer a question that's lingering out there: Unfortunately, yes, there isn't a formal FK relationship on the AccountNumberKey column. It's a more..."casual" relationship, let's call it. I know this can lead to data inconsistencies but we have to live with this and the bosses know the risks. As to value1 and value2 perhaps needing to belong to the parent table, it's possible down the road they'll be different for each ProposalNumber/AccountNumberKey combination.
larryq
@Paul, to be technical, joining always results in a cartesian product (m x n). We normally only call it that when both sides of the product > 1). The number of output rows is always equal to the number of rows satisfied by the join condition on one sde, times the number of rows satisfied by the join condition on the other side... So if both sides have multiple rows (i.e., if neither side is a unique key column attribute) and you are doing aggregates on what you `assumed` was the only many side, you will get erroneous invalid results because the values will be included multiple times.
Charles Bretana
+1  A: 

Your join will not yield those results, since AccountNumberKey is not unique the join would produce

LocalHost           1873283             1100

4 time times, one for each child record, when aggregated that would yield 4400 as the value.

Paul Creasey
Actually, that's not true, as he's grouping by Child.ProposalNumber. But he won't get one row for each row in the parent table.
Emtucifor
+1  A: 

This query will give the results you seem to want:

SELECT
   P.InstitutionID,
   P.AccountNumberKey,
   Total = C.Value1 + C.Value2
FROM
   Parent P
   INNER JOIN (
      SELECT DISTINCT AccountNumberKey, Value1, Value2
      FROM Child
   ) C ON P.AccountNumberKey = C.AccountNumberKey

But I want to echo what others have said: if you can do anything about the design, you should, because it is not normalized. Value1 and Value2 from your Child table really belong in the Parent table, as they are about the Parent. What if two rows in the Child table have a different set of Values for the same AccountNumberKey? Your data would be all wrong and who knows what sort of possibly disastrous consequences that could have for the business? The DISTINCT above would fail in this case and return two rows for the Parent row.

UPDATE:

larryq said:

As to value1 and value2 perhaps needing to belong to the parent table, it's possible down the road they'll be different for each ProposalNumber/AccountNumberKey combination.

In this case, then the query I've given you would give strange results. How will you decide which set of values to use for an AccountNumber? Will you always want the most recent ProposalNumber? Will you want to see a row for each distinct set of Value1 and Value2? Is there another table to join to to find the current ProposalNumber to use?

Emtucifor