I have a view which was working fine when I was joining my main table:

LEFT OUTER JOIN OFFICE ON CLIENT.CASE_OFFICE = OFFICE.TABLE_CODE

However I needed to add the following join:

LEFT OUTER JOIN OFFICE_MIS ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE

Although I added DISTINCT, I still get a "duplicate" row. I say "duplicate" because the second row has a different value.

However, if I change the LEFT OUTER to an INNER JOIN, I lose all the rows for the clients who have these "duplicate" rows.

What am I doing wrong? How can I remove these "duplicate" rows from my view?


Note:

This question is not applicable in this instance:

http://stackoverflow.com/questions/18932/sql-how-can-i-remove-duplicate-rows

+3  A: 

If the second row has even one different value, then it is not really a duplicate and should be included.

Hunter
No, not really. There is only 1 row in the main table and I need only 1 row in my view, even if my data is wrong. 2 rows are causing havoc in my reports.
Greg
+1  A: 

You could try using Distinct Top 1, but as Hunter pointed out, if even one column is different then the row should either be included or, if you don't care about or need that column, you should probably remove the column from the query. Any other suggestions would probably require more specific info.

EDIT: When using Distinct Top 1 you need to have an appropriate GROUP BY clause. You would really be using the Top 1 part. The Distinct is in there because if there is a tie for Top 1 you'll get an error without having some way to avoid a tie. The two most common ways I've seen are adding Distinct to Top 1, or adding a unique column to the query so that SQL has a way to choose which record to pick in what would otherwise be a tie.

Bryan
I have used distinct. I cannot use Distinct Top 1 - I have 500 000 records and need to report on them all.
Greg
Thank you very much Rich B. I will use your suggestion.
Greg
Thank you Bryan. I will apply your suggestion.
Greg
+1  A: 

Instead of using DISTINCT, you could use a GROUP BY.

  • Group by all the fields that you want to be returned as unique values.
  • Use MIN/MAX/AVG or any other function to give you one result for fields that could return multiple values.

Example:

SELECT Office.Field1, Client.Field1, MIN(Office.Field2), MIN(Client.Field2)
FROM YourQuery  
GROUP BY Office.Field1, Client.Field1
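
To make the GROUP BY idea concrete, here is a minimal sketch run against an in-memory SQLite database from Python. Only CLIENT, OFFICE_MIS, REFERRAL_OFFICE and TABLE_CODE come from the question; the CLIENT_ID and DESCRIPTION columns and all of the sample data are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema and data: CLIENT_ID and DESCRIPTION are not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLIENT (CLIENT_ID INTEGER PRIMARY KEY, REFERRAL_OFFICE TEXT);
    CREATE TABLE OFFICE_MIS (TABLE_CODE TEXT, DESCRIPTION TEXT);
    INSERT INTO CLIENT VALUES (1, 'A'), (2, 'B'), (3, NULL);
    -- TABLE_CODE 'A' appears twice, which is what doubles the client row.
    INSERT INTO OFFICE_MIS VALUES ('A', 'North'), ('A', 'Alpha'), ('B', 'South');
""")

# Group by the client key and collapse the multi-valued column with MIN.
grouped = conn.execute("""
    SELECT CLIENT.CLIENT_ID, MIN(OFFICE_MIS.DESCRIPTION) AS REFERRAL_DESC
    FROM CLIENT
    LEFT OUTER JOIN OFFICE_MIS
        ON CLIENT.REFERRAL_OFFICE = OFFICE_MIS.TABLE_CODE
    GROUP BY CLIENT.CLIENT_ID
    ORDER BY CLIENT.CLIENT_ID
""").fetchall()

print(grouped)
```

Each client comes back exactly once, including the client with no referral office. Note that MIN picks a value arbitrarily from a reporting point of view (here the alphabetically first description), so this only helps if any one of the matching values is acceptable.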
Lieven
+6  A: 

DISTINCT won't help you if the rows have any columns that are different. Obviously, one of the tables you are joining to has multiple rows for a single row in another table. To get one row back, you have to eliminate the other multiple rows in the table you are joining to.

The easiest way to do this is to tighten your WHERE clause or JOIN condition so that it joins only to the single record you want. This usually requires determining a rule that will always select the 'correct' entry from the other table.
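
Before choosing a rule, it can help to confirm which join key actually repeats. The following is a rough sketch, again using an in-memory SQLite database from Python; the DESCRIPTION column and the sample rows are assumptions, and only OFFICE_MIS and TABLE_CODE are taken from the question.

```python
import sqlite3

# Hypothetical sample data; only the table name and TABLE_CODE come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OFFICE_MIS (TABLE_CODE TEXT, DESCRIPTION TEXT);
    INSERT INTO OFFICE_MIS VALUES ('A', 'North'), ('A', 'Alpha'), ('B', 'South');
""")

# List every join key that matches more than one row in the joined table.
duplicated_codes = conn.execute("""
    SELECT TABLE_CODE, COUNT(*) AS N_ROWS
    FROM OFFICE_MIS
    GROUP BY TABLE_CODE
    HAVING COUNT(*) > 1
    ORDER BY TABLE_CODE
""").fetchall()

print(duplicated_codes)
```

Any code listed by this query will multiply the rows of every client that refers to it.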

Let us assume you have a simple problem such as this:

Person:  Jane
Pets: Cat, Dog

If you create a simple join here, you would receive two records for Jane:

Jane|Cat
Jane|Dog

This is completely correct if the point of your view is to list all of the combinations of people and pets. However, if your view was instead supposed to list people with pets, or list people and display one of their pets, you hit the problem you have now. For this, you need a rule.

SELECT Person.Name, pets1.Name
FROM Person
  LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
WHERE 0 = (SELECT COUNT(pets2.ID) 
             FROM Pets pets2
             WHERE pets2.PersonID = pets1.PersonID
                AND pets2.ID < pets1.ID);

What this does is apply a rule that restricts the Pets record in the join to the pet with the lowest ID (first in the Pets table). The WHERE clause essentially says "where there are no pets belonging to the same person with a lower ID value".

This would yield a one record result:

Jane|Cat
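
For anyone who wants to try this, here is a small sketch that runs both the plain join and the rule-restricted join against an in-memory SQLite database from Python. The table layout follows the example above; the second person, Sam, who has no pets, is an assumption added to show that the LEFT JOIN still keeps people without a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Pets (ID INTEGER PRIMARY KEY, PersonID INTEGER, Name TEXT);
    -- Sam is a hypothetical extra person with no pets.
    INSERT INTO Person VALUES (1, 'Jane'), (2, 'Sam');
    INSERT INTO Pets VALUES (1, 1, 'Cat'), (2, 1, 'Dog');
""")

# Plain join: one row per person/pet combination.
plain_join = conn.execute("""
    SELECT Person.Name, pets1.Name
    FROM Person
      LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
    ORDER BY Person.ID, pets1.ID
""").fetchall()

# Rule applied: keep only the pet that has no lower-ID sibling for the same person.
with_rule = conn.execute("""
    SELECT Person.Name, pets1.Name
    FROM Person
      LEFT JOIN Pets pets1 ON pets1.PersonID = Person.ID
    WHERE 0 = (SELECT COUNT(pets2.ID)
                 FROM Pets pets2
                WHERE pets2.PersonID = pets1.PersonID
                  AND pets2.ID < pets1.ID)
    ORDER BY Person.ID
""").fetchall()

print(plain_join)
print(with_rule)
```

The plain join returns Jane twice, while the rule-restricted query returns one row per person. Sam survives the WHERE clause because the subquery counts zero rows when there is no matching pet.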

The rule you'll need to apply to your view will depend on the data in the columns you have, and which of the 'multiple' records should be displayed in the column. However, that will wind up hiding some data, which may not be what you want. For example, the above rule hides the fact that Jane has a Dog. It makes it appear as if Jane only has a Cat, when this is not correct.

You may need to rethink the contents of your view, and what you are trying to accomplish with your view, if you are starting to filter out valid data.

Jay S
thanks a million Jay!
Greg
that's an interesting way to do it. Is that faster than the select top 1 sub-query in my answer?
dotjoe
I'm not sure if it's faster than the select top, but I wanted to show a generic example of applying a rule. The above select could have also had a completely different WHERE clause, or perhaps a restriction on the JOIN to say "where Pets.Name = 'Cat'". The concept is that a rule is needed, and it has to be specific to the view being developed and the context of the data that needs to be presented.
Jay S
+3  A: 

So you added a left outer join that is matching two rows? I presume OFFICE_MIS.TABLE_CODE is not unique in that table? You need to restrict that join so it grabs only one row. It depends on which row you are looking for, but you can do something like this...

LEFT OUTER JOIN OFFICE_MIS ON 
  OFFICE_MIS.ID = /* whatever the primary key is? */
    (select top 1 om2.ID
    from OFFICE_MIS om2
    where CLIENT.REFERRAL_OFFICE = om2.TABLE_CODE
    order by om2.ID /* change the order to fit your needs */)
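
Here is a runnable sketch of the same idea using an in-memory SQLite database from Python. SQLite has no TOP, so the subquery uses ORDER BY ... LIMIT 1 instead; on SQL Server you would keep the select top 1 form. The ID, CLIENT_ID and DESCRIPTION columns and the sample rows are assumptions, since the question does not show the real primary key or columns.

```python
import sqlite3

# Hypothetical schema: ID, CLIENT_ID and DESCRIPTION are assumed, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLIENT (CLIENT_ID INTEGER PRIMARY KEY, REFERRAL_OFFICE TEXT);
    CREATE TABLE OFFICE_MIS (ID INTEGER PRIMARY KEY, TABLE_CODE TEXT, DESCRIPTION TEXT);
    INSERT INTO CLIENT VALUES (1, 'A'), (2, 'B'), (3, NULL);
    INSERT INTO OFFICE_MIS VALUES (1, 'A', 'North'), (2, 'A', 'Alpha'), (3, 'B', 'South');
""")

# Join only to the single OFFICE_MIS row chosen by the correlated subquery.
one_per_client = conn.execute("""
    SELECT CLIENT.CLIENT_ID, OFFICE_MIS.DESCRIPTION
    FROM CLIENT
    LEFT OUTER JOIN OFFICE_MIS ON OFFICE_MIS.ID =
        (SELECT om2.ID
           FROM OFFICE_MIS om2
          WHERE om2.TABLE_CODE = CLIENT.REFERRAL_OFFICE
          ORDER BY om2.ID   -- change the order to fit your needs
          LIMIT 1)
    ORDER BY CLIENT.CLIENT_ID
""").fetchall()

print(one_per_client)
```

Every client appears once, and a client with no matching office still appears with a NULL description, which is the behaviour the LEFT OUTER JOIN was meant to preserve.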
dotjoe
Thank you dotjoe
Greg