Hello,

I am trying to figure out the SQL for some relatively simple operations on sets of records in a table, but I am stuck. Consider a table with multiple rows per item, all identified by a common key.

For example:

serial model color
XX1   A     blue
XX2   A     blue
XX3   A     green
XX5   B     red
XX6   B     blue
XX1   B     blue

For example, I want to:

  1. Assuming that all model A rows must have the same color, find the rows that don't (for example, XX3 is green).

  2. Assuming that a given serial number can only point to a single model, find the rows where that does not hold (for example, XX1 points to both A and B).

These are all logically simple things to do. To abstract it, I want to know how to group rows by a single key (or a combination of keys) and then compare the values within each group.

Should I use a join on the same table? Should I use some sort of array or similar?

Thanks for your help.

A: 

To address #1, I would use a self-join (a join on the same table, as you put it).

For example,

select t.*
from mytable t
join (select model, color          -- (model, color) pairs that occur exactly once
      from mytable
      group by model, color
      having count(*) = 1) u
  on u.model = t.model
 and u.color = t.color

would find all the rows whose color occurs only once for their model. I did not test this, but I hope you see what it does. The inner select finds the (model, color) combinations that occur only once, then the join back to the table shows all detail for the matching rows.

Of course, having said that, this is a poor table design. But I don't think that was your question, and I hope this was a made-up example standing in for a real situation. My concern would be that there is no reason to assume that the single occurrence is actually the bad one: it could be that there are 10 records, all of which have a distinct color. This approach would tell you that all of them are wrong, and you would be unable to decide which was correct.
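
One way around that ambiguity is to avoid guessing which row is wrong and list every row of any model that has more than one color. This is only a minimal, untested sketch; it assumes the table is called mytable and has the serial, model and color columns from the question.

```sql
-- Assumption: table and column names follow the question's sample data.
-- The inner query keeps only the models that appear with 2+ distinct colors;
-- the join then returns every row of those models for manual review.
SELECT t.serial, t.model, t.color
FROM mytable t
JOIN (SELECT model
      FROM mytable
      GROUP BY model
      HAVING COUNT(DISTINCT color) > 1) m
  ON m.model = t.model
ORDER BY t.model, t.color;
```

On the sample data both models mix colors (A has blue and green, B has red and blue), so all six rows come back. Which of them is "wrong" is still a decision for a person or for a separate rule.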

MJB
+3  A: 

For 1:

SELECT model, color, COUNT(*) AS num FROM yourTable GROUP BY model, color;

This will give you a list of each model and each color for that model along with the count. So the output from your dataset would be:

model color num
A     blue  2
A     green 1
B     red   1
B     blue  2

From this output you can easily see what's incorrect and fix it using an UPDATE statement or do a blanket operation where you assign the most popular color to each model.
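
The "most popular color" idea could be sketched roughly as follows. This is an illustration, not a tested solution: it assumes an engine with common table expressions and window functions (for example PostgreSQL, or MySQL 8.0 and later), and the names counts, ranked and majority_color are made up for the example. Ties between equally common colors are broken alphabetically, which is an arbitrary choice.

```sql
-- Assumption: yourTable(serial, model, color) as in the question.
WITH counts AS (
    -- how often each color occurs per model
    SELECT model, color, COUNT(*) AS num
    FROM yourTable
    GROUP BY model, color
),
ranked AS (
    -- rn = 1 marks the most common color of each model
    -- (ties broken alphabetically by color, an arbitrary rule)
    SELECT model, color, num,
           ROW_NUMBER() OVER (PARTITION BY model
                              ORDER BY num DESC, color) AS rn
    FROM counts
)
-- rows whose color differs from their model's most common color
SELECT t.serial, t.model, t.color, r.color AS majority_color
FROM yourTable t
JOIN ranked r
  ON r.model = t.model
 AND r.rn = 1
WHERE t.color <> r.color;
```

With the sample data the most common color is blue for both models, so this lists XX3 (A, green) and XX5 (B, red). The same join condition could drive an UPDATE, but it is worth reviewing the SELECT output first.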

For 2:

SELECT serial, COUNT(DISTINCT model) AS num FROM yourTable GROUP BY serial HAVING COUNT(DISTINCT model) > 1;

The output for this would be:

serial num
XX1    2
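
To see the full rows behind each offending serial rather than just the count, the grouped query can be used as a filter. Again a minimal sketch, assuming the yourTable name and the columns from the question:

```sql
-- Assumption: yourTable(serial, model, color) as in the question.
-- The subquery finds serials attached to 2+ distinct models;
-- the outer query shows every row for those serials.
SELECT t.serial, t.model, t.color
FROM yourTable t
WHERE t.serial IN (SELECT serial
                   FROM yourTable
                   GROUP BY serial
                   HAVING COUNT(DISTINCT model) > 1)
ORDER BY t.serial, t.model;
```

On the sample data this returns the two XX1 rows, one for model A and one for model B.
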
Ben S