I have a database table structured like this (irrelevant fields omitted for brevity):

rankings
------------------
(PK) indicator_id
(PK) alternative_id
(PK) analysis_id
rank

All fields are integers; the first three (labeled "(PK)") are a composite primary key. A given "analysis" has multiple "alternatives", each of which will have a "rank" for each of many "indicators".

I'm looking for an efficient way to compare an arbitrary number of analyses and find every alternative/indicator combination whose ranks differ between them. So, for example, if we have this data:

analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
          1 |              1 |            1 |    4
          1 |              1 |            2 |    6
          1 |              2 |            1 |    3
          1 |              2 |            2 |    9
          2 |              1 |            1 |    4
          2 |              1 |            2 |    7
          2 |              2 |            1 |    4
          2 |              2 |            2 |    9

...then the ideal method would identify the following differences:

analysis_id | alternative_id | indicator_id | rank
----------------------------------------------------
          1 |              1 |            2 |    6
          2 |              1 |            2 |    7
          1 |              2 |            1 |    3
          2 |              2 |            1 |    4

I came up with a query that does what I want for 2 analysis IDs, but I'm having trouble generalizing it to find differences between an arbitrary number of analysis IDs (i.e. the user might want to compare 2, or 5, or 9, or whatever, and find any rows where at least one analysis differs from any of the others). My query is:

declare @analysisId1 int, @analysisId2 int;
select @analysisId1 = 1, @analysisId2 = 2;

select 
 r1.indicator_id, 
 r1.alternative_id,
 r1.[rank] as Analysis1Rank,
 r2.[rank] as Analysis2Rank
from rankings r1
inner join rankings r2
 on r1.indicator_id = r2.indicator_id
  and r1.alternative_id = r2.alternative_id
  and r2.analysis_id = @analysisId2
where
 r1.analysis_id = @analysisId1
 and r1.[rank] != r2.[rank]

(It puts the analysis values into additional fields instead of rows. I think either way would work.)

How can I generalize this query to handle many analysis ids? (Or, alternatively, come up with a different, better query to do the job?) I'm using SQL Server 2005, in case it matters.

If necessary, I can always pull all the data out of the table and look for differences in code, but a SQL solution would be preferable since often I'll only care about a few rows out of thousands and there's no point in transferring them all if I can avoid it. (However, if you have a compelling reason not to do this in SQL, say so--I'd consider that a good answer too!)
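For comparison, the "look for differences in code" fallback mentioned above could look roughly like the sketch below. It is only an illustration: the function name `differing_rows` and the tuple layout `(analysis_id, alternative_id, indicator_id, rank)` are assumptions, and the rows are the sample data from the question rather than anything read from a real database.

```python
from collections import defaultdict

def differing_rows(rows):
    """Return the rows whose alternative/indicator pair has more than one distinct rank.

    rows: iterable of (analysis_id, alternative_id, indicator_id, rank) tuples
    (hypothetical layout, chosen to match the sample table in the question).
    """
    groups = defaultdict(list)
    for row in rows:
        analysis_id, alternative_id, indicator_id, rank = row
        groups[(alternative_id, indicator_id)].append(row)

    result = []
    for key in sorted(groups):
        members = groups[key]
        # Keep the group only if at least two analyses disagree on the rank.
        if len({member[3] for member in members}) > 1:
            result.extend(sorted(members))
    return result

sample = [
    (1, 1, 1, 4), (1, 1, 2, 6), (1, 2, 1, 3), (1, 2, 2, 9),
    (2, 1, 1, 4), (2, 1, 2, 7), (2, 2, 1, 4), (2, 2, 2, 9),
]

for row in differing_rows(sample):
    print(row)
```

This needs every row of the selected analyses transferred to the client first, which is exactly the cost a SQL solution avoids.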

+1  A: 

I don't know which database you are using; in SQL Server I would go like this:

-- STEP 1, create temporary table with all the alternative_id , indicator_id combinations with more than one rank:
select alternative_id , indicator_id
into #results
from rankings 
group by alternative_id , indicator_id
having count (distinct rank)>1

-- STEP 2, retrieve the data

select a.* from rankings a, #results b
where a.alternative_id = b.alternative_id
and a.indicator_id = b.indicator_id
order by a.alternative_id, a.indicator_id, a.analysis_id

BTW, the other answers given here need the count(distinct rank)!

tekBlues
This is just what I was asking for - thanks! I have to give the nod to Dan for pulling it off without a temp table, though. ;-)
Matt Winckler
ahhh, I love temp tables !!! sorry about it :-)
tekBlues
A: 

I think this is what you're trying to do:

select 
    r.analysis_id, 
    r.alternative_id, 
    rm.indicator_id_max,
    rm.rank_max
from rankings r
    join (
        select 
            analysis_id, 
            alternative_id, 
            max(indicator_id) as indicator_id_max, 
            max(rank) as rank_max 
        from rankings 
        group by analysis_id, 
            alternative_id 
        having count(*) > 1
    ) as rm
    on r.analysis_id = rm.analysis_id
    and r.alternative_id = rm.alternative_id
+1  A: 

This will return your desired data set. Now you just need a way to pass the required analysis ids to the query, or you could simply filter this data inside your application.

    select r.* from rankings r
    inner join
    (
        select alternative_id, indicator_id
        from rankings
        group by alternative_id, indicator_id
        having count(distinct rank) > 1
    ) differ on r.alternative_id = differ.alternative_id
    and r.indicator_id = differ.indicator_id
    order by r.alternative_id, r.indicator_id, r.analysis_id, r.rank
Dan Fuller
Perfect - and bonus points for doing it without a temp table! Thanks!
Matt Winckler
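One way the "pass the required analysis ids" step might look is sketched below. This is an assumption-laden illustration, not the poster's code: it runs the same join against an in-memory SQLite database from Python so that it is self-contained, the helper name `differing_rankings` is made up, and the `in (...)` filter is applied both inside the grouped subquery and on the outer rows. SQLite quotes the column as `"rank"`; on SQL Server it would be `[rank]`, and the id list would have to be supplied some other way (for example a table variable or temp table of ids).

```python
import sqlite3

def differing_rankings(conn, analysis_ids):
    """Rows of the chosen analyses whose alternative/indicator ranks differ (sketch)."""
    ids = list(analysis_ids)
    if len(ids) < 2:
        return []  # nothing to compare against
    marks = ", ".join("?" for _ in ids)  # one placeholder per id
    sql = f"""
        select r.analysis_id, r.alternative_id, r.indicator_id, r."rank"
        from rankings r
        inner join (
            select alternative_id, indicator_id
            from rankings
            where analysis_id in ({marks})
            group by alternative_id, indicator_id
            having count(distinct "rank") > 1
        ) differ
            on r.alternative_id = differ.alternative_id
            and r.indicator_id = differ.indicator_id
        where r.analysis_id in ({marks})
        order by r.alternative_id, r.indicator_id, r.analysis_id
    """
    # The id list is bound twice: once for the subquery, once for the outer filter.
    return conn.execute(sql, ids + ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table rankings (
        indicator_id integer, alternative_id integer, analysis_id integer,
        "rank" integer,
        primary key (indicator_id, alternative_id, analysis_id))
""")
# Sample data from the question, plus a hypothetical analysis 3 identical to analysis 1.
sample = [
    (1, 1, 1, 4), (1, 1, 2, 6), (1, 2, 1, 3), (1, 2, 2, 9),
    (2, 1, 1, 4), (2, 1, 2, 7), (2, 2, 1, 4), (2, 2, 2, 9),
    (3, 1, 1, 4), (3, 1, 2, 6), (3, 2, 1, 3), (3, 2, 2, 9),
]
conn.executemany(
    'insert into rankings (analysis_id, alternative_id, indicator_id, "rank") '
    "values (?, ?, ?, ?)",
    sample,
)

print(differing_rankings(conn, [1, 2]))  # the four differing rows from the question
print(differing_rankings(conn, [1, 3]))  # [] because analyses 1 and 3 agree everywhere
```

Filtering inside the subquery matters: without it, a difference introduced by an analysis that was not selected would still cause rows to be returned.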
A: 

Your example differences seem wrong. You say you want analyses whose ranks for any alternative/indicator combination differ, but example rows 3 and 4 don't satisfy this criterion. A correct result according to your requirement is:

 analysis_id | alternative_id | indicator_id | rank
 ----------------------------------------------------
      1 |              1 |            2 |    6
      2 |              1 |            2 |    7
      1 |              2 |            1 |    3
      2 |              2 |            1 |    4

One query you could try is this:

with distinct_ranks as (
    select alternative_id  
    , indicator_id
    , rank
    , count (*) as count
    from rankings
     group by alternative_id  
     , indicator_id
     , rank
    having count(*) = 1)
select r.analysis_id
    , r.alternative_id  
    , r.indicator_id
    , r.rank
from rankings r
    join distinct_ranks d on r.alternative_id = d.alternative_id
     and r.indicator_id = d.indicator_id
     and r.rank = d.rank

You have to realize that with multiple analyses your criterion is ambiguous. What if analyses 1, 2 and 3 have rank 1 and analyses 4, 5 and 6 have rank 2 for alternative/indicator 1/1? The set (1,2,3) is 'different' from the set (4,5,6), but inside each set there is no difference. What behavior do you want in that case: should they show up or not? My query finds all records that have a different rank for the same alternative/indicator *from all other analyses*, but it is not clear whether this is correct for your requirement.

Remus Rusanu
You're right, the example was flawed - my typo, now fixed. Thanks for catching it. The desired behavior is to pass any combination of analysis ids to this query and have it return the rows where ranks differ. So in your example, if I passed in the set (1,2,3), no rows would be returned. Likewise if I passed (4,5,6). But if I passed (1,4), then it would return the differing ranks. Dan's and tekBlues' answers reflect my desired behavior, after some slight modification to account for passing in particular analysis IDs.
Matt Winckler
What about (1,2,4,5), should all be returned?
Remus Rusanu
Yes, that's right.
Matt Winckler