views:

73

answers:

7

Someone please change my title to better reflect what I am trying to ask.

I have a table like Table(id,value,value_type,data)

ID is NOT unique. There is no unique key.

value_type has two types. lets say A and B.

Type B is better than A, but often not available.

For each id if any records with value_type B exsits I want all the records with that id and value_type B.

If no record for that id with value_Type B exists I want all records with that id and value_type A.

Notice that if B exsists for that id I don't want records with type A.

I currently do this with a seris of temp tables. Is there a single select statment (sub quries OK) that can do the job?

Thanks so much!

Additional Details:

SQL 2005

A: 

This uses a union, combining all records of value B with all records that have only A values:

SELECT *
FROM mainTable
WHERE value_type = B
GROUP BY value_type UNION SELECT *
                          FROM mainTable
                          WHERE value_type = A
                               AND id NOT IN(SELECT *
                                             FROM mainTable
                                             WHERE value_type = B);
dvanaria
This is a duplicate of Beth's answer, who was first.
Emtucifor
A: 

Try this (MSSQL).

Select id, value_typeB, null
from myTable
where value_typeB is not null
Union All
Select id, null, value_typeA
from myTable
where value_typeB is null and value_typeA is not null
AllenG
value_typeB and value_typeA are not columns. value_type is a column and it can have value A or B
kralco626
+3  A: 
SELECT * 
  FROM table 
 WHERE value_type = B
UNION ALL
SELECT * 
  FROM table 
 WHERE ID not in (SELECT distinct id 
                    FROM table 
                   WHERE value_type = B)
Beth
I think this query will miss the record id's that are only associated with type A values.
dvanaria
@dvanaria: Re-read the 2nd half of the UNION statement
OMG Ponies
@Beth: +1: Shouldn't need the DISTINCT, for sake of the `IN`
OMG Ponies
wouldn't the distinct improve performance? or does it make it worse?
Beth
testing this now
kralco626
@Beth - I would think it would depend on how often id is duplicated. Not in is very expencive so less records in the sub query is better. but if distinct doesn't remove that many records than maybe it's slower? thats what i would guess.
kralco626
@Beth - I tested with and without the "distinct". It was 13 seconds faster using distinct.
kralco626
woot! I'm sure Pony-boy is right, though, don't strictly need it and could hurt performance
Beth
13 seconds faster is a noticeable difference, wow!
OMG Ponies
It is part of a larger procedure. So it took 2 min 18 seconds rather than 2 min 31 seconds.
kralco626
This query has the most reads of any presented here, because it has to do 2 joins with 3 references to the table. For what it's worth, though NOT IN works, it really should be written as the anti semi join the optimizer was smart enough to switch it to. Also, the DISTINCT is meaningless. The execution plan is the same either way, using an anti semi join which doesn't care about uniqueness).
Emtucifor
@kralco626 One test is not enough, you need to run it multiple times in isolation from other factors like being in the middle of an SP
Emtucifor
+1  A: 
declare @test as table(
 id int , value [nvarchar](255),value_type [nvarchar](255),data  int)

 INSERT INTO @test
 SELECT 1, 'X', 'A',1 UNION
 SELECT 1, 'X', 'A',2 UNION
 SELECT 1, 'X', 'A',3 UNION
 SELECT 1, 'X', 'A',4 UNION
 SELECT 2, 'X', 'A',5 UNION
 SELECT 2, 'X', 'B',6 UNION
 SELECT 2, 'X', 'B',7 UNION
 SELECT 2, 'X', 'A',8 UNION
 SELECT 2, 'X', 'A',9 


 SELECT * FROM @test x
 INNER JOIN 
 (SELECT id, MAX(value_type) as value_type FROM 
 @test GROUP BY id) as y
 ON x.id = y.id AND x.value_type = y.value_type
Christopherous 5000
Your query performs near the best of all the queries submitted (often best) so deserves recognition.
Emtucifor
@Emtucifor - I belive, however, that if you replaced @test for subquery, or non-indexed view, it would take longer because it would have to call it twice. That would be my fault however, not Chris's because i didn't say that in my question
kralco626
@kralco626 - The number of times a table is mentioned in a query does not correlate to performance. Focusing on reducing the number of times the table name is mentioned, without looking at actual performance of the query (CPU and reads) will definitely lead you down the wrong path.
Emtucifor
I know that the number of time a table is used does not nessesariy corrisipond to preformance. But what if "table" was replaced with a query. (as I plan on doing, I know i proly should have mentioned this) Now if you have to call the sub-query 3 times rather than 1... I tried it It takes way longer.
kralco626
+3  A: 

RANK, rather than ROW_NUMBER, because you want ties (those with the same B value) to have the same rank value:

WITH summary AS (
  SELECT t.*,
         RANK() OVER (PARTITION BY t.id 
                          ORDER BY t.value_type DESC) AS rank
    FROM TABLE t
   WHERE t.value_type IN ('A', 'B'))
SELECT s.id,
       s.value,
       s.value_type,
       s.data
  FROM summary s
 WHERE s.rank = 1

Non CTE version:

SELECT s.id,
       s.value,
       s.value_type,
       s.data
  FROM (SELECT t.*,
               RANK() OVER (PARTITION BY t.id 
                                ORDER BY t.value_type DESC) AS rank
          FROM TABLE t
         WHERE t.value_type IN ('A', 'B')) s
 WHERE s.rank = 1

WITH test AS (
   SELECT 1 AS id, 'B' AS value_type
   UNION ALL
   SELECT 1, 'B'
   UNION ALL
   SELECT 1, 'A'
   UNION ALL
   SELECT 2, 'A'
   UNION ALL
   SELECT 2, 'A'),
     summary AS (
   SELECT t.*,
          RANK() OVER (PARTITION BY t.id 
                           ORDER BY t.value_type DESC) AS rank
     FROM test t)
SELECT *
  FROM summary
 WHERE rank = 1

I get:

id   value_type  rank
----------------------
1    B           1
1    B           1
2    A           1
2    A           1
OMG Ponies
+1 Beat me by 12 seconds!
Martin Smith
@Martin Smith: So you admit it - you *stole* my answer! =)
OMG Ponies
haha! ya i was doing almost exactly that with ROW_NUMBER. Just when I thought I was getting the hang of using these "window" functions.Thanks! i'll test and let you know.
kralco626
Your solution actually doesn't work. And i'm not sure why. I get an additional 22 records. Although it was a few seconds faster than Beth's solution.
kralco626
@kralco626: Are there `value_type`'s other than A and B? See update
OMG Ponies
no - and i just ran a distinct query to confirm.
kralco626
I would like to use your solution if you can make it work, because it only uses TABLE once. This is excepcially benificial because that table is a temp table and I would like to replace it with the query that populates that temp table. In your case the query would run once, in Beth's case it would have to be called 3 times.
kralco626
@kralco626: Here's a [query on the Data Explorer](http://odata.stackexchange.com/stackoverflow/q/9653/rank) using basic data, I get the correct results.
OMG Ponies
@OMG - i know it does seem odd. It should work. I just get some extra rows when I use it, not sure why. It is part of a larger query, I can post the whole thing if you would like me to.
kralco626
OMG Ponies
No- I just ran it. Just running your query and just running Beths you get around 20 more records.
kralco626
Your result set is actually disjoint from Beths. I ran Beth's query except your query and I got 287659 rows.
kralco626
OH I THINK I GOT IT. I'm not using A and B for my codes! that was just for example purposes. In my specific case I dont wanna sort DESC! Sorry for the confusing and thanks for the answer!
kralco626
@kralco626: What's the value_type data type - numeric?
OMG Ponies
This method, when done on a heap with about 300000 rows, takes up to 1000 times the number of reads as other solutions presented. Indexes or particular patterns of A/B appearances could change things, but not by enough, I think, to fix that.
Emtucifor
@Omg - No it's a varchar
kralco626
@Emtucifor - Which other solution is the most efficient? I tried Beths and this solution beat it for time. Also, "table" is a temp table and I want to replace it with a subquery, so a solution that only uses it once is ideal.
kralco626
A: 

Perhaps something like this:

select * from mytable
where id in (select distinct id where value_type = "B")
union
select * from mytable
where id in (select distinct id where value_type = "A" 
and id not in (select distinct id where value_type = "B"))
LymanZerga
+1  A: 

The shortest query to do the job I can think of:

SELECT TOP 1 WITH TIES *
FROM #test
ORDER BY Rank() OVER (PARTITION BY id ORDER BY value_type DESC)

This is about 50% worse on CPU as OMG Ponies' and Christoperous 5000's solutions, but the same number of reads. It's the extra sort that is making it take more CPU.

The best-performing original query I've come up with so far is:

SELECT * 
FROM #test 
WHERE value_type = 'B'
UNION ALL
SELECT * 
FROM #test T1
   WHERE NOT EXISTS (
   SELECT 1
   FROM #test T2
   WHERE
      T1.id = T2.id
      AND T2.value_type = 'B'
)

This consistently beats all the others presented on CPU by about 1/3rd (the others are about 50% more) but has 3x the number of reads. The duration on this query is often 2/3rds the time of all the others. I consider it a good contender.

Indexes and data types could change everything.

Emtucifor
Thanks for doing the analysis, nice job. However, I need a solution that uses the table only once.
kralco626
also i think your solutions would only return one record. What if there are more than one record with the highest priority type?
kralco626
Try the query out :)
Emtucifor