ansaurus

Question

Answer 1

A:

This uses a union, combining all records of value B with all records that have only A values:

SELECT *
FROM mainTable
WHERE value_type = B
GROUP BY value_type UNION SELECT *
                          FROM mainTable
                          WHERE value_type = A
                               AND id NOT IN(SELECT *
                                             FROM mainTable
                                             WHERE value_type = B);

dvanaria 2010-08-20 20:13:52

This is a duplicate of Beth's answer, who was first.

Emtucifor 2010-08-23 15:56:22

Answer 2

A:

Try this (MSSQL).

Select id, value_typeB, null
from myTable
where value_typeB is not null
Union All
Select id, null, value_typeA
from myTable
where value_typeB is null and value_typeA is not null

AllenG 2010-08-20 20:14:18

value_typeB and value_typeA are not columns. value_type is a column and it can have value A or B

kralco626 2010-08-20 20:19:14

Answer 3

+3 A:

SELECT * 
  FROM table 
 WHERE value_type = B
UNION ALL
SELECT * 
  FROM table 
 WHERE ID not in (SELECT distinct id 
                    FROM table 
                   WHERE value_type = B)

Beth 2010-08-20 20:17:11

I think this query will miss the record id's that are only associated with type A values.

dvanaria 2010-08-20 20:20:03

@dvanaria: Re-read the 2nd half of the UNION statement

OMG Ponies 2010-08-20 20:22:19

@Beth: +1: Shouldn't need the DISTINCT, for sake of the `IN`

OMG Ponies 2010-08-20 20:23:36

wouldn't the distinct improve performance? or does it make it worse?

Beth 2010-08-20 20:29:11

testing this now

kralco626 2010-08-20 20:29:43

@Beth - I would think it would depend on how often id is duplicated. Not in is very expencive so less records in the sub query is better. but if distinct doesn't remove that many records than maybe it's slower? thats what i would guess.

kralco626 2010-08-20 20:30:47

@Beth - I tested with and without the "distinct". It was 13 seconds faster using distinct.

kralco626 2010-08-20 20:45:54

woot! I'm sure Pony-boy is right, though, don't strictly need it and could hurt performance

Beth 2010-08-20 20:48:35

13 seconds faster is a noticeable difference, wow!

OMG Ponies 2010-08-20 20:52:34

It is part of a larger procedure. So it took 2 min 18 seconds rather than 2 min 31 seconds.

kralco626 2010-08-20 21:02:41

This query has the most reads of any presented here, because it has to do 2 joins with 3 references to the table. For what it's worth, though NOT IN works, it really should be written as the anti semi join the optimizer was smart enough to switch it to. Also, the DISTINCT is meaningless. The execution plan is the same either way, using an anti semi join which doesn't care about uniqueness).

Emtucifor 2010-08-21 03:21:41

@kralco626 One test is not enough, you need to run it multiple times in isolation from other factors like being in the middle of an SP

Emtucifor 2010-08-21 03:26:11

Answer 4

+1 A:

declare @test as table(
 id int , value [nvarchar](255),value_type [nvarchar](255),data  int)

 INSERT INTO @test
 SELECT 1, 'X', 'A',1 UNION
 SELECT 1, 'X', 'A',2 UNION
 SELECT 1, 'X', 'A',3 UNION
 SELECT 1, 'X', 'A',4 UNION
 SELECT 2, 'X', 'A',5 UNION
 SELECT 2, 'X', 'B',6 UNION
 SELECT 2, 'X', 'B',7 UNION
 SELECT 2, 'X', 'A',8 UNION
 SELECT 2, 'X', 'A',9 


 SELECT * FROM @test x
 INNER JOIN 
 (SELECT id, MAX(value_type) as value_type FROM 
 @test GROUP BY id) as y
 ON x.id = y.id AND x.value_type = y.value_type

Christopherous 5000 2010-08-20 20:18:18

Your query performs near the best of all the queries submitted (often best) so deserves recognition.

Emtucifor 2010-08-21 03:30:21

@Emtucifor - I belive, however, that if you replaced @test for subquery, or non-indexed view, it would take longer because it would have to call it twice. That would be my fault however, not Chris's because i didn't say that in my question

kralco626 2010-08-23 10:36:28

@kralco626 - The number of times a table is mentioned in a query does not correlate to performance. Focusing on reducing the number of times the table name is mentioned, without looking at actual performance of the query (CPU and reads) will definitely lead you down the wrong path.

Emtucifor 2010-08-23 15:55:46

I know that the number of time a table is used does not nessesariy corrisipond to preformance. But what if "table" was replaced with a query. (as I plan on doing, I know i proly should have mentioned this) Now if you have to call the sub-query 3 times rather than 1... I tried it It takes way longer.

kralco626 2010-08-26 11:02:46

Answer 5

+3 A:

RANK, rather than ROW_NUMBER, because you want ties (those with the same B value) to have the same rank value:

WITH summary AS (
  SELECT t.*,
         RANK() OVER (PARTITION BY t.id 
                          ORDER BY t.value_type DESC) AS rank
    FROM TABLE t
   WHERE t.value_type IN ('A', 'B'))
SELECT s.id,
       s.value,
       s.value_type,
       s.data
  FROM summary s
 WHERE s.rank = 1

Non CTE version:

SELECT s.id,
       s.value,
       s.value_type,
       s.data
  FROM (SELECT t.*,
               RANK() OVER (PARTITION BY t.id 
                                ORDER BY t.value_type DESC) AS rank
          FROM TABLE t
         WHERE t.value_type IN ('A', 'B')) s
 WHERE s.rank = 1

WITH test AS (
   SELECT 1 AS id, 'B' AS value_type
   UNION ALL
   SELECT 1, 'B'
   UNION ALL
   SELECT 1, 'A'
   UNION ALL
   SELECT 2, 'A'
   UNION ALL
   SELECT 2, 'A'),
     summary AS (
   SELECT t.*,
          RANK() OVER (PARTITION BY t.id 
                           ORDER BY t.value_type DESC) AS rank
     FROM test t)
SELECT *
  FROM summary
 WHERE rank = 1

I get:

id   value_type  rank
----------------------
1    B           1
1    B           1
2    A           1
2    A           1

OMG Ponies 2010-08-20 20:19:38

+1 Beat me by 12 seconds!

Martin Smith 2010-08-20 20:20:55

@Martin Smith: So you admit it - you *stole* my answer! =)

OMG Ponies 2010-08-20 20:24:46

haha! ya i was doing almost exactly that with ROW_NUMBER. Just when I thought I was getting the hang of using these "window" functions.Thanks! i'll test and let you know.

kralco626 2010-08-20 20:28:05

Your solution actually doesn't work. And i'm not sure why. I get an additional 22 records. Although it was a few seconds faster than Beth's solution.

kralco626 2010-08-20 21:01:41

@kralco626: Are there `value_type`'s other than A and B? See update

OMG Ponies 2010-08-20 21:06:29

no - and i just ran a distinct query to confirm.

kralco626 2010-08-20 21:09:09

I would like to use your solution if you can make it work, because it only uses TABLE once. This is excepcially benificial because that table is a temp table and I would like to replace it with the query that populates that temp table. In your case the query would run once, in Beth's case it would have to be called 3 times.

kralco626 2010-08-20 21:12:35

@kralco626: Here's a [query on the Data Explorer](http://odata.stackexchange.com/stackoverflow/q/9653/rank) using basic data, I get the correct results.

OMG Ponies 2010-08-20 21:21:47

@OMG - i know it does seem odd. It should work. I just get some extra rows when I use it, not sure why. It is part of a larger query, I can post the whole thing if you would like me to.

kralco626 2010-08-20 21:25:02

OMG Ponies 2010-08-20 21:36:59

No- I just ran it. Just running your query and just running Beths you get around 20 more records.

kralco626 2010-08-20 21:46:32

Your result set is actually disjoint from Beths. I ran Beth's query except your query and I got 287659 rows.

kralco626 2010-08-20 22:01:56

OH I THINK I GOT IT. I'm not using A and B for my codes! that was just for example purposes. In my specific case I dont wanna sort DESC! Sorry for the confusing and thanks for the answer!

kralco626 2010-08-20 22:06:37

@kralco626: What's the value_type data type - numeric?

OMG Ponies 2010-08-20 22:07:40

This method, when done on a heap with about 300000 rows, takes up to 1000 times the number of reads as other solutions presented. Indexes or particular patterns of A/B appearances could change things, but not by enough, I think, to fix that.

Emtucifor 2010-08-21 03:44:45

@Omg - No it's a varchar

kralco626 2010-08-23 10:32:24

@Emtucifor - Which other solution is the most efficient? I tried Beths and this solution beat it for time. Also, "table" is a temp table and I want to replace it with a subquery, so a solution that only uses it once is ideal.

kralco626 2010-08-23 10:33:48

Answer 6

A:

Perhaps something like this:

select * from mytable
where id in (select distinct id where value_type = "B")
union
select * from mytable
where id in (select distinct id where value_type = "A" 
and id not in (select distinct id where value_type = "B"))

LymanZerga 2010-08-20 20:27:59

Answer 7

+1 A:

The shortest query to do the job I can think of:

SELECT TOP 1 WITH TIES *
FROM #test
ORDER BY Rank() OVER (PARTITION BY id ORDER BY value_type DESC)

This is about 50% worse on CPU as OMG Ponies' and Christoperous 5000's solutions, but the same number of reads. It's the extra sort that is making it take more CPU.

The best-performing original query I've come up with so far is:

SELECT * 
FROM #test 
WHERE value_type = 'B'
UNION ALL
SELECT * 
FROM #test T1
   WHERE NOT EXISTS (
   SELECT 1
   FROM #test T2
   WHERE
      T1.id = T2.id
      AND T2.value_type = 'B'
)

This consistently beats all the others presented on CPU by about 1/3rd (the others are about 50% more) but has 3x the number of reads. The duration on this query is often 2/3rds the time of all the others. I consider it a good contender.

Indexes and data types could change everything.

Emtucifor 2010-08-21 03:46:24

Thanks for doing the analysis, nice job. However, I need a solution that uses the table only once.

kralco626 2010-08-23 10:38:29

also i think your solutions would only return one record. What if there are more than one record with the highest priority type?

kralco626 2010-08-23 15:36:13

Try the query out :)

Emtucifor 2010-08-23 15:54:17

ansaurus

tags:

views:

answers:

select query select based on a priority

related questions