views:

70

answers:

7

I kind of have a feel for why the view is slower: The where clause is probably not applied at the same time. The results do seem to be the same, though. I am not sure what I can do about this, short of not using a view...which is not ideal, as I added the view to avoid code repetition, and I don't want to remove it if it isn't necessary.

Any suggestions for to change the way I am doing this so that I can use a view like in Command 1 but still have my query be executed as quickly as it is executed in command 2?

declare @foo varchar(50)
set @foo = 'be%'


ALTER VIEW [dbo].[wpvw_v]
AS
select distinct [name]
from kvgs kvg left join cdes cde
on kvg.kvgi = cde.kgi
group by [name], cde.kgi, kvg.mU
having count(cde.kgi) >= 2 or kvg.mU = 1 or 
   exists (select [name] from FP x where x.name = kvg.name)


--Command 1: Takes 7 seconds
select [name] from wpvw_v where name like @foo


--Command 2: Takes 1 second
SELECT DISTINCT kvg.name
FROM         dbo.kvgs AS kvg LEFT JOIN
                      dbo.cdes AS cde ON kvg.kvgi = cde.kgi
where name like @foo
GROUP BY kvg.name, cde.kgi, kvg.mU
HAVING      (COUNT(cde.kgi) >= 2) OR
                      (kvg.mU = 1) OR
                      EXISTS
                          (SELECT     Name
                            FROM          dbo.FP AS x
                            WHERE      (Name = kvg.name))
A: 

As far as I know, the full result set of the view is collected and then further widdled down by the SELECT statement that uses it. This is very different from your second SELECT statement, which doesn't collect any more than it needs.

David Andres
A: 

You could try an inline tabled function (http://www.sqlhacks.com/index.php/Retrieve/Parameterized-View) but tbh I see that as a bit of a hack.

Honestly I'd probably go for the code repetition. I don't really see SQL in the same way that I see other code - I keep on seeing vast differences in performance between otherwise logically equivalent statements.

Kragen
+1  A: 

Your query from view is like this:

SELECT name FROM (SELECT DISTINCT name FROM ...) WHERE name = @name;

while the second one is:

SELECT DISTINCT name FROM ... WHERE name = @name;

The two queries are very different and even though they produce the same result, the fiurst one can be answered only if the entire table is scanned to produce the distinct names, while the second one can scan only the names you're interested in.

The gist of the problem is that the presence of DISTINCT places a barrier that does not allow for the filtering predicate to move down the query tree to a place where is effective.

Update

Even if DISTINCT is not a barrier, on second look the second look there is a even more powerful barrier there: the GROUP BY/HAVING clause. One query filters after the GROUP and HAVING condition was applied, the other one before. And the HAVING condition has subqueries that reference name again. I doubt the QO can proove the equivalence of the filtering before the aggregate and filtering after the aggregate.

Remus Rusanu
This is not likely to be an issue. SQL Server's query processor is able to recognize that it makes no difference to the result if DISTINCT is applied before or after selecting the matching names.There is no barrier here. SQL statements carry with them no directions about order of evaluation. The query processor is free to return the correct result from any execution plan.(I tested SQL Server 2008 for this kind of rewriting, but I highly doubt this isn't also done in 2005, 2000, and maybe 7.0.)
Steve Kass
@Steve: you're right. is not the distinct, is the HAVING that si the barrier.
Remus Rusanu
@Remus: GROUP BY / HAVING is not a barrier either. I just tested it, and SQL Server 2008 pushes a LIKE predicate before the GROUP BY even when it's specified in a WHERE clause after SELECT from a view containing the GROUP BY and HAVING specs.The query here is more complicated, but there's no reason I see why the predicate can't be pushed up. Whether it is being pushed up or not, I don't know without seeing the query plan.
Steve Kass
A: 

Without seeing more (the CREATE TABLE, INDEX, and CONSTRAINT statements for each table, for example) and preferably seeing the query plans as some sample data representative of the cardinality of the join as well, it's hard to say.

Possibly, there is a semantic difference between the queries that has to do with the collation under which the LIKE expression is evaluated, and it might be impossible to coax the same plan.

However, there is probably plenty of room for query tuning here. It seems unlikely you need to fully aggregate all the COUNT()s. You have three rather distinct conditions under which you want to see a "name" in your result. With UNION you might be able to make one or more of them simpler to calculate, and if concurrency isn't an issue, you might even write this as a multi-step user-defined table-valued function that accumulates the names in separate steps.

Steve Kass
+1  A: 

I didn't think the HAVING clause could accommodate what you'd posted, but I believe your view should be written to use UNIONs instead. Here's my take on it:

ALTER VIEW [dbo].[wpvw_v] AS
WITH names AS(
  SELECT k.name
    FROM KVGS k 
   WHERE EXISTS(SELECT NULL
                  FROM CDES c
                 WHERE c.kgi = k.kvgi
              GROUP BY c.kgi
                HAVING COUNT(c.kgi) > 1)
  UNION ALL
  SELECT k.name
    FROM KVGS k 
   WHERE k.mu = 1
GROUP BY k.name
  UNION ALL
  SELECT k.name
    FROM KVGS k 
    JOIN FP x ON x.name = k.name
GROUP BY k.name)
SELECT n.name
  FROM names n

If you want to filter out duplicates between the 3 SQL statements, change UNION ALL to UNION. Then you can use:

SELECT n.name
  FROM wpvw_v n
 WHERE CHARINDEX(@name, n.name) > 0
OMG Ponies
I ended up using an or instead of a union.
Brian
A: 

I believe the following reproduces your problem:

create table tbl (idx int identity(1,1), name varchar(50), val float)

declare @cnt int
set @cnt=0
while @cnt < 10000
begin
insert tbl select char(CAST(rand()*256 AS INT)), rand()
set @cnt = @cnt + 1
end
go
create view tbl_view as select distinct name from tbl group by name having sum(val) > 1

Then if you run the following query:

SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select * from tbl_view where name like @n
SET STATISTICS IO OFF
GO
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select distinct name from tbl where name like @n group by name having sum(val) > 1
SET STATISTICS IO OFF

You get the following:

(1 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tbl'. Scan count 1, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)
Table 'tbl'. Scan count 1, logical reads 338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The view forces it to work off a sub-table first and only then to apply the filter. Now, if you modify the view and remove the DISTINCT, this does not change. But if you modify the view to remove the group by:

create view tbl_view as select name from tbl where val > 0.8 group by name 
go
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select * from tbl_view where name like @n
SET STATISTICS IO OFF
GO
SET STATISTICS IO ON
declare @n varchar(50)
set @n='w%'
select name from tbl where val > 0.8 and name like @n group by name
SET STATISTICS IO OFF

Then you get the same results for both queries:

(1 row(s) affected)
Table 'tbl'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)
Table 'tbl'. Scan count 1, logical reads 34, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So it does seem like the HAVING is the barrier.

Locksfree
A: 

Table-valued functions tend to be quicker than views, assuming your WHERE conditions are known and can be supplied in parameters.

One advantage of table-valued functions is that you can have multiple statements, so you can convert OUTER JOINs to quicker INNER JOINS in subsequent statements. So instead of this:

INSERT INTO @resultTable
    table1_id,
    table1_column,
    table2_column,
    table3_column
SELECT
    table1.id,
    table1.column,
    table2.column,
    table3.column
FROM
    table1
    INNER JOIN table2 ON table2.table1_id = table1.id
    LEFT OUTER JOIN table3 ON table3.table1_id = table1.id

return @resultTable

... you can do this, which I find is always faster:

INSERT INTO @resultTable
    table1_id,
    table1_column,
    table2_column,
SELECT
    table1.id,
    table1.column,
    table2.column,
FROM
    table1
    INNER JOIN table2 ON table2.table1_id = table1.id

UPDATE @resultTable SET
    table3_column = table3.column
FROM @resultTable AS result
    INNER JOIN table3 ON table3.table1_id = result.table1_id

return @resultTable
Kevin Conner