ansaurus

Question

Finding head and tail events in SQL Server (Optimisation)

Answer 1

A:

From what I understand you are after is the head and tail, ordered by day**, for each ID**. The head and tail being all the records until record where the type is not one is encountered.

This is a different way of doing it, it may be faster

;WITH Ranked AS (
    SELECT 
     *,
     Row_Number() OVER (ORDER BY date) as 'rnk'
    FROM #event
)


SELECT * 
FROM Ranked
WHERE rnk not between 
     (SELECT Min(rnk) FROM Ranked r WHERE r.type <> 1 AND ranked.id = r.id)
     AND (SELECT Max(rnk) FROM Ranked r WHERE r.type <> 1 AND ranked.id = r.id)
order by id

You do not need to use TOP with the exists statement although it doesn't hurt.

Robert Wagner 2008-11-13 05:46:55

This is taking twice the number of reads an 20% longer according to an sql trace, according to query execution plan its 30 times slower (but the number in the plan are often way too high)

Sam Saffron 2008-11-13 07:01:22

Answer 2

A:

Basically looks like you are looking to get event of type 1 that are lesser than in timestamp than other events and greater than timestamp event of other date Try this, its written in Oracle Syntax, not sure about MSSQL.

select e1.* from e1 where e1.id = 1 and (e1.date <= ( select min(e2.date) from e2 where e2.id <> 1 group by e2.date ) or (e1.date >= select max(e3.date) from e3 where e3.id <> 1 group by e3.date ) )

Dheer 2008-11-13 06:31:51

I can not get this SQL to work in SQL Server ... even once i alias everything properly the sub-queries produce an error

Sam Saffron 2008-11-13 06:49:50

Answer 3

A:

Consider

create index idx2 on #event (type)

I don't have SQL Server to check but in Oracle it will eliminate the top-level scan (for 'type=1' condition).

Regarding the query itself - in MS SQL 2000 '[not] exists' and '[not] in' predicates almost always did full scan - we were replacing them with appropriate JOIN's.

Dmitry Khalatov 2008-11-13 06:32:19

another index does not help out here, same performance, 3 table scans. I guess part of the question is "is there any way to replace this with a join"

Sam Saffron 2008-11-13 06:46:13

Answer 4

A:

With only 20 records the query optimizer may easily be concluding that 3 table scans is the quickest way to do it no matter how you express the query or what indexes you create. With small data records the entire table would probably load with one or a handful of disk reads; and within a read block there's not much optimization involved; the optimizer primarily is interested in minimizing disk activity, and all else is comparatively inconsequential. What the optimizer does with so few records is not a good indication of how it will handle larger volumes.

Do you have a real table with more data in it? I can't imagine you can perceive any exectuion time at all.

le dorfier 2008-11-13 07:16:20

Fair point will amend the question to include a massive data set

Sam Saffron 2008-11-13 07:19:03

Ok still getting exactly 3 table scans with Roberts Query it took too long to return a result

Sam Saffron 2008-11-13 07:25:31

btw, I use both SQL server profiler and I look at the execution plan to determine performance

Sam Saffron 2008-11-13 07:28:12

Answer 5

+1 A:

To generate a large subset of data you can use this:

declare @i int 
set @i = 10000
while @i > 5 
begin
    insert into #event values (@i, '2000-01-01', 1) 
    insert into #event values (@i, '2000-01-02', 1) 
    insert into #event values (@i, '2000-01-03', 3) 
    insert into #event values (@i, '2000-01-04', 2) 
    insert into #event values (@i, '2000-01-05', 1)  
    set @i = @i -1 
end

Also, to include lots of events per group try this:

declare @j int 
set @j = 0 
while @j < 10
begin 
    set nocount on 
    declare @i int 
    set @i = 0
    while @i < 10000 
    begin
     insert into #event values (@j, DateAdd(d, @i, '2000-01-01'), rand(10) * 10) 

     set @i = @i  +1 
    end
    set @j = @j + 1 
end
set nocount off

In all my testing it seems my original query only produces 3 table scans and I am not really sure if performance can be improved here.

Sam Saffron 2008-11-13 07:32:30

Answer 6

A:

Here is a version with joins:

select distinct e1.* from #event e1 
left outer join #event e2 ON 
     e1.id = e2.id 
        and e2.date < e1.date 
        and e2.type <> 1
left outer join #event e3 ON
     e1.id = e3.id 
        and e3.date > e1.date 
        and e3.type <> 1
where e1.type = 1 AND (e2.id is null or e3.id is null)

This still has three table scans plus a distinct clause, but it still seems to be faster than the original query.

csgero 2008-11-13 12:48:52

Answer 7

A:

Here's a query whose showplan at least doesn't say "table scan" in it, and gives the same answer.

I don't understand the query in logical terms yet though.

SELECT DISTINCT e1.* FROM #event e1
WHERE e1.type = 1
AND
(
NOT EXISTS (
SELECT 1 FROM #event
WHERE type != 1
AND id = e1.id
AND date < e1.date
)
OR NOT EXISTS (
SELECT 1 FROM #event
WHERE type != 1
AND id = e1.id
AND date > e1.date
)
)

le dorfier 2008-11-13 19:20:22

Nope, hold that - I tested after applying your data generator. But don't concede yet.

le dorfier 2008-11-13 19:25:09

Answer 8

A:

This is better than the last one:

SELECT e1.* FROM event e1
WHERE e1.type = 1
AND NOT EXISTS (
SELECT 1 FROM event
WHERE type != 1
AND id = e1.id
AND date < e1.date
)
UNION ALL SELECT e1.* FROM event e1
WHERE e1.type = 1
AND NOT EXISTS (
SELECT 1 FROM event
WHERE type != 1
AND id = e1.id
AND date > e1.date
)

It gets the same result. But your query, and my previous one, are awful. Here's the IO statistics:

(29985 row(s) affected)

Table 'event'. Scan count 9997, logical reads 20477, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 39979, logical reads 99950, physical reads 0, read-ahead reads 0.

Here's the same for the most recent query:

(29985 row(s) affected)

Table 'event'. Scan count 4, logical reads 652, physical reads 0, read-ahead reads 0.

Two things to notice --
1. Even worst case, the whole table gets loaded and there are no physical reads.
2. The bad ones have 9997 + 39979 table scans. Not just 4.

Please describe the intent of the query.

le dorfier 2008-11-13 19:51:56

Answer 9

+1 A:

I think this is better:

select e1.group_id, e1.date, e1.type
from #event e1, #event e2
where e1.type = 1
and e2.type <> 1
and e1.group_id= e2.group_id
group by e1.group_id, e1.date, e1.type, e2.group_id
having e1.date < min(e2.date) or e1.date > max(e2.date)

adamant7 2008-12-12 17:07:44

Though this does not achieve the ideal one table scan, it does reduce the table scan to 2 and heavily simplify the execution plan. +1

Sam Saffron 2008-12-22 21:36:04

Originally I thought this is faster but it collapses if you have lots of events for the same group

Sam Saffron 2009-01-05 02:17:29

ansaurus

tags:

views:

answers:

Finding head and tail events in SQL Server (Optimisation)

related questions