views:

5814

answers:

5

What is the main purpose of using CROSS APPLY?

I have read (vaguely, through posts on the Internet) that cross apply can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)

I also know that CROSS APPLY doesn't require a UDF as the right-table.

In most INNER JOIN queries (one-to-many relationships), I could rewrite them to use CROSS APPLY, but they always give me equivalent execution plans.

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?


Edit:

Here's a trivial example, where the execution plans are exactly the same. (Show me one where they differ and where cross apply is faster/more efficient)

create table Company (
    companyId int identity(1,1)
,   companyName varchar(100)
,   zipcode varchar(10) 
,   constraint PK_Company primary key (companyId)
)
GO

create table Person (
    personId int identity(1,1)
,   personName varchar(100)
,   companyId int
,   constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId)
,   constraint PK_Person primary key (personId)
)
GO

insert Company
select 'ABC Company', '19808' union
select 'XYZ Company', '08534' union
select '123 Company', '10016'


insert Person
select 'Alan', 1 union
select 'Bobby', 1 union
select 'Chris', 1 union
select 'Xavier', 2 union
select 'Yoshi', 2 union
select 'Zambrano', 2 union
select 'Player 1', 3 union
select 'Player 2', 3 union
select 'Player 3', 3 


/* using CROSS APPLY */
select *
from Person p
cross apply (
    select *
    from Company c
    where p.companyid = c.companyId
) Czip

/* the equivalent query using INNER JOIN */
select *
from Person p
inner join Company c on p.companyid = c.companyId
A: 

This is a great article on the matter.

Ariel
While this is a good article, this doesn't really answer my question insofar as I want to know when I should use CROSS APPLY instead of an INNER JOIN
Jeff Meatball Yang
+11  A: 

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?

See the article article in my blog for detailed performance comparison:

CROSS APPLY works better on things that have no simple JOIN condition.

This one selects 3 last records from t2 for each record from t1:

SELECT  t1.*, t2o.*
FROM    t1
CROSS APPLY
        (
        SELECT  TOP 3 *
        FROM    t2
        WHERE   t2.t1_id = t1.id
        ORDER BY
                t2.rank DESC
        ) t2o

It cannot be easily formulated with an INNER JOIN condition.

You could probably do something like that using CTE's and window function:

WITH    t2o AS
        (
        SELECT  t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
        FROM    t2
        )
SELECT  t1.*, t2o.*
FROM    t1
INNER JOIN
        t2o
ON      t2o.t1_id = t1.id
        AND t2o.rn <= 3

, but this is less readable and probably less efficient.

Update:

Just checked.

master is a table of about 20,000,000 records with a PRIMARY KEY on id.

This query:

WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    master
        ),
        t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
JOIN    q
ON      q.rn <= t.id

runs for almost 30 seconds, while this one:

WITH    t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
CROSS APPLY
        (
        SELECT  TOP (t.id) m.*
        FROM    master m
        ORDER BY
                id
        ) q

is instant.

Quassnoi
See the end of Ariel's link. A row_number() query is just as nice and doesn't even require a join. So I don't think I should use cross apply for this situation (select top 3, partition by t1.id).
Jeff Meatball Yang
Good example! The raw performance increase is very apparent. How do the execution plans differ?
Jeff Meatball Yang
@Jeff: they differ much, as you can see :) `JOIN` uses `NESTED LOOPS` with `master` as a leading table, `CROSS APPLY` uses `NESTED LOOPS` too but `t` is leading and `TOP` is applied to `master` on each loop.
Quassnoi
A: 

I guess it should be readability ;)

CROSS APPLY will be somewhat unique for people reading to tell them that a UDF is being used which will be applied to each row from the table on the left.

Ofcourse, there are other limitations where a CROSS APPLY is better used than JOIN which other friends have posted above.

shahkalpesh
+2  A: 

here is an example when CROSS APPLY makes a huge difference with performance:

Using CROSS APPLY to optimize joins on BETWEEN conditions

Note that besides replacing inner joins you can also reuse code such as truncating dates without paying performance penalty for involing scalar UDFs, for example: Calculating third Wednesday of the month with inline UDFs

AlexKuznetsov
Nice example and a nice blog post. Definitely a +1.
Quassnoi
A: 

In SQL 2000, you could not execute a table valued function with parameters taken from other

--Fails in SQL 2000, allowed in SQL 2005
SELECT
    *
FROM
   MyTable t
   JOIN
   dbo.MyTVF (t.col1, t.col2) u ON ...

Cross and outer apply allow this (but without the on clause).

APPLY and JOIN are not really interchangeable.

Or: if you need rows from udf to be added to the intermediate results, use CROSS APPLY.

Your example is simply UDF expresses as a subquery. Have you compared plans?

gbn