ansaurus

Question

Massive CROSS JOIN in SQL Server 2005

Answer 1

A:

Could you give a sample table schema and explain why the cross join is being done? Example output?

That will help us understand the real issue and how the query could be rewritten.

shahkalpesh 2008-12-10 03:13:54

Added the original code I'm porting to the question.

Cade Roux 2008-12-10 03:45:48

shahkalpesh 2008-12-10 03:50:04

Added the SQL code, too.

Cade Roux 2008-12-10 04:06:40

Answer 2

+1 A:

Break the query down into a plain, simple cross join.


    SELECT  CC.COST_CTR_NUM, GL.ACCOUNT_NO
           ,CC.COST_CTR_NUM AS CA_COSTCENT
           ,GL.ACCOUNT_NO   AS CA_GLACCOUNT
           ,GL.LI_LNTM      AS CA_LNTM
    -- I don't know what BUPDEF does, so remove it from the query for the time being:
    --     ,udf_BUPDEF(GL.ACCOUNT_NO, CC.COST_CTR_NUM, GL.LI_LNTM, 'N') AS CA_UNIT
    FROM    JOINGLAC AS GL
    CROSS JOIN COSTCENT AS CC

How fast is the simple cross join on its own (without any functions applied to it)?

shahkalpesh 2008-12-10 05:08:24

If that runs fast, try the same SELECT with the functions applied and see whether it is still OK.

shahkalpesh 2008-12-10 05:11:00
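The two-step test suggested above can be sketched as a small timing script. It is only a sketch under assumptions: the thread's JOINGLAC and COSTCENT tables, udf_BUPDEF living in the dbo schema (scalar UDFs must be called with a schema-qualified name), and an arbitrary 100,000-row sample so both runs do the same amount of work.

```sql
-- Sketch: time the bare CROSS JOIN and the UDF version on the same sample size.
-- Assumptions: dbo schema for udf_BUPDEF, 100,000-row sample (TOP without
-- ORDER BY takes arbitrary rows, which is fine for a throughput test).
IF OBJECT_ID('tempdb..#PlainCross') IS NOT NULL DROP TABLE #PlainCross;
IF OBJECT_ID('tempdb..#WithUdf') IS NOT NULL DROP TABLE #WithUdf;

-- CPU and elapsed time for each statement appear in the Messages output.
SET STATISTICS TIME ON;

-- Step 1: plain cross join, no functions. SELECT ... INTO keeps the timing
-- from being dominated by sending rows to the client.
SELECT TOP (100000)
       CC.COST_CTR_NUM, GL.ACCOUNT_NO, GL.LI_LNTM
INTO   #PlainCross
FROM   JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC;

-- Step 2: same join, with the scalar UDF evaluated for every output row.
SELECT TOP (100000)
       CC.COST_CTR_NUM, GL.ACCOUNT_NO, GL.LI_LNTM,
       dbo.udf_BUPDEF(GL.ACCOUNT_NO, CC.COST_CTR_NUM, GL.LI_LNTM, 'N') AS CA_UNIT
INTO   #WithUdf
FROM   JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC;

SET STATISTICS TIME OFF;
```

Dividing the sample size by the elapsed seconds reported for step 2 gives a rows-per-second figure for the UDF version; the gap between the two elapsed times is roughly the cost of the function calls.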

Posted the results so far.

Cade Roux 2008-12-17 15:06:29

The UDF performance (they are scalar and do not access tables) was the key and is horrible - the two UDFs can only process around 300 rows per second. I am currently engaged in finding workarounds.

Cade Roux 2009-02-26 18:52:06

Answer 3

+2 A:

Examining that query shows that only one column is used from one table and only two columns from the other. Because so few columns are used, this query can easily be improved with covering indexes:

CREATE INDEX COSTCENTCoverCross ON COSTCENT(COST_CTR_NUM)
CREATE INDEX JOINGLACCoverCross ON JOINGLAC(ACCOUNT_NO, LI_LNTM)

Here are my questions for further optimization:

When you put the query in Query Analyzer and whack the "Show Estimated Execution Plan" button, it shows a graphical representation of what the server is going to do.

Join Type: There should be a nested loops join in there (the other options are merge join and hash join). If you see nested loops, that's fine. If you see a merge join or hash join, let us know.

Order of table access: Go all the way to the top and scroll all the way to the right. The first step should be accessing a table. Which table is that, and what method is used (index scan, clustered index scan)? What method is used to access the other table?

Parallelism: You should see the little jaggedy arrows on almost all icons in the plan, indicating that parallelism is being used. If you don't see them, there is a major problem!
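If the graphical plan is awkward to read or share, the same three checks can be made against a text plan. A minimal sketch, assuming the simplified cross join from earlier in the thread: look for "Nested Loops", "Parallelism", and the scan operators (and whether they use the covering indexes) in the output. A cross join has no join predicate, and hash and merge joins need an equality predicate, so nested loops is in practice the only physical join available here.

```sql
-- Sketch: text version of the estimated plan for the simplified cross join.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO
-- separators (GO is understood by Query Analyzer/SSMS/sqlcmd, not the server).
-- While SHOWPLAN_TEXT is ON, the query is compiled but NOT executed.
SET SHOWPLAN_TEXT ON;
GO

SELECT CC.COST_CTR_NUM, GL.ACCOUNT_NO, GL.LI_LNTM
FROM   JOINGLAC AS GL
CROSS JOIN COSTCENT AS CC;
GO

SET SHOWPLAN_TEXT OFF;
GO
```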

That udf_BUPDEF concerns me. Does it read from additional tables? Util.PADLEFT concerns me less, but still... what is it? If it isn't a database object, consider using this instead:

RIGHT('z00000000000000000000000000' + columnName, 7)
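The RIGHT() padding trick can be checked in isolation before it replaces Util.PADLEFT. This sketch assumes a width of 7 and uses a made-up table variable with made-up values; note that values longer than 7 characters are silently cut from the left, and numeric columns need an explicit CAST.

```sql
-- Sketch: zero-pad codes to 7 characters without a UDF.
-- The table variable and its values are hypothetical, for illustration only.
DECLARE @Codes TABLE (CodeNum int NOT NULL);
INSERT INTO @Codes (CodeNum) VALUES (42);       -- single-row INSERTs: SQL Server 2005
INSERT INTO @Codes (CodeNum) VALUES (1234567);  -- has no multi-row VALUES syntax

SELECT CodeNum,
       -- CAST is required for int columns: '0000000' + 42 would convert the
       -- string to int and add (giving 42) instead of concatenating.
       RIGHT('0000000' + CAST(CodeNum AS varchar(10)), 7) AS Padded
FROM   @Codes;
```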

Are there any triggers on JOINCCAC? How about indexes? With an insert this large, you'll want to drop all triggers and indexes on that table.
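A variant of the drop-and-recreate advice is to disable triggers and nonclustered indexes for the load and rebuild them afterwards, which keeps their definitions in place. This is a sketch of that swapped-in approach, not the answer's exact procedure; IX_JOINCCAC_Example is a hypothetical index name, and the clustered index must stay enabled because disabling it makes the table inaccessible.

```sql
-- Sketch: switch off triggers and a nonclustered index on JOINCCAC for a big insert.
-- IX_JOINCCAC_Example is hypothetical; list the real ones with
--   SELECT name, type_desc FROM sys.indexes WHERE object_id = OBJECT_ID('JOINCCAC');
-- Never disable the clustered index: the table can no longer be read or written.
DISABLE TRIGGER ALL ON JOINCCAC;
ALTER INDEX IX_JOINCCAC_Example ON JOINCCAC DISABLE;

-- ... run the large INSERT INTO JOINCCAC ... here ...

-- REBUILD re-enables a disabled index (and builds it in one pass).
ALTER INDEX IX_JOINCCAC_Example ON JOINCCAC REBUILD;
ENABLE TRIGGER ALL ON JOINCCAC;
```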

David B 2008-12-10 05:41:09

BUPDEF is a huge business-logic function that was ported. Hopefully it will go away (like this table), but there are no lookups in it (or in the other UDF I left out for clarity).

Cade Roux 2008-12-11 07:21:57

The UDF performance was the key and is horrible - the two UDFs can only process around 300 rows per second. I am currently engaged in finding workarounds.

Cade Roux 2009-02-26 18:50:59

Answer 4

+2 A:

Continuing on what others are saying: DB functions that contain queries, when used in a SELECT, have always made my queries extremely slow. Off the top of my head, I believe I had a query that ran in 45 seconds; after I removed the function, it ran in 0 seconds :)

So check that udf_BUPDEF is not doing any queries.
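One way to make that check from the catalog is sketched here, assuming the function lives in the dbo schema. sys.sql_dependencies can be incomplete if objects were created out of order, so reading the source text is a useful second look.

```sql
-- Sketch: does udf_BUPDEF reference any tables or views?
-- The dbo schema is an assumption; adjust to wherever the function lives.
SELECT DISTINCT
       OBJECT_NAME(d.referenced_major_id) AS ReferencedObject,
       o.type_desc                        -- e.g. USER_TABLE, VIEW, SQL_SCALAR_FUNCTION
FROM   sys.sql_dependencies AS d
JOIN   sys.objects AS o ON o.object_id = d.referenced_major_id
WHERE  d.object_id = OBJECT_ID('dbo.udf_BUPDEF')
  AND  d.class IN (0, 1);                 -- object/column references only

-- The function's source text, to search for FROM / JOIN clauses by eye.
SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.udf_BUPDEF')) AS FunctionSource;
```

No USER_TABLE or VIEW rows in the first result, and no FROM clause in the source, points to a purely computational function, whose cost then comes from being called once per row.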

Ben Dempsey 2008-12-10 06:00:25

The UDF performance (they are scalar and do not access tables) was the key and is horrible - the two UDFs can only process around 300 rows per second. I am currently engaged in finding workarounds.

Cade Roux 2009-02-26 18:51:33

Answer 5

A:

So, what was the real issue?

It would be great if you could share your findings.

shahkalpesh 2008-12-10 18:13:46

I have not been able to retest it, but the 30M-row problem was due to my not filtering a table on date; with the filter, it came down to 2 hours and completed the 15M rows. I suspect it's actually outperforming Focus, which is really my #1 concern. Then optimize the hell out of it or optimize it away.

Cade Roux 2008-12-11 07:24:43

The UDF performance (they are scalar and do not access tables) was the key and is horrible - the two UDFs can only process around 300 rows per second. I am currently engaged in finding workarounds.

Cade Roux 2009-02-26 18:52:37
