SQL Efficiency with Function

views:

121

answers:

+1 Q:

SQL Efficiency with Function

Hello,

So I've got this database that helps organize information for academic conferences, but we need to know sometimes whether an item is "incomplete" - the rules behind what could make something incomplete are a bit complex, so I built them into a scalar function that just returns true if the item is complete and 0 otherwise.

The problem I'm running into is that when I call the function on a big table of data, it'll take about 1 minute to return the results. This is causing time-outs on the web site.

I don't think there's much I can do about the function itself. But I was wondering if anybody knows any techniques generally for these kinds of situations? What do you do when you have a big function like that that just has to be run sometimes on everything? Could I actually store the results of the function and then have it refreshed every now and then? Is there a good and efficient way to have it stored, but refresh it if the record is updated? I thought I could do that as a trigger or something, but if somebody ever runs a big update, it'll take forever.

Thanks,

Mike

+1 A:

If the function is deterministic you could add it as a computed column, and then index on it, which might improve your performance.

MSDN documentation.

cmsjr 2009-08-14 14:25:42

If your function is that inefficient, you'll have to deal with either out of date data, or slow results.

It sounds like you care more about performance, so like @cmsjr said, add the data to the table.

Also, create a cron job to refresh the results periodically. Perhaps add an updated column to your database table, and then the cron job only has to re-process those rows.

One more thing, how complex is the function? Could you reduce the run-time of the function by pulling it out of SQL, perhaps writing it a layer above the database layer?

Dominic Rodger 2009-08-14 14:31:32

+1 A:

The problem is that the function looks at an individual record and has logic such as "if this column is null" or "if that column is greater than 0". This logic is basically a black box to the query optimizer. There might be indexes on those fields it could use, but it has no way to know about it. It has to run this logic on every available record, rather than using the criteria in a functional matter to pare down the result set. In database parlance, we would say that the UDF is not sargable.

So what you want is some way to build your logic for incomplete conferences into a structure that the query optimizier can take better advantage of: match conditions to indexes and so forth. Off the top of my head, your options to do this include a view or a computed column.

Joel Coehoorn 2009-08-14 14:32:10

I've encountered cases where in SQL Server 2000 at least a function will perform terribly and just breaking that logic out and putting it into the query speeds things tremendously. This is an edge case but if you think the function is fine then you could try that. Otherwise I'd look to compute the column and store it as others are suggesting.

Jon 2009-08-14 14:35:17

Don't be so sure that you can't tune your function.

Typically, with a 'completeness' check, your worst time is when the record's actually complete. For everything else, you can abort early, so either test the cases that are fastest to compute first, or those that are most likely to cause the record to be flagged incomplete.

For bulk updates, you either have to just sit and wait, or come up with a system where you can run a less complete by faster check first, and then a more thorough check in the background.

Joe 2009-08-14 14:45:18

+1 A:

Scalar UDFs in SQL Server perform very poorly at the moment. I only use them as a carefully planned last resort. There are probably ways to solve your problem using other techniques (even deeply nested views or inline TVF which build up all the rules and are re-joined) but it's hard to tell without seeing the requirements.

Cade Roux 2009-08-14 15:11:06

As Cade Roux says Scalar functions are evil they are interpreted for each row and as a result are a big problem where performance is concerned. If possible use a table valued function or computed column

John Nolan 2009-08-14 15:23:46

ansaurus

tags:

views:

answers:

SQL Efficiency with Function

related questions