views: 302
answers: 2
I want to memoize function results for performance, i.e. lazily populate a cache indexed on the function arguments. The first time I call a function, the cache won't have anything for the input arguments, so it will calculate it and store it before returning it. Subsequent calls just use the cache.
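To illustrate (the table, procedure, and function names here are just placeholders), this is the kind of lazy-populate pattern I have in mind. It is trivial in a stored procedure, but the INSERT is exactly the kind of side effect the determinism rule forbids inside a UDF:

    -- Placeholder cache table keyed on the function arguments
    CREATE TABLE dbo.CalcCache (
        Arg1   INT   NOT NULL,
        Arg2   INT   NOT NULL,
        Result FLOAT NOT NULL,
        PRIMARY KEY (Arg1, Arg2)
    );
    GO

    -- Lazy memoization: check the cache, compute and store on a miss
    CREATE PROCEDURE dbo.GetCalc
        @Arg1   INT,
        @Arg2   INT,
        @Result FLOAT OUTPUT
    AS
    BEGIN
        SET @Result = NULL;

        -- Cache hit?
        SELECT @Result = Result
        FROM dbo.CalcCache
        WHERE Arg1 = @Arg1 AND Arg2 = @Arg2;

        IF @Result IS NULL
        BEGIN
            -- Cache miss: compute and store.
            -- dbo.ExpensiveCalc stands in for the real calculation.
            SET @Result = dbo.ExpensiveCalc(@Arg1, @Arg2);

            INSERT INTO dbo.CalcCache (Arg1, Arg2, Result)
            VALUES (@Arg1, @Arg2, @Result);
        END
    END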

However, it seems that SQL Server 2000 has a stupid, arbitrary rule that functions must be "deterministic": INSERTs, UPDATEs, and regular stored procedure calls are forbidden inside a UDF, yet extended stored procedure calls are allowed. How is that deterministic? If another session modifies the database state, the function's output will change anyway.

I'm steaming mad. I had thought I could make caching transparent to the user. Is this possible? I don't have the permissions to deploy extended stored procedures.

EDIT:

This limitation still exists in SQL Server 2008. You can't even call RAND, for God's sake!

I would implement the cache myself, in the DB. A cache is just any data store used for caching...

EDIT:

There are no cases where the same arguments to a function will yield different results, outside of changes to the underlying data. This is a BI platform, and the only changes come from scheduled ETL, at which time I would TRUNCATE the cache table.

These are I/O-intensive time-series calculations, on the order of O(n^4). I don't have the mandate to change the underlying table or indexes. Also, a lot of these functions use the same intermediate functions, and caching would let those intermediate results be reused.

UDFs are not truly deterministic unless they account for changes in database state. So what's the point? Is SQL Server caching UDF results? (Ironic.) If it is, then it must be expiring that cache on changes to the tables the function is schema-bound to. And if tables can be schema-bound that way, why not also schema-bind the tables a function modifies? I can see why procs aren't allowed, although that's just sloppy; just schema-bind the procs too. And, BTW, why allow extended stored procs? You can't possibly track what those do to ensure determinism!!! Argh!!!

EDIT:

My question is: Is there any way to lazily cache function results in a way that can be used in a view?

A: 

I wasn't aware that SQL Server 2000 had any caching at all. What are you referring to?

I also don't think it's very productive to be upset about a nine-year-old piece of software. Chances are that any problem you're having has been fixed twice by now.

John Saunders
+1  A: 

Deterministic means that the same inputs return the same output, independent of time and database state.

SQL Server (any version) does no caching of UDF results - I believe it will avoid calling the UDF twice for the same row, but that's it.

One trick I've used (I think I posted it here on SO) is to:

1. Refactor the UDF, if you can, so that there is effectively a usable, discrete subset of values returned for a given set of inputs. For numerical calculations, you can sometimes refactor the logic to return a factor or rate which is multiplied outside the UDF, instead of passing the value in and multiplying it inside the UDF.

2. Call the UDF over the DISTINCT rowset and cache the results to a temporary table. If you are only calling the UDF with 100,000 tuples of parameters over a 17,000,000-row set, this is far more efficient.

3. JOIN to the temporary table (basically converting from code-based logic to table-based logic) to get the values.

4. This table can be re-used as necessary, or even kept permanently.

5. Additions to the table can be made by first LEFT JOINing to find the missing cached entries, as sketched below.
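In T-SQL, steps 2 through 5 look roughly like this (the table, column, and UDF names are just placeholders for your own objects):

    -- Step 2: evaluate the UDF once per DISTINCT parameter tuple, not once per fact row
    SELECT DISTINCT Arg1, Arg2
    INTO #Params
    FROM dbo.BigFactTable;

    SELECT p.Arg1, p.Arg2, dbo.ExpensiveCalc(p.Arg1, p.Arg2) AS Result
    INTO #CalcCache
    FROM #Params AS p;

    -- Step 3: table-based lookup instead of a per-row UDF call
    SELECT f.*, c.Result
    FROM dbo.BigFactTable AS f
    INNER JOIN #CalcCache AS c
        ON c.Arg1 = f.Arg1
       AND c.Arg2 = f.Arg2;

    -- Step 5: if the table is kept permanently, add only the tuples not already cached
    INSERT INTO dbo.CalcCache (Arg1, Arg2, Result)
    SELECT d.Arg1, d.Arg2, dbo.ExpensiveCalc(d.Arg1, d.Arg2)
    FROM (SELECT DISTINCT Arg1, Arg2 FROM dbo.BigFactTable) AS d
    LEFT JOIN dbo.CalcCache AS c
        ON c.Arg1 = d.Arg1
       AND c.Arg2 = d.Arg2
    WHERE c.Arg1 IS NULL;

The LEFT JOIN / WHERE ... IS NULL pattern is what keeps re-population cheap: only parameter tuples that have never been seen get the expensive UDF call.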

This works for both single-row table-valued UDFs and scalar UDFs. I'm mainly using it for table-valued UDFs. There is a hotfix for SQL Server 2005 which is supposed to address UDF performance - I'm waiting on the DBAs to test it before deploying to production.

Cade Roux
First of all, the point of the cache is to calculate lazily; populating a table with more than will be used defeats the purpose. Second, all this approach does is move the caching out of the UDF. Why not just skip the UDF and use a proc? Finally, functions can be used in views, which have advantages procs don't.
entaroadun
I fully populate all and only the potential UDF results. Some will be called more than once, none will be called 0 times. It's not a cache - it's a precalc. The savings is directly measurable by the difference in UDF calls actually made to populate the lookup table. The tradeoff is storage.
Cade Roux