I'm building up a new database in SQL Server 2008 for some reporting, and there are many common business rules pertaining to this data that go into different types of reports. Currently these rules are mostly combined in larger procedural programs, in a legacy language, which I'm trying to move over to SQL. I'm shooting for flexibility in implementing reporting from this data, like some reporting in SAS, some in C#, etc.

My approach currently is to break up these common rules (usually VERY simple logic) and encapsulate them in individual SQL UDFs. Performance is not a concern, I just want to use these rules to populate static fields in a sort of reporting "snapshot", which can then be used to report from in whatever way you want.
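
To make the idea concrete, here is a minimal sketch of one such rule and the snapshot step. Every name in it (dbo.fn_IsOverdue, dbo.report_snapshot, and the due_date, paid_date and is_overdue columns) is a hypothetical stand-in, not part of the real schema.

```sql
-- Hypothetical rule: an invoice is overdue when it is unpaid past its due date.
CREATE FUNCTION dbo.fn_IsOverdue (@due_date DATETIME, @paid_date DATETIME, @as_of DATETIME)
RETURNS CHAR(1)
AS
BEGIN
    RETURN CASE
               WHEN @paid_date IS NULL AND @due_date < @as_of THEN 'Y'
               ELSE 'N'
           END
END
GO

-- Populate the static field once, when the snapshot is taken (table and columns are assumed).
UPDATE dbo.report_snapshot
SET    is_overdue = dbo.fn_IsOverdue(due_date, paid_date, GETDATE());
```

Reports would then filter on is_overdue directly and never call the function themselves.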

I like this modular approach as far as understanding what each rule is doing (and maintaining the rules themselves), but I'm also starting to become a bit afraid that the maintenance may also become a nightmare. Some rules depend on others, but I can't really get away from that - these things build off each other...which is what I want...I think? ;)

Are there better ways to achieve this kind of modularity in a database? Am I on the right track, or am I thinking of this in too much of an application-development mindset?

+2  A: 

At some point, extensive use of UDFs will start to cause performance problems as they are executed for each row in your resultset and obscure logic from the optimizer, making it hard to use indexes (i.e. I don't really understand how performance can not be an issue, but you know your requirements best). For certain functionality they are great; but use them sparingly.

davek
I agree completely with that, but I just use these when I take a particular "snapshot" to populate static fields, which are what someone would query against for any reporting purpose. Otherwise I would definitely either move this logic into the reporting layer (and fight for a standard reporting implementation), or work to move things into table-valued functions. Thanks for the feedback, though!
chucknelson
Not all UDFs execute once per row; inline ones are flattened out by the optimizer. Scalar UDFs are indeed very slow.
AlexKuznetsov
Table UDF would not be executed for each row?
Jeff O
+1  A: 

I'd say that you are on the right track. SQL procedures can rapidly get out of hand as they become more and more complex, and encapsulating shared, repeated pieces of logic into UDFs is an entirely appropriate way to address this.

I often go as far as moving logic that is only used in one SQL procedure into a well-named UDF to improve readability.

Have a look at this MSDN article on UDFs - perhaps it will give you some more ideas about their uses?

There are various performance considerations that you will need to be aware of if you intend to use UDFs heavily - things like the performance of scalar vs table UDFs and the possible benefits of CLR UDFs.
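
As a rough illustration of the scalar vs table UDF distinction, the sketch below writes the same rule both ways. The tables dbo.orders and dbo.order_lines, their columns, and both function names are assumptions made up for the example.

```sql
-- Scalar form: invoked once per row and opaque to the optimizer.
CREATE FUNCTION dbo.fn_OrderTotal_Scalar (@order_id INT)
RETURNS MONEY
AS
BEGIN
    DECLARE @total MONEY
    SELECT @total = SUM(quantity * unit_price)
    FROM   dbo.order_lines
    WHERE  order_id = @order_id
    RETURN @total
END
GO

-- Inline table-valued form: a single RETURN (SELECT ...),
-- which the optimizer can expand into the calling query.
CREATE FUNCTION dbo.fn_OrderTotal_Inline (@order_id INT)
RETURNS TABLE
AS
RETURN
(
    SELECT SUM(quantity * unit_price) AS order_total
    FROM   dbo.order_lines
    WHERE  order_id = @order_id
)
GO

-- The inline form is used through CROSS APPLY rather than in the SELECT list.
SELECT o.order_id, t.order_total
FROM   dbo.orders o
CROSS APPLY dbo.fn_OrderTotal_Inline(o.order_id) t;
```

Both forms keep the rule in one named place; the inline form simply leaves the optimizer free to treat it like any other subquery.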

David Hall
A: 

If you're interested in building a data warehouse for reporting, try to put as much of this as possible into the Transform part of your ETL, so that your reporting SQL consists of simple statements that tools and users alike are capable of generating.

SSIS is a very capable ETL tool that comes with SQL Server for this sort of thing.

jms
Currently it's a pretty small scope, but I agree, I would love to use some proper reporting tools/processes for this. For now I'm trying to build something maintainable and easy to read/understand for a future generation ;)
chucknelson
+2  A: 

Keeping logic on the database side is almost always the right thing to do.

As you mentioned in your question, most business rules involve quite simple logic, but that logic usually deals with huge volumes of data.

The database engine is the right place to implement that logic because, first, it keeps data I/O to a minimum and, second, the database performs most data transformations much more efficiently.

Some time ago I wrote a very subjective blog post on this topic:

One side note: a UDF is not the same as a stored procedure.

A UDF is a function designed to be callable inside a query, so it can do only a very limited subset of possible operations.

You can do much more in a stored procedure.

Update:

In the example you gave, like changing logic that calculates a "derived field", the UDF that calculates the field is OK.

But (just in case) when performance becomes an issue (and believe me, this will happen much sooner than one may think), transforming data with set-based operations may be much more efficient than using UDFs.

In this case, you may want to create a view, a stored procedure or a table-valued function returning a resultset, which will contain a more efficient query, rather than limiting yourself to updating the UDFs (which are record-based).

One example: your query has something like a "user score" that you feel is subject to change, so you wrap it in a UDF:

-- SQL Server requires the schema prefix when calling a scalar UDF
SELECT  user_id, dbo.fn_getUserScore(user_id)
FROM    users

Initially, this is just a plain field in the table:

CREATE FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
        DECLARE @ret INT
        -- T-SQL assigns to a variable with SELECT @var = ..., not SELECT ... INTO @var
        SELECT  @ret = user_score
        FROM    users
        WHERE   user_id = @user_id
        RETURN @ret
END

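Then you decide to calculate it using data from another table:

-- ALTER rather than CREATE, since the function already exists at this point
ALTER FUNCTION dbo.fn_getUserScore(@user_id INT) RETURNS INT
AS
BEGIN
        DECLARE @ret INT
        -- Same signature, new rule: the score is now the sum of the user's votes
        SELECT  @ret = SUM(vote)
        FROM    user_votes
        WHERE   user_id = @user_id
        RETURN @ret
END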

This will condemn the engine to using the least efficient NESTED LOOPS algorithm in either case.

But if you created a view and rewrote the underlying queries like this:

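-- Initial version: the score is a plain column
SELECT  user_id, user_score
FROM    users

-- Later version: the score is aggregated from user_votes in one set-based pass
SELECT  u.user_id, SUM(uv.vote) AS user_score
FROM    users u
LEFT JOIN
        user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id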

This would give the engine much more room for optimization, while still keeping the resultset structure and separating logic from presentation.
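
For completeness, a sketch of how the aggregated version might be packaged as a view. The name dbo.v_user_scores and the dbo schema are assumptions; the tables and columns are the ones used in the example above.

```sql
-- Hypothetical view that hides how user_score is derived.
CREATE VIEW dbo.v_user_scores
AS
SELECT  u.user_id, SUM(uv.vote) AS user_score
FROM    dbo.users u
LEFT JOIN
        dbo.user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id
GO

-- Callers see the same two columns regardless of how the score is calculated.
SELECT  user_id, user_score
FROM    dbo.v_user_scores
WHERE   user_score > 100;
```

If the rule changes again, only the view definition changes (via ALTER VIEW), and the engine can still plan the join and aggregation as a whole.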

Quassnoi
+1 for keeping logic on db side (although many would strongly disagree!)
davek
Don't know if I'm looking for efficiency more than just separating these rules from whatever is the choice for implementing reports - I just want people to be able to look at the data and be like "ah hah, there is derived-field X, where I want to filter for 'N'" or something like that :) If the logic changes for derived-field X based on some feedback, just update a UDF or two and update that field. This is my vision, anyway ;)
chucknelson
@chucknelson: I wish I had people that would be able to see a "derived field" in the data as my customers :)
Quassnoi
Instead of a view, why not a table function just in case you need to pass a parameter?
Jeff O
@Jeff O: table valued function is fine too.
Quassnoi
+1  A: 

SQL is set-based, and inherently performs poorly when applying a modular approach.
Functions, stored procedures and/or views all abstract the underlying logic. The performance problem comes into play when you use two (or more) functions/etc. that utilize the same table(s). It means that two queries are made to the same table(s) when one could have been used.
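
A small sketch of that situation, assuming hypothetical dbo.users and dbo.user_votes tables and two hypothetical scalar UDFs that each read dbo.user_votes:

```sql
-- Modular version: each function call runs its own query against dbo.user_votes.
SELECT  u.user_id,
        dbo.fn_getUserScore(u.user_id) AS user_score,  -- hypothetical UDF: SUM(vote)
        dbo.fn_getVoteCount(u.user_id) AS vote_count   -- hypothetical UDF: COUNT(vote)
FROM    dbo.users u;

-- Set-based version: both values come from a single pass over dbo.user_votes.
SELECT  u.user_id,
        SUM(uv.vote)   AS user_score,
        COUNT(uv.vote) AS vote_count
FROM    dbo.users u
LEFT JOIN
        dbo.user_votes uv
ON      uv.user_id = u.user_id
GROUP BY
        u.user_id;
```

The second query returns the same shape of result while touching the votes table once instead of twice per user.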

The use of multiple functions says to me that the data model was made to be very "flexible". To me, that means questionable data typing and overall column/table definition. There's a need for functions/etc because the database will allow anything to be stored, which means the possibility of bad data is very high. I'd rather put the effort into always having good/valid data, rather than working after the fact to combat existing bad data.

The database is the place to contain this logic. It is faster than application code and, most importantly, centralized to minimize maintenance.

OMG Ponies
I agree - the base data I get is very broad. I'm basically treating my "snapshot" process as an ETL step, and using these UDFs to populate fields that will be very useful to whoever wants to use the data for reporting (or whatever else, really).
chucknelson
@chucknelson: I understand - my current job has me performing similar duties.
OMG Ponies