tags:

views:

178

answers:

10

At a new job, I've just been exposed to the concept of putting logic into SQL statements.

In MySQL, a dumb example would be like this:

SELECT
    P.LastName, IF(P.LastName='Baldwin','Michael','Bruce') AS FirstName
FROM
    University.PhilosophyProfessors P
// This is like a ternary operator; if the condition is true, it returns 
// the first value; else the second value. So if a professor's last name 
// is 'Baldwin', we will get their first name as "Michael"; otherwise, "Bruce"**

For a more realistic example, maybe you're deciding whether a salesperson qualifies for a bonus. You could grab various sales numbers and do some calculations in your SQL query, and return true / false as a column value called "qualifies."

Previously, I would have gotten all the sales data back from the query, then done the calculation in my application code.

To me, this seems better, because if necessary, I can walk through the application logic step-by-step with a debugger, but whatever the database is doing is a black box to me. But I'm a junior developer, so I don't know what's normal.

What are the pros and cons of having the database server do some of your calculations / logic?

**Code example based on Monty Python sketch.

+5  A: 

This way SQL becomes part of your domain model. It's one more (and not necessarily obvious) place where domain knowledge is implemented. Such leaks result in tighter coupling between business logic / application code and database, what usually is a bad idea.

One exception is views, report queries etc. But these usually are so isolated that it's obvious what role they play.

Konrad Garus
+1  A: 

That particular first example is a bad idea. Per-row functions do not scale well as the table gets bigger. In fact, a (likely) better way to do it would be to index LastName and use something like:

SELECT P.LastName, 'Michael' AS FirstName
    FROM University.PhilosophyProfessors P
    WHERE P.LastName = 'Baldwin'
UNION ALL SELECT P.LastName, 'Bruce' AS FirstName
    FROM University.PhilosophyProfessors P
    WHERE P.LastName <> 'Baldwin'

On databases where data are read more often than written (and that's most of them), these sorts of calculations should be done at write time such as using an insert/update trigger to populate a real FirstName field.

Databases should be used for storing and retrieving data, not doing massive non-databasey calculations that will slow down everything.

paxdiablo
So for the salesperson calculation, when their sales numbers get updated, you'd automatically do the calculations and update a column saying whether they qualify for a bonus?
Nathan Long
No, for that case, I'd do it in the application logic. Whether they _qualify_ for a bonus is a transient thing, not data. Storing what they were _paid_ (or will be paid) would be data.
paxdiablo
+2  A: 

Many times the answer to this type of question is going to depend a great deal on deployment approach. Where it makes the most sense to place your logic depends on what you'll need to be able to get access to when making changes.

In the case of web applications that aren't compiled, it can be easier to deal with changes to a page or file than it is to work with queries (depending on query complexity, programming backgrounds / expertise, etc). In these kinds of situations, logic in the scripting language is typically ok and make make it easier to revise later.

In the case of desktop applications that require more effort to modify, placing this kind of logic in the database where it can be adjusted without requiring a recompilation of the application may benefit you. If there was a decision made that people used to qualify for bonuses at 20k, but now must make 25k, it'd be much easier to adjust that on the SQL Server than to recompile your accounting application for all of your users, for example.

g.d.d.c
A: 

One big pro: a query may be all you can work with. Reports have been mentioned: many reporting tools or reporting plugins to existing programs only allow users to make their own queries (the results of which they will display).

If you cannot alter the code (because it isn't yours), you may yet be able to alter a query. And in some cases (data migration), you'll be writing queries to do migration as well.

Tobiasopdenbrouw
+2  A: 

There aren't a lot of absolute pros and cons to this argument, so the answer is 'it depends.' Some scenarios with different conditions that affect this decision might be:

Client-server app

One example of a place where it might be appropriate to do this is an older 4GL or rich client application where all database operations were done through stored procedure based update, insert, delete sprocs. In this case the gist of the architecture was to have the sprocs act as the main interface for the database and all business logic relating to particular entities lived in the one place.

This type of architecture is somewhat unfashionable these days but at one point it was considered to be the best way to do it. Many VB, Oracle Forms, Informix 4GL and other client-server apps of the era were done like this and it actually works fairly well.

It's not without its drawbacks, however - SQL is not particularly good at abstraction, so it's quite easy to wind up with fairly obtuse SQL code that presents a maintenance issue through being hard to understand and not as modular as one might like.

Is it still relevant today? Quite often a rich client is the right platform for an application and there's certainly plenty of new development going on with Winforms and Swing. We do have good open-source ORMs today where a 1995 vintage Oracle Forms app might not have had the option of using this type of technology. However, the decision to use an ORM is certainly not a black and white one - Fowler's Patterns of Enterprise Application Architecture does quite a good job of running through a range of data access strategies and discussing their relative merits.

Three tier app with rich object model

This type of app takes the opposite approach, and places all of the business logic in the middle tier model object layer with a relatively thin database layer (or perhaps an off-the-shelf mechanism like an ORM). In this case you are attempting to place all the application logic in the middle-tier. The data access layer has relatively little intelligence, except perhaps for a handful of stored procedured needed to get around limits of an ORM.

In this case, SQL based business logic is kept to a minimum as the main repository of application logic is the middle-tier.

Overhight batch processes

If you have to do a periodic run to pick out records that match some complex criteria and do something with them it may be appropriate to implement this as a stored procedure. For something that may have to go over a significant portion of a decent sized database a sproc based approch is probably going to be the only reasonably performant way to do this sort of thing.

In this case SQL may well be the appropriate way to do this, although traditional 3GLs (particularly COBOL) were designed specifically for this type of processing. In really high volume environments (particularly mainframes) doing this type of processing with flat or VSAM files outside a database may be the fastest way to do it. In addition, some jobs may be inherently record-oriented and procedural, or may be much more transparent and maintanable if implemented in this way.

To paraphrase Ed Post, 'you can write COBOL in any language' - although you might not want to. If you want to keep it in the database, use SQL, but it's certainly not the only game in town.

Reporting

The nature of reporting tools tends to dictate the means of encoding business logic. Most are designed to work with SQL based data sources so the nature of the tool forces the choice on you.

Other domains

Some applications like ETL processing may be a good fit for SQL. ETL tools start to get unwiedly if the transformation gets too complex, so you may want to go for a stored procedure based architecture. Mixing Queries and transformations across extraction, ETL processing and stored-proc based processing can lead to a transformation process that is hard to test and troubleshoot.

Where you have a significant portion of your logic in sprocs it may be better to put all of the logic in this as it gives you a relatively homogeneous and modular code base. In fact I have it on fairly good authority that around half of all data warehouse projects in the banking and insurance sectors are done this way as an explicit design decision - for precisely this reason.

ConcernedOfTunbridgeWells
A: 

The answer depends on your expertise and your familiarity with the technologies involved. Also, if you're a technical manager, it depends on your analysis of the skills of the people working on your team and whom you intend on hiring / keeping on staff to support, extend and maintain the application in future.
If you are not literate and proficient in the database , (as you are not) then stick with doing it in code. If otoh, you are literate and proficient in database coding (as you should be), then there is nothing wrong (and a lot right) abput doing it in the database.

Two other considerations that might influence your decision are whether the logic is of such a complex nature that doing it in database code would be inordinately more complex or more abstract than in code, and second, if the process involved requires data from outside the database (from some other source) In either of these scenarios I would consider moving the logic to a code module.

Charles Bretana
+7  A: 

One of the most persuasive reasons to push logic out to the database is to minimise traffic. In the example given, there is little gain, since you are fetching the same amount of data whether the logic is in the query or in your app.

If you want to fetch only users with a first name of Michael, then it makes more sense to implement the logic on the server. Actually, in this simple example, it doesn't make much difference, since you could specify users who's lastname is Baldwin. But consider a more interesting problem, whereby you give each user a "popularity" score based on how common their first and last names are, and you want to fetch the 10 most "popular" users. Calculating "popularity" in the app would mean that you have to fetch every single user before ranking, sorting and choosing them locally. Calculating it on the server means you can fetch just 10 rows across the wire.

Marcelo Cantos
So I guess in this case, the judgment would depend on what is most expensive: calculation on the db server, calculation on the application server, or network traffic? Is there an easy answer to that question, or (as I suspect) is it another "it depends on your setup?"
Nathan Long
I believe the DRY principle is actually a more persuasive reason (or, at least, *as* persuasive) as network traffic.
Larry Lustig
@Larry: I agree. Amended.
Marcelo Cantos
From a DBA perspective, pushing work to the database to minimise traffic often qualifies for the death penalty :-) Your examples seem okay since that's what a DBMS is good for. But the example in the question is _not_ the same since it likely does per-row calculations, a serious warning sign for DBAs. A DBMS is a shared resource which may have thousands of clients trying to use it and a bad one can affect all the others. This is why some DBAs don't allow unfettered SQL access to the data. A better way to do this would be to replace that slow query with two union-all'ed fast queries.
paxdiablo
@paxdiablo: I pointed out quite explicitly in the first paragraph of my answer that traffic-minimisation doesn't apply to the example provided by the question.
Marcelo Cantos
A: 

The fact that you can step through the code in your IDE more easily is really the only advantage to your post-processing solution. Doing the logic in the database server reduces the sizes of result sets, often drastically, which leads to less network traffic. It also allows the query optimizer to get a much better picture of what you really want done, again often allowing better performance.

Therefore I would nearly always recommend SQL logic. If you treat a database as a mere dumb store, it will return the favor by behaving dumb, and depending on the situation, that can absolutely kill your performance - if not today, possibly next year when things have taken off...

Kilian Foth
If you're going to need the same size result set in any case - one row for each salesperson, for example - would your answer change?
Nathan Long
IF you only post-process individual columns in result rows that you need to retrieve anyway, and IF this doesn't happen in a way that can affect the query execution plan, then I suppose it doesn't matter. I just think doing work higher up in the stack than necessary is a bad habit to get into; one day you might accidentally commit a really gross blunder. (On the other hand, if you really need to forge someone's given name consistently, why not store it like that in the database record in the first place? I suspect a better example would make things clearer here.)
Kilian Foth
+2  A: 

I'm a strong advocate of putting as much logic as possible directly into the database. That means incorporating it in views and stored procedures. I believe that most follows the DRY principle.

For example, consider a table with FirstName and LastName columns, and an application that frequently makes use of a FullName field. You have three choices:

  1. Query first and last name and compute the full name in application code.

  2. Query first, last, and (first || last) in your application's SQL whenever you query the table.

  3. Define a view CustomerExt that includes the first and last columns, and a computed full name column and then query against that view, rather than the customer table.

I believe option 3 is clearly correct. Consider the addition of a MiddleInitial field to the table and the full name computation. Using option 3, you simply need to replace the view and every application across your company will instantly use the new format for FullName. The view still makes the base columns available for those instances in which you need to do some special formatting, but for the standard instance everything works "automatically".

That's a simple case, but the principle is the same for more complex situations. Perform application- or company-wide data logic directly in the database and you do not need to concern yourself with keeping different applications up to date.

Larry Lustig
Option 4. Create a `Customer` class that contains the logic for rendering a full name given a first and last name, which can have multiple implementations as needed (for example, for different cultures). Use that class in all your applications.
Jeff Sternal
Option 4 only works within a single application, not across applications which may be written in different languages. And it requires the programmer to call the formatting code throughout the application. However, if you're using a full-blown ORM in which the calculations can be bound together with the data retrieval logic, and you're not concerned about cross-application logic (or you're certain that all applications for that DB will be written in the same language and using the ORM) then Option 4 is similarly valid.
Larry Lustig
@Larry - most modern development platforms allow you to use code across different applications, and many even allow you to use that code in applications written in different languages. Additionally, calling the formatting code is normally as simple as typing `customer.FullName`, which is presumably how you'd access it via a database as well.
Jeff Sternal
I can say only that my experience (I'm a freelancer) is generally different from yours — I'll often be in a situation in which there's Access code, two or three Delphi applications, some Python utilities, and third party ETL and reporting tools, all pointing at the same database. In a case like that, if company-wide data logic exists, the best place for it to live is in the database.
Larry Lustig
@Jeff: There is still the small problem of deployment. A large enterprise with hundreds of applications is either going to require an extraordinarily rigorous shake-out and deployment process, or create a "FullName" service that everyone else calls, which is far more complicated than just adding the logic to a view. Also consider that users may want to use FullName in a WHERE clause (e.g., "Find people with full names that won't fit in this invoice template."), which may be intractable if it only exists in the app.
Marcelo Cantos
@Larry: Good points, but I will say that ORM is far from being a solution to this problem, especially if you want to use computed fields in a WHERE clause.
Marcelo Cantos
A: 

I like to distinguish data vs business rules, and push the data rules into the stored procs as much as possible. There is not always a hard and fast distinction between the two, but in your example of calculating sales bonuses, the formula itself might be a business rule but the work of gathering and aggregating the various figures used in the formula is a data rule.

Sometimes, though, it depends on the deployment model and change control procedures. If the sales formula changes frequently and deployment of the business layer code is cumbersome, then tweaking just one function/stored proc in the database would be a great solution.

JeffSahol