Given that the database is generally the least scalable component of a web application, are there any situations where one would put logic in procedures/triggers rather than keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!)?

+8  A: 

You're keeping the processing in the database, along with the data.

If you process on the server side, then you have to transfer the data out to a server process across the network, process it, and (optionally) send it back. You have the network bandwidth/latency issues, plus memory overheads.

To clarify - if I have 10m rows of data, my two extreme scenarios are to a) pull those 10m rows across the network and process on the server side, or b) process in place in the database using the server and language (SQL) optimised for this purpose. Note that this is a generalisation and not a hard-and-fast rule, but it's the one I follow for most scenarios.
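
To make the two extremes concrete, here is a minimal sketch using a hypothetical orders table (all names are illustrative, not from the question):

-- (b) process in place: the rows never leave the database
UPDATE orders
SET total = total * 0.9
WHERE order_date < '2009-01-01';

-- (a) pull the rows to the application tier instead: every row crosses
-- the network twice (once out, once back as a per-row UPDATE)
SELECT order_id, total FROM orders WHERE order_date < '2009-01-01';
-- ...loop in application code, issuing one UPDATE per row...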

Brian Agnew
I don't think this is usually the issue. When you're talking about scalability, you're talking about many clients. If you have decent bandwidth between your web servers and database servers, then I don't think latency is an issue; with many clients you would be pipelining. The more time the DB server spends processing business logic, the less it spends servicing queries. With pipelining, caching techniques, etc., I think the typical paradigm usually wins. Otherwise you could just make the DBSvr the WebSvr and you get no bandwidth or latency issues. :)
BobbyShaftoe
`BobbyShaftoe`: passing data even between two `Oracle` processes (as in `EXTPROC`) is dozens of times slower than `PL/SQL`. Business logic does not usually involve complex `MMX4 SSH3` tensor operations; it's all about moving large amounts of data hither and thither. Usually (not always, but most of the time) it may be done much faster within the `DBMS` engine.
Quassnoi
+3  A: 

When many heterogeneous applications and various other systems need to access your single database, and all of them must be sure that their operations leave the data consistent and free of integrity conflicts. In that case you put your logic into triggers and stored procedures that offer an interface to external clients.
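
For illustration, a T-SQL-flavored sketch of such an interface (the table and the consistency rule are hypothetical):

CREATE PROCEDURE AddPayment
    @AccountId INT,
    @Amount    DECIMAL(12,2)
AS
BEGIN
    -- the consistency rule lives here once, so every client gets it
    IF @Amount <= 0
        THROW 50001, 'Amount must be positive', 1;
    INSERT INTO Payments (AccountId, Amount, PaidAt)
    VALUES (@AccountId, @Amount, SYSDATETIME());
END;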

Developer Art
External clients shouldn't be connecting to your database directly in most circumstances. You should have an API.
BobbyShaftoe
`@BobbyShaftoe`: yes, and the stored procedures *are* the API. Nothing else can prevent a client from connecting to the database and issuing any query they like.
Quassnoi
Sure, and that's fine. I prefer to have an API on top of that rather than having client applications calling stored procedures directly. I've already seen this movie with large Informix databases.
BobbyShaftoe
+3  A: 

Maybe not for most web-based systems, but certainly for enterprise databases. Stored procedures and the like allow you much greater control over security and performance, and they offer a bit of encapsulation for the database itself. You can change the schema all you want as long as the stored procedure interface remains the same.
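
As a hedged sketch of that encapsulation (T-SQL-flavored, with made-up names), the caller's contract survives a schema refactoring untouched:

-- callers only ever see this signature
CREATE PROCEDURE GetCustomerName @CustomerId INT
AS
    SELECT Name FROM Customers WHERE Id = @CustomerId;

-- later, the schema is split into two tables; only the body changes
ALTER PROCEDURE GetCustomerName @CustomerId INT
AS
    SELECT p.FirstName + ' ' + p.LastName AS Name
    FROM Customers AS c
    JOIN People AS p ON p.Id = c.PersonId
    WHERE c.Id = @CustomerId;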

Cogwheel - Matthew Orlando
In a typical well-written enterprise system you would have this encapsulation, but in the form of an API. Now, it is somewhat defensible to use triggers purely for auditing. You could use stored procedures and still have an API-based mechanism for all the other applications to use.
BobbyShaftoe
A: 

Triggers mean 3rd-party apps can modify the database without creating logical inconsistencies.

Kev
Revoking `DML` permissions on the tables and granting `EXECUTE` on procedures achieves the same thing, without the drawbacks of triggers.
Quassnoi
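
As a sketch, that permission setup looks like this (object and role names are hypothetical):

-- the application role can no longer touch the table directly...
REVOKE INSERT, UPDATE, DELETE ON Payments FROM app_role;
-- ...it can only go through procedures, which enforce the rules
GRANT EXECUTE ON AddPayment TO app_role;
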
It doesn't mean the same--without a trigger there's no guarantee a 3rd-party app will follow the business rules. Which drawbacks would you say are significant?
Kev
As long as the procedures follow the business rules and the apps are only allowed to call the procedures, there is a guarantee. The drawbacks are numerous: triggers have no parameters, they are not set-based (in `Oracle` and `MySQL`), they make recursion extremely hard to handle, they are expensive to call, and their firing order is not guaranteed (again, in `Oracle`).
Quassnoi
Ah, okay, that's true about the guarantee. I don't see lack of parameters or recursion as drawbacks--that's when you'd call a regular procedure (if you really needed to) from a trigger. In PostgreSQL they can be set-based ("statement-level" they say), aren't that expensive, and have a set firing order. I think using both is more flexible and powerful than limiting yourself to one or the other.
Kev
+14  A: 
  • Server-side logic is often much faster, even with a procedural approach.

  • You can fine-tune your grant options and hide the data you don't want to show (see the sketch after this list).

  • All queries in one place are more convenient than queries scattered all around the code.
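
For the second point, a minimal sketch (a hypothetical customers table with a sensitive ssn column):

-- the ssn column never becomes visible to the application role
CREATE VIEW customers_public AS
    SELECT id, name, email FROM customers;
REVOKE SELECT ON customers FROM app_role;
GRANT SELECT ON customers_public TO app_role;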

And here's a (very subjective) article in my blog on the reason I prefer stored procedures:

BTW, triggers (as opposed to functions / stored procedures / packages) I generally dislike.

They are a completely different story.

Quassnoi
+1, in my book, triggers are band-aids. Design and develop forward-looking database tables and procedures, and then only use triggers as a last resort when you have to.
KM
@KM Why are triggers a last resort?
Amuck
`@Amuck`: because this hook-based programming paradigm gets out of control fast. Triggers have no parameters, they are not set-based in `Oracle` and `MySQL`, and they have many other drawbacks.
Quassnoi
Triggers can be very useful in environments where Foreign Keys are not available, such as the MyISAM and NDB/Cluster storage engines in MySQL. They can also be useful for propagating changes throughout a series of tables which are related hierarchically where you want to avoid extensive joins to get a status from a great-great-great-grandparent table. Of course, in order to be useful and not just plain dangerous they must be tested extensively and, once proven, documented extensively, or it is very easy to lose track of them and forget they exist. Better than a purely programmatic solution.
Dereleased
They (triggers, sprocs) have their place, but one must be very careful as it can VERY easily and quickly turn into WORN (Write Once Read Never) code. I like to think of it similar to threading. Lots of people do it, but do it wrong. It takes a truly exceptional person to do it RIGHT.
Brian Knoblauch
I don't understand this fear of bad triggers.
Kev
A: 

If you do that, you are tying your business logic to your model. If you code all your business logic in T-SQL, you aren't going to have a lot of fun if you later need to use Oracle or what have you as your database server. Actually, I'm not sure I understand this question exactly. How do you think this would improve scalability? It really shouldn't.

BobbyShaftoe
On the contrary, the **data model** is exactly the place where **data invariants** need to be maintained. So either your data model is wholly in the database, and the database maintains invariants, or your data model is mixed between the database and the application-tier code, and the application-tier code maintains invariants. This is logically separate from business logic, which is how your application will interact with the data model.
Daniel Pryden
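
Concretely, a small sketch of such an invariant (a hypothetical accounts table; any dialect with CHECK constraints):

-- the invariant lives in the data model itself: no client, ORM, or
-- ad-hoc query can create a row that violates it
ALTER TABLE accounts
    ADD CONSTRAINT chk_balance_nonnegative CHECK (balance >= 0);
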
Bobby, people very rarely change database backends in large applications. This is only an issue for software that is intended to use different backends depending on the customer.
HLGEM
Nonsense. It is not uncommon for someone to choose a database technology, reach a certain scale, realize it cannot handle the level of transactions, and move to another technology. Either that, or the database system becomes obsolete. There is a cottage industry of people migrating large-scale applications, which becomes a nightmare for this very reason: everything was done in the database layer.
BobbyShaftoe
+2  A: 

If the database is shared, having logic in the database is better in order to control everything that happens. If it's not, logic in the database might just make the system overly complicated.

marcgg
+2  A: 

If you have multiple applications that talk to your database, stored procedures and triggers can enforce correctness more pervasively. Accordingly, if correctness is more important than convenience, putting logic in the database is sensible.

Scalability may be a red herring, though. Sometimes it's easier to express the behavior you want in the domain layer of an OO language, but it can actually be more expensive than doing it the idiomatic SQL way.

The security mechanism at a previous company was first built in the service layer, then pushed to the db side. The motivation was actually due to some limitations in a data access framework we were using. The solution turned out to be a bit buggy because our security model was complicated, but the upside was that bugs only had to be fixed in the database; we didn't have to worry about different clients following different rules.

JasonTrue
A: 

Personally, I'm really not a fan of triggers, particularly in a database dedicated to a single application. I hate trying to track down why some data is inconsistent, only to find it's down to a poorly written trigger (and they can be tricky to get exactly correct).

Paddy
Data inconsistencies are far, far, far more likely to arise from application code written by people who don't understand the database than from triggers written as part of the database structure. It is for this reason (in part) that triggers were invented.
Larry Lustig
What makes triggers worse to work with than application logic? Logic in applications can be tricky to get exactly correct as well, so triggers aren't any different in that respect.
Amuck
Triggers are terrific for what they are needed for, which is enforcing complex data integrity rules or auditing. That some people don't know how to write or debug them is irrelevant. Some people don't know how to write C# code either; does that make it bad?
HLGEM
Possibly just my background, but most of the applications I have worked on have made sparse use of triggers, with just the odd one or two thrown in to confuse (as it's often the last place I think to look when data is incorrect). It's also, I think, a 'religious' point of view. I prefer my business logic in my application code.
Paddy
+3  A: 

In (almost) every situation you would keep the processing that is part of the database in the database. Application code cannot substitute for triggers; you won't get very far before you have updated the database and failed to fire the application's equivalent of the triggers (the first time you use the DBMS's management console, for instance).

Let the database do the database's work and let the application do the application's work. If you have a specific performance problem with the database, and that performance problem can be addressed by moving processing out of the database, in that case you might want to consider doing so.

But worrying about database performance when no database performance problem exists (which is what you seem to be doing here) is both silly and, sadly, apparently a preoccupation of many Stack Overflow posters.

Larry Lustig
I'm not sure I understand what you're arguing for exactly. Are you arguing to put all business logic in the database and rely on triggers, or what?
BobbyShaftoe
I'm arguing that any functionality that is part of the database definition (table X must be updated whenever a row in table Y is updated to meet condition Z, for instance) should be enforced declaratively inside the database engine.
Larry Lustig
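
A T-SQL-flavored sketch of the kind of rule Larry describes (tables X and Y and "condition Z" are stand-ins):

CREATE TRIGGER trg_y_update ON Y
AFTER UPDATE
AS
BEGIN
    -- whenever rows in Y are updated to meet condition Z, keep the
    -- dependent rows in X in sync (set-based: the inserted
    -- pseudo-table covers multi-row updates)
    UPDATE x
    SET status = 'flagged'
    FROM X AS x
    JOIN inserted AS i ON i.id = x.y_id
    WHERE i.qty = 0;  -- "condition Z"
END;
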
Larry, this is a very sad world you are describing here...
artemave
artemave, your comment doesn't make sense. What Larry is describing is the best practice from a data integrity perspective. Putting these rules in the application only is dangerous and irresponsible. Databases are NOT just interacted with from an application!
HLGEM
I don't see anything sad about good design, or letting a product do what it was designed to do, and is good at. The idea of deliberately allowing the database to become invalid when you have tools to enforce validity is what seems sad to me — that and the idea that people would allow tool-evangelism to outweigh common sense.
Larry Lustig
`@Larry:` I would agree with everything you said if you replaced "triggers" with "stored procedures". Server-side logic is a good thing, trigger-based implementation is usually not.
Quassnoi
@Quassnoi — No worries. I see no philosophical difference between enforced use of stored procedures and the use of triggers notwithstanding that you may feel triggers have some implementation issues in certain products. My preference is to use declarative RI where possible. In my projects I don't use many triggers, but when I need them I really need them, and the idea of deferring that required processing to some future application programmer who's never seen a database before is what bugs me (and, I think, you too).
Larry Lustig
Larry, I admire your ability to perceive 'what I seem to be doing here' and what kind of future programmer I am. Now, please go back and read your post carefully. It explains NOTHING. It only states THE WAY. Unlike other replies in this thread, yours is worthless. Sadly.
artemave
I don't think anyone is arguing to leave business logic up to the applications. What some might argue for is having a standard interface to the database by which other applications access it. I'm not sure that's a "sad" world.
BobbyShaftoe
The original question clearly suggests that stored procedures and triggers should be eliminated in favor of putting database logic into the application layer. That's a bad idea, as virtually every answer to the question makes clear. Should application access to the database be done through a standard abstraction layer? Certainly. But that layer should not include the kind of definitional logic that one normally finds in a trigger.
Larry Lustig
+3  A: 

Least scalable? SQL???

Look up "federating."

inked
A: 

Security is another advantage of using stored procs. You do not have to set security at the table level if you don't use dynamic code (including in the stored proc). This means your users cannot do anything unless they have a proc to do it. This is one way of reducing the possibility of fraud.

Further, procs are easier to performance-tune than most application code, and even better, when one needs to change, that proc is all you have to put on production rather than recompiling the whole application.

Data integrity must be maintained at the database level. That means constraints, default values, foreign keys, and possibly triggers (if you have very complex rules or ones involving multiple tables). If you do not do this at the database level, you will eventually have integrity issues. People will write a quick fix for a problem and run the code in the query window, and the required rules are missed, creating a larger problem. A million new records will have to be imported through an ETL program that doesn't go through the application, because running one record at a time through the application code would take too long.
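
A small sketch of those database-level rules (hypothetical schema, generic SQL):

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(id),  -- foreign key
    status      VARCHAR(10) NOT NULL DEFAULT 'new',     -- default value
    quantity    INT NOT NULL CHECK (quantity > 0)       -- constraint
);
-- a quick fix in a query window, or a bulk ETL load, still cannot
-- bypass these rules the way it can bypass application code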

If you think you are building an application where scalability will be an issue, you need to hire a database professional and follow his or her suggestions for design based on performance. Databases can scale to terabytes of data, but only if they are originally designed by someone who is a specialist in this kind of thing. When you wait until the whole application is running slower than dirt and you have a new large client coming on board, it is too late. Database design must consider performance from the beginning, as it is very hard to redesign when you already have millions of records.

HLGEM
A: 

A good way to reduce the scalability of your data tier is to interact with it on a procedural basis (fetch a row... process... update a row, repeat).

This can be done within a stored procedure by use of cursors, or within an application (fetch a row, process, update a row). The result (poor performance) is the same.

When people say they want to do processing in their application, it sometimes implies this procedural interaction.

Sometimes it's necessary to treat data procedurally. However, in my experience, developers with limited database experience tend to design systems in ways that do not leverage the strength of the platform, because they are not comfortable thinking in terms of set-based solutions. This can lead to severe performance issues.

For example, to add 1 to a count field in all rows of a table, the following is all that's needed:

UPDATE table SET cnt = cnt + 1

The procedural treatment of the same task is likely to be orders of magnitude slower in execution, and developers can easily overlook concurrency issues that make their process inconsistent. For example, this kind of code is inconsistent given the available read isolation levels of many RDBMS platforms:

-- row-at-a-time pseudocode: one round trip (and one race window) per row
SELECT id, cnt FROM table
...
foreach row
...
    UPDATE table SET cnt = row.cnt + 1 WHERE id = row.id
...
I think that, just in terms of abstraction and ease of servicing a running environment, stored procedures can be a useful tool.

Procedure plan caching and a reduced number of network round trips in high-latency environments can also bring significant performance advantages.

It is also true that trying to be too clever or work very complex problems in the RDBMS's half-baked procedural language can easily become a recipe for disaster.

Einstein
A: 

"Given that database is generally the least scalable component (of a web application), are there any situations where one would put logic in procedures/triggers over keeping it in his favorite programming language (ruby...) or her favorite web framework (...rails!)."

What makes you think that "scalability" is the only relevant concern in a system design? I agree with rexem where he commented that it is very obvious that you are "not" biased...

Databases are sets of assertions of fact. Those sets become more valuable if they can also be guaranteed to conform to certain integrity rules. Those guarantees are not worth a dime if it is the applications that are expected to enforce such integrity. Triggers and sprocs are the only way SQL systems have to allow such guarantees to be offered by the DBMS itself.

That aspect outweighs "scalability" anytime, anywhere, anyhow.

Erwin Smout