views:

154

answers:

8

I'm using Rails and it does not provide support for database specific actions like triggers, stored procedures and various constraints (not all of them).

I wonder if I should put the database logic in the application itself instead.

Because then you can make more complex logic than the database provides and it is also database independent (I can move from mysql to postgresql and vice versa), and it won't be if you are putting these things in the database.

Is this the right way to go?

Thanks

+1  A: 

Generally, yes, mostly for the reasons you just mentioned.

Triggers are especially bad because it's pretty opaque to the developer about what's actually going on behind the scenes when an operation is performed. I tend to limit my use of triggers to auditing functions, or functions that are incredibly important to protecting the data integrity of the system. I almost entirely avoid using triggers for application logic.

Same goes with stored procedures. In this day and age with ORMs being so widespread, I use stored procs for big, set-based data modifications whose performance would be undesirably bad when using an ORM such as Rails, which requires you to read the data from the database and marshal it back to the database layer once you're done processing it.

Dave Markle
+10  A: 

(Note: This is a Postgres centric post. I have several years experience with MySQL, and a few Postgres and Oracle. I prefer Postgres by a country mile, but that's not the point of this post.)

This is really a question about two schools of thought: should the database be a simply a data store or should it contain application logic? There are cases for both.

IMHO the answer used to be a lot simpler before several DBs (esp Postgres) got some really, really good procedural languages in the DB itself. Before that point, it made sense to kick all the all the logic you could in the application because it had to be rather kludgey to get some things done in basic SQL.

Dave Markle in another answer makes a good point about triggers, they tend to be "magic" to the developer. Changing input on the way in can be terribly confusing. If I say UPDATE foo set X=3; but then I go check foo and x is really 4 because some trigger intercepted it, it can be confusing. Like Dave, I tend to promote triggers for auditing functions and the like. The other issue with triggers is performance, they can just suck the life right out of a good batch operation.

Stored procedures - I hold in much higher regard. The developer is explicitly invoking them and they're named, so he should not be seeing them as a black box. A good stored procedure can greatly increase the speed by reducing the number of rows that need to be "sent over the wire". For a DB such as Postgres with a strong set of PL langauges, there's really nothing that can't be in an SP (that doesnt mean it SHOULD be done, but can be). Take a look at SimplyCity for an example of how stored procedures can be your friend. Additionally, you could share application logic classes between your application code and PL/python, PL/Perl, or PL/Php, PL/Ruby, or PL/Java. I believe Oracle can do something similar with Java - it's just not something the company I currently do Oracle work for is working with.

If you plan on staying database agnostic, you're going to be sacrificing a lot of features, a lot of speed, and a lot of your time. ORMs can make this easier, but ultimately there is fundamental difference in most databases engines that just can't be fully abstracted away.

Overall, you need to test, test, test (with real data) and make the decision that's best for your app. It's often a balancing act between performance, future maintenance, cost, and the resources you have to work with. Often times (not every time), moving the logic out of the database and into the application greatly reduces the performance of the application.

Even if you don't decide to put application logic in the DB - take the time to really learn about your DB of choice. It will make you a far better application developer in the long run.

rfusca
Oracle supports Java stored procedures, but no-one uses them because everything in the Oracle world is geared up towards PL/SQL.
John Topley
+1 for "take the time to really learn about your DB". It rubs me the wrong way when folks don't learn the technology they choose.
Travis Heseman
+4  A: 

If you're already using Ruby on Rails then you've committed to using opinionated software. Rightly or wrongly, Rails' opinion on this matter is that the logic goes within the Rails application. It even takes it as far as enforcing referential integrity itself through ActiveRecord and its associations, although you can of course enforce this in the database if you wish by adding constraints to your tables. A lot of Rails developers don't bother, either through ignorance or choice.

  • The productivity gains Rails provides tend to diminish when you start to stray from its opinions and conventions
  • If you're writing the sort of application that manipulates large quantities of data and would benefit from taking advantage of specific RDBMS features, then this is a bad fit for Rails anyway because it's optimized for creating new CRUD web applications
John Topley
I would say, if you are using MySQL you are committing twice to the same school of though. I'd rather use a full-features DBMS, a more flexible framework/ORM (with Data Mapper for instance), and move the line whenever I want. But it also depends on the kind of application.
Marco Mariani
That's right. With Rails, you've committed to logic in the app. And this is where the productive gains come in, because you don't spend any development time figuring out and evaluating the different ways "this could be implemented" or "that could be implemented". Rails says "do it this way", "stay on the predefined RAILS" and get the darn thing out the door... start adding business value and stop getting sidetracked (pun intended) on these shenanigans.
Travis Heseman
+2  A: 

Because then you can make more complex logic than the database provides and it is also database independent (I can move from mysql to postgresql and vice versa), and it won't be if you are putting these things in the database.

The logic depends on the database - MySQL for example doesn't have recursive query support, nor analytic functions. PostgreSQL only got analytics recently - SQL Server's had both since v2005; Oracle since 9i.

Database logic in the application ltypically means more trips between the application and database - this is time/performance you will never recoup. Database trips are expensive, and be kept to as few as possible - which is why stored procedures & functions are your best tools for a highly scalable, performant persistence layer.

ORM has long been known for being great for queries that aren't complex, and I'm glad to see they've evolved to support using stored procedures/etc. ...Which utterly defeats the purpose of using ORM - now you're back to writing database specific code for sake of performance. It's like the joke about solving something with a regex - now you've got two problems...

Finally, databases - the tables and data types - also do not port to other vendors thought there are conversion kits. Even changes within the database are a pain, which is why database design/modeling development does not work well in agile/interative development processes.

Conclusion

Database logic in the application is great if you have super simple applications, with small datasets. But it will pay off in spades to learn database development - your clients will benefit from a more scalable, faster application.

OMG Ponies
+3  A: 

I'm solidly in the "application logic" camp myself.

For me the most important issue has to do with scaling the persistence layer as your application grows. By relying heavily upon the database to implement your logic, you're signing your application up for database sharding or replication as your scaling solution for your persistence layer as you grow. I've found much more success with the plethora of distributed cache solutions at the business object layer.

Of secondary performance is the transparency of implementation. Modern object oriented design has a lot of benefits in pulling your data and business logic together, but it largely decays into a mess once you're relying upon an external and hidden layer of logic.

Chris Alef
+4  A: 

I will give an answer probably the opposite of what everyone else will say.

Database logic belongs as close to the data as you can possibly implement it, and that means in the database. Doing anything else will, at best, require you to repeat yourself in different applications and have a big update headache. At worst (and this is pretty likely) you'll have inconsistent database logic implemented in different applications.

You need to separate out what's "database logic" and what's "application logic". My normal example is defining what your organization means by "FullName" for a customer. Imagine that you write code in your application that combines the first and last names into a full name field. Later, you add a middle initial (or decide to include an already existing middle initial column) as part of the full name.

If your full name logic is implemented in a view or stored procedure, you need only make a single change to enforce the new definition across all applications (including third-party applications for which you don't have source code).

The problem is somewhat eased in situations (like Ruby) where all data access is done through a shared ORM layer where you can put logic that will be used by all applications written in the same language. In my situation, however, I rarely work in situations in which only a single language or product will be used for programming against the corporate database.

Larry Lustig
For my day job I work on a reasonably large corporate J2EE/Oracle application that processes a large amount of data and we follow your advice and put everything as close to the data as possible. However, I think that this is atypical for Rails applications, most of which probably don't process millions of records and are standalone in terms of having their own database.
John Topley
I don't know if there's a "typical" situation for any development environment. I've used Django to add a smallish web interface onto a project that used a variety of other environments to access the database. But, yes, if one is SURE that only the one ORM will be used to access the database my arguments don't apply (although there are also performance arguments to be made).
Larry Lustig
+1 for "In my situation, however, I rarely work in situations in which only a single language or product will be used for programming against the corporate database." This is why i generally like an API like stored procedure structure inside the DB.
rfusca
+1  A: 

It's my opinion RDBMSs should be dumb and flexible, used strictly for persistence, and disparate enterprise applications should communicate through techniques of SOA.

I create my data structures in a normalized, flexible manner, typically using only datatypes and referential integrity constraints. The data are managed by an application (a "mother") that implements any rules necessary for a cohesive domain. If disparate (or 3rd party) apps need access to these data, they should not undermine the mother and go directly to the data store, but use techniques of SOA to communicate with her, asking her to to take actions and responding to the results (including failures) that she returns to them.

I think about it this way. If I want to access my company's email data, I don't access them directly where they reside on disk. That would be painful as I'd have to understand the data structures before I could do anything. Instead, I communicate with my email server application (Exchange Server, in my case) through techniques of SOA (WebDAV, CDO, LDAP, etc.). This app provides an abstraction layer to query the email, calendar, or contact data, to send emails or create contacts or appointments, or manage calendars etc.

If these other apps (created by you) are not fundamentally different from the mother app (also created by you), then perhaps you have too many one-off applications and a composition of these applications should be developed as a common and intuitive enterprise system. Then each of these applications turned use-cases (or modules, as I call them) can access the data directly leveraging a common data access layer that implements the same rules of cohesion defined centrally in the composite application. Of course, then the composite application should provide access to the data through techniques of SOA so that apps that are truly disparate can communicate with her.

Travis Heseman
And RDBMS can't be dump, otherwise it wouldn't be an RDBMS. It has to know everything about your data to be fast and reliable. It does't have to know anything about your application, that's a different thing.
Frank Heikens
I agree that your RDBMS needn't know anything about the application. However, an RDBMS can be dumb; it does not need to know "everything" about your data to be fast and reliable; as a matter of fact, I could argue that it needs to know only two things about each piece of data to meet that end - how to find it quickly and how to relate it to others (or indices and referential integrity constraints in RDBMS lingo).
Travis Heseman
For example, it doesn't need to know that my "special price" field should only store values between 0.00 and 100.00 in order to be fast and reliable; it just needs to know to index it in a tree based on numerical comparison and that it's associated to this or that product, and I'm sure it will be fast and reliable regardless of the numbers placed in that field. And if my app is solely responsible for the validity of that data, then it will ensure the restriction is met; at least until the business changes it.
Travis Heseman
+1  A: 

There is data logic and there is application logic. The database has to know everything about the data logic, the application about the application logic. Data logic can be stored inside the database in stored procedures, views, rules, etc. Oracle, PostgreSQL, SQL Server, etc. have very powerfull tools to support this.

Many applications can use the same database and the same data, and you have to be sure the data is correct. Every application (in any language) can do different things with this data, that's up to the application. Building database logic inside your application, is like reinventing the wheel and asking for trouble. RDBMSs have about 40 years of development history, you're not going to get a better result in a couple of weeks or months.

For PostgreSQL, take a look at the different PLs: PL/pgSQL is nice, PL/perl and PL/ruby might be more interesting for perl and ruby developers.

Frank Heikens
Eh, adding languages to postgres is so easy somebody wrote a pl/lolcode just because.
Marco Mariani