
I've been using PostgreSQL a little bit lately, and one of the things that I think is cool is that you can use languages other than SQL for scripting functions and whatnot. But when is this actually useful?

For example, the documentation says that the main use for PL/Perl is that it's pretty good at text manipulation. But isn't that more of something that should be programmed into the application?

Secondly, is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea on a production system.

PS. Bonus points if someone can make PL/LOLCODE seem useful.

+1  A: 

These days, any "unique" or "cool" feature in a DBMS makes me incredibly nervous. I break out in a rash and have to stop work until the itching goes away.

I just hate to be locked in to a platform unnecessarily. Suppose you build a big chunk of your system in PL/Perl inside the database, or in C# within SQL Server, or in PL/SQL within Oracle; there are plenty of examples*.

Now you suddenly discover that your chosen platform doesn't scale. Or isn't fast enough. Or something. Worse, there's a new kid on the database block (something like MonetDB, CouchDB or Caché, say, but much cooler) that would solve all your problems (even if your only problem, like mine, is having an uncool database platform). And you can't switch to it without recoding half your application.

(*Admittedly, the paid-for products are to some extent seeking to lock you in by persuading you to use their unique features, which is not an accusation that can directly be levelled at the free providers, but the effect is the same).

So that's a rant on the first part of the question. Heart-felt, though.

"is there any valid reason to use an untrusted language? It seems like making it so that any user can execute any operation would be a bad idea"

My goodness, yes it does! A sort of "Perl injection attack"? Almost worth doing it just to see what happens, I'd have thought.

For philosophical reasons outlined above I think I'll pass on the PL/LOLCODE challenge. Although I was somewhat amazed to discover it was a link to something extant.

Mike Woodhouse
+3  A: 

"isn't that [text manipulation] more of something that should be programmed into the application?"

Usually, yes. The generally accepted "three-tier" application design for databases says that your logic should be in the middle tier, between the client and the database. However, sometimes you need some logic in a trigger or need to index on a function, requiring that some code be placed into the database. In that case all the usual "which language should I use?" questions come up.
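As a rough illustration of "indexing on a function", here is a minimal Python sketch that uses SQLite (through the standard `sqlite3` module) as a stand-in for PostgreSQL, so it runs without a server. The `normalize_name` function and the `people` table are hypothetical. In PostgreSQL the same idea would be a function written in PL/pgSQL or PL/Perl, declared `IMMUTABLE`, and used in something like `CREATE INDEX ... (normalize_name(name))`; SQLite's equivalent requirement is that the function be registered as deterministic.

```python
import sqlite3
import unicodedata


def normalize_name(s):
    # Hypothetical text-manipulation function: strip accents and fold case,
    # so that "José" and "JOSE" compare equal.
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def make_db():
    conn = sqlite3.connect(":memory:")
    # Allow app-defined functions in schema objects (only matters if SQLite
    # was built with trusted_schema off).
    conn.execute("PRAGMA trusted_schema = ON")
    # deterministic=True is what allows the function inside an index expression.
    conn.create_function("normalize_name", 1, normalize_name, deterministic=True)
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # Index on the result of the function, not on the raw column.
    conn.execute("CREATE INDEX people_norm_idx ON people (normalize_name(name))")
    return conn


def find_by_name(conn, name):
    # Same function on both sides, so the lookup can use the expression index.
    rows = conn.execute(
        "SELECT name FROM people "
        "WHERE normalize_name(name) = normalize_name(?) ORDER BY id",
        (name,),
    )
    return [r[0] for r in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("Jos\u00e9",), ("JOSE",), ("Maria",)],
)
print(find_by_name(conn, "jose"))
```

The point of putting the function in the database here is that the index needs it; the application could not build that index on its own.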

If you only need a little logic, you should probably use the most portable language (PL/pgSQL). If you need to do some serious programming, though, you might be better off using a more expressive language (maybe PL/Ruby). This will always be a judgment call.

"is there any valid reason to use an untrusted language?"

As above, yes. Again, putting direct file access (for example) into your middle tier is best when possible, but if you need to fire things off based on triggers (that might need access to data not available directly to your middle tier), then you need untrusted languages. It's not ideal, and should generally be avoided. And you definitely need to guard access to it.
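A minimal sketch of what a trigger with I/O access looks like, again using SQLite from Python as a stand-in for PostgreSQL and an untrusted language. The hypothetical `log_change` function appends a line to a file every time the trigger fires. Note that the file write is not undone when the transaction rolls back, which is one concrete reason to keep this sort of thing rare and tightly guarded.

```python
import os
import sqlite3
import tempfile


def make_logger(path):
    # Returns the hypothetical log_change function bound to one log file.
    def log_change(order_id, status):
        # Real file I/O performed from inside a database trigger.
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{order_id}:{status}\n")
    return log_change


def demo(path):
    # isolation_level=None: autocommit, so BEGIN/ROLLBACK below are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Allow app-defined functions in triggers (only matters if SQLite was
    # built with trusted_schema off).
    conn.execute("PRAGMA trusted_schema = ON")
    conn.create_function("log_change", 2, make_logger(path))
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("""
        CREATE TRIGGER orders_log AFTER UPDATE OF status ON orders
        BEGIN
            SELECT log_change(NEW.id, NEW.status);
        END
    """)
    conn.execute("INSERT INTO orders (status) VALUES ('new')")
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
    # This change is rolled back, but the file write done by the trigger is not.
    conn.execute("BEGIN")
    conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = 1")
    conn.execute("ROLLBACK")
    status = conn.execute("SELECT status FROM orders WHERE id = 1").fetchone()[0]
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines(), status


log_path = os.path.join(tempfile.mkdtemp(), "changes.log")
lines, status = demo(log_path)
print(lines, status)
```

The database ends up saying `shipped` while the log also records `cancelled`, which is exactly the kind of inconsistency you have to design around.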

Neall
+5  A: 

@Mike: this kind of thinking makes me nervous. I've heard "this should be infinitely portable" too many times, but when the question is asked, "do you actually foresee that there will be any porting?", the answer is: no.

Sticking to the lowest common denominator can really hurt performance, as can the introduction of abstraction layers (ORMs, PHP PDO, etc.). My opinion is:

  • Evaluate realistically whether there is a need to support multiple RDBMSs. For example, if you are writing an open source web application, chances are that you need to support at least MySQL and PostgreSQL (if not MSSQL and Oracle).
  • After the evaluation, make the most of the platform you decided upon.

And BTW: you are mixing relational with non-relational databases (CouchDB is not an RDBMS comparable with Oracle, for example), which further shows that the perceived need for portability is often greatly overestimated.

Cd-MaN
+1  A: 

From my perspective, I guess the answer is 'it depends'.

There is an argument that manipulation of the data belongs in the database layer, so that the business logic does not need to be overly concerned about how the manipulation happens, it just knows that it has.

Another very good reason to process data in the db layer is when the volume of data being crunched means that network bandwidth becomes an issue. I once had to categorise very large amounts of data. Processing it in the application layer was severely restricted by the time required to transfer all the data across the network.

I then wrote a binning algorithm in PL/pgSQL and it worked much faster.
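The bandwidth argument can be sketched in Python, with SQLite standing in for the database server. SQLite is embedded, so there is no real network here; the point is how many rows cross the boundary between database and application. The table name, bin width and data are made up, and this is only the general shape of binning in the database, not the original PL/pgSQL algorithm.

```python
import random
import sqlite3

BIN_WIDTH = 10.0  # hypothetical bin width
N_ROWS = 10_000   # hypothetical data volume


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (value REAL NOT NULL)")
    rng = random.Random(42)  # fixed seed so runs are repeatable
    conn.executemany(
        "INSERT INTO measurements (value) VALUES (?)",
        ((rng.random() * 100.0,) for _ in range(N_ROWS)),
    )
    return conn


def bin_in_database(conn):
    # The database does the grouping; only one row per bin comes back.
    return conn.execute(
        "SELECT bin, COUNT(*) FROM "
        "(SELECT CAST(value / ? AS INTEGER) AS bin FROM measurements) "
        "GROUP BY bin ORDER BY bin",
        (BIN_WIDTH,),
    ).fetchall()


def bin_in_application(conn):
    # Every row is fetched first, then grouped in Python.
    counts = {}
    for (value,) in conn.execute("SELECT value FROM measurements"):
        b = int(value / BIN_WIDTH)  # same division and truncation as the SQL CAST
        counts[b] = counts.get(b, 0) + 1
    return sorted(counts.items())


conn = build_demo_db()
db_bins = bin_in_database(conn)
print(f"database binning: {len(db_bins)} rows transferred")
print(f"application binning: {N_ROWS} rows transferred")
```

Both functions give the same counts; the difference is that the first ships a handful of rows and the second ships all of them, which is what starts to hurt once a real network is in between.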

Regarding untrusted languages, I heard a podcast with Josh Berkus (a PostgreSQL advocate) who discussed a PostgreSQL application that pulled in data from MySQL as part of its processing, so that the communication itself was handled by the PostgreSQL server. I don't remember the full details; I think it was on the FLOSS Weekly podcast, which is quite an interesting discussion of the history of PostgreSQL and some of the uses it is put to.

kaybenleroll
A: 

I think most additional languages are offered so that if you develop in that language on a regular basis, you can feel comfortable writing db functions, triggers, etc. Their usefulness lies in letting you control the data as close to where it is stored as possible.

Thanatos
+1  A: 

The untrusted versions of the procedural languages allow you to access I/O on the system. This can come in handy if you need a trigger or something similar to send an email, or to connect to a socket server to send a popup notification. There are tons of uses for this type of thing, and because of PostgreSQL's transaction isolation you can safely do things like this: you can put checkpoints in the function so that if the transaction fails, the email or whatever won't go out. The nice thing about doing this is that it removes the logic from the client and puts it on the server.
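One way to get the "only if the transaction succeeds" behaviour is the transactional outbox pattern, sketched here in Python with SQLite standing in for PostgreSQL. The trigger only records the message in an `outbox` table inside the transaction, and a separate step sends whatever is in the outbox after commit. `send_email`, the table names and the addresses are hypothetical. (PostgreSQL's own `NOTIFY` is similar in spirit: notifications issued inside a transaction are only delivered if that transaction commits.)

```python
import sqlite3

sent = []  # stands in for a real mail server


def send_email(recipient, body):
    # Hypothetical stand-in for real I/O such as SMTP.
    sent.append((recipient, body))


def setup():
    # isolation_level=None: autocommit, so transactions are managed explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
    conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, recipient TEXT, body TEXT)")
    # The trigger only records the intent to notify, inside the same transaction.
    conn.execute("""
        CREATE TRIGGER orders_notify AFTER INSERT ON orders
        BEGIN
            INSERT INTO outbox (recipient, body)
            VALUES (NEW.customer, 'Order ' || NEW.id || ' received');
        END
    """)
    return conn


def place_order(conn, customer, fail=False):
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
        if fail:
            raise RuntimeError("simulated failure after the insert")
        conn.execute("COMMIT")
    except Exception:
        # Rolling back discards the order and its outbox row together.
        conn.execute("ROLLBACK")
        raise


def flush_outbox(conn):
    # Runs after commit: only messages from committed transactions are here.
    rows = conn.execute("SELECT id, recipient, body FROM outbox ORDER BY id").fetchall()
    for row_id, recipient, body in rows:
        send_email(recipient, body)
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))


conn = setup()
place_order(conn, "alice@example.com")
try:
    place_order(conn, "bob@example.com", fail=True)
except RuntimeError:
    pass
flush_outbox(conn)
print(sent)
```

The failed order never produces an email because its outbox row disappeared with the rollback; the actual sending happens outside the transaction, where a failure can simply be retried.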