views:

1147

answers:

12

I'm starting work on a program which is perhaps most naturally described as a batch of calculations on database tables, and will be executed once a month. All input is in Oracle database tables, and all output will be to Oracle database tables. The program should stay maintainable for many years to come.

It seems straight-forward to implement this as a series of stored procedures, each performing a sensible transformation, for example distributing costs among departments according to some business rules. I can then write unit tests to check if the output of each transformation is as I expected.

Is it a bad idea to do this all in PL/SQL? Would you rather do heavy batch calculations in a typical object oriented programming language, such as C#? Isn't it more expressive to use a database centric programming language such as PL/SQL?

+4  A: 

Normally I say put as little in PL/SQL as possible - it is typically a lot less maintainable - at one of my last jobs I really saw how messy and hard to work with it could get.

However, since it is batch processing - and since the input and output are both the DB - it makes good sense to put the logic into PL/SQL - to minimize "moving parts". However, if it were business logic - or components used by other pieces of your system - I would say don't do it..

+2  A: 

As long as the calculations you need to perform can be adequately AND readably captured in PL/SQL, then using only PL/SQL would make the most sense.

The real catch is maintainability -- it's very easy to write unmaintainable SQL, if only because every RDBMS has a different syntax and different function set once you step outside of simple SQL DML, and no real standards for formatting. commenting, etc.

Craig Trader
A: 

It's not usually more expressive because most stored procedure languages suck by design. But it will probably run faster than in an external app.

I guess it boils down to how familiar you are with PL/SQL, how much time you have to write this, how important is performance and if you can reasonably expect maintainers to be familiar enough with PL/SQL to maintain a big program written in it.

If speed is not relevant and maintainers will probably be not PL/SQL proficient, you might be better using a 'traditional' language.

You could also use a hybrid approach, where you use PL/SQL to generate intermediate data (say, table joins and sums or whatever) and a separate application to control flow and check values and errors.

Vinko Vrsalovic
+6  A: 

No, it isn't necessarily a bad idea. If the solution seems straightforward to you and allows you to test and verify each process, its sounds like it could be a good idea. OO platforms can be (though they don't have to be) bad for large data sets, as object creation and overhead can kill performance.

Oracle designed PL/SQL with problems like yours in mind, if there is sufficient corporate knowledge of the database and PL/SQL this seems like a reasonable solution. Keep large batch sets in mind, as each call from PL/SQL to the actual SQL engine is a context switch, so single record processes should be batched together where possible to improve performance.

Joe Skora
+1  A: 

Implementing it as a series of stored procedures should be fine. You then have freedom to call the stored procedures outside of the databases if necessary. Beware of having dependencies on particular servers. PL/SQL is not ideal for all solutions, so if a particular part of your transformation would be better written in a procedural language, you can create something that will do the trick. I good rule of thumb is that if it is you understand exactly what needs to happen, but cannot do it because of language constraints and you know it is easy in another language then you may be needing to move that processing out of PL/SQL.

Nat
A: 

This is a loaded question :) There's a couple of database programming architecture designs you should know of, and what their costs/benefits are. 2 Tier generally means you have a client connecting to a DB, issuing direct SQL calls. 3 Tier generally means you have an "application server" that is issuing direct SQL calls to the DB, but the client is talking to the app server. Generally, this affords "scaling out". Finally, you have 2 1/2 tiered apps that employ a 2 Tier like format, only the work is compartmentalized within stored procedures.

Your process sounds like a "back office" kind of thing, and clients/processes just need results that are being aggregated and cached on a once a month basis. That is, there is no agent that connects, and connects often, and says "do these calculations". Instead you allude to a process that happens once in a while, and you can get away with non-real time.

Therefore, given those requirements, I'd say that generally, it will be faster to be closer to the data, and let SQL server do all the calculations. I think you'll find that proximity to the data will serve you well.

However, in performing these calculations, you may find that some calculations are not amenable to SQL Servers. Take for example calculating the accrued interest of a bond, or any fixed income instrument. Not very pretty in SQL, and much more suited for a richer programming language. However, if you just have simple averages and other relatively sane aggregates, I'd stick to stored procedures, on the SQL side.

So again, there's not enough information as to the nature of your calculations, or what your house mandates in terms of SQL capabilities of devs for support, or what your boss says...but since I know my way around SQL, and like to stay close to the data, I'd stay pure SQL/Stored Procedures for a task like this.

YMMV :)

+2  A: 

I've created batch programs using C# and SQL.

Pros of C#:

You've got the full library of .NET and all the power of an OO language.

Cons of C#:

*Batch program and db separate - this means, you'll have to manage your batch program separate from the database.

*You need to escape all that dang sql code.

Pros of SQL:

*Integrates nicely with the DBMS. If this job only manipulates the database, it would make sense to include it with the database. You end up with a single db and all of its components in one package.

*No need to escape sql code

*keeping it real - you are programming in your problem domain

Cons of SQL:

Its SQL and I personally just don't know it as well as SQL.

In general, I would stick with using SQL because of the Pros outlined above.

Mashed Potato
+3  A: 

I wrote a huge amount of batch processing and report generation programs in both PL/SQL and Pro*C for one project. They generally preferred I write in PL/SQL as their own developers who would maintain in the future found that easier to understand than Pro*C code.

It ended up being only the really funky processing or reports that ended up being written in Pro*C.

It is not necessary to write these as stored procedures as other people have alluded to, they can be just script files that are run as necessary, kind of like a shell script. Make source code revision control and migration between test and production systems a heck of a lot easier, too.

Evan
+4  A: 

Just make sure you somehow log what is happening while it's working. Otherwise you'll have a black box and if it gets stuck somewhere for hours, you'll be wondering whether to stop it or let it work 'a little bit more'.

Milan Babuškov
There is a "longops" API provided by Oracle for this. It's in dbms_utility I think.
WW
The Oracle api for updating progress is called DBMS_application_info. Progress is published to the view v$session_longops.There is no reason why some other approach shouldn't be used.
mathewbutler
I believe it is dbms_application_info.set_module and .set_action you want. longops is used by the database to indicate things like how many blocks it has left to read for a particular step in a query plan
kurosch
+7  A: 

Something for other commenters to note - the question is about PL/SQL, not about SQL. Some of the answers have obviously been about SQL, not PL/SQL. PL/SQL is a fully functional database language, and it's mature as well. There are some shortcomings, but for the type of thing the poster wants to do, it's very good.

Mike McAllister
+3  A: 

PL/SQL is a mature language that integrates well with SQL. With each version of Oracle it becomes more and more powerful. Also starting from Oracle 11, PL/SQL compiles to machine code by default.

Sergey Stadnik
+6  A: 

You describe the following requirements

a) Must be able to implement Batch Processing b) Result must be maintainable

My Response:

  1. PL/SQL was designed to achieve just what you describe. It's also important to note that there are efficiencies in PL/SQL that are not available in other tools. An stored procedure language put the processing next to the data - which is where batch processing ought to sit.
  2. It easy enough to write poorly maintainable code in any language.

Having said the above, your implementation will depend on the available skills, a proper design and adherence to good quality processes.

To be efficient your implementation must process data in batches ( select in batches and insert/update in batches ). The danger with an OO approach is that it is easy to be led towards a design that processes data row by row. This type of approach contains unnecessary overhead, and will be significantly less efficient than a design that processes data in batches of rows.

It is possible to use both approaches successfully.

Mathew Butler

mathewbutler
regarding #2 - agree in principle, however some languages make it far easier to write sloppy, nasty, unmaintainable code... pl-sql is one of them... although the award probably goes to javascript.