views:

613

answers:

4

Are staging tables an anti-pattern that is used when rpc (such as Java RMI or some kind of Web Service call) or messaging queue (such as JMS) would be a better solution, or are there problems better served by staging tables?

To clarify:

By staging tables I mean those cases where records are appended to a table or tables by a process which is then read by and acted on by second process or processes. I am not referring to tables which tables which are meant to reflect end of interval status (end of day, end of pay period etc). In most cases, the schema of the staging tables closely mimics an application data type(s) such as customer or account.

Potential causes for this anti-pattern:

1) Business Unit Wall between owners of the two processes prevents process that writes to or reads from staging being modified.

2) Low confidence in process that writes to or reads from staging leads developers to use table to prevent data loss "in case something fails"

3) Lack of knowledge or DGAS (don't give a ^%$@) attitude

+3  A: 

Staging tables, as you describe are an essential part of most data warehouse or BI environments. You could argue that reliable/resilient rpc would do the same job, but I think you'd be incorrect.

By pulling data to a staging table, you're moving it out of the production environment, potentially to do further calculation, summary, re-index, re-keying and so on, the majority of these are acheived 'in database'. Replacing this with an RPC you're moving the code and CPU cycles out of the DB and into an app server for no real benefit. For instance an app server has a much higher chance of crashing - you can't (easily) rollback an RPC.

Of course there are many ways of moving data reliably between systems, staging tables just happen to be one of the easiest, most performant, reliable and in development terms cheapest, doesn't always mean they're the right approach - but more often than not.

MrTelly
A: 

The only real time I have seen this is for reporting reason when denormalised tables are used to hold data while a report is generated. I don't think it is a problem for that use.

Craig
A: 

My first response is yes, but its mostly just because of my situation - yours may be different. We have a system where some relatively time sensitive information needs to go from a command component to a receiver component. The command information is put into a database table and then the receiver polls the table for updates. This is horrible. They did it so there would be a record of the commands in the database, but it ends up just making the actual commanding take forever and the decoupling sometimes causes the receiver to be out of sync with the database.

I'd rather see an EMS (like JMS) broadcast the message to a topic that both the receiver and a database inserter listen to, or a queue from commander to receiver, and then the receiver notify a status listener to put its status in the database.

I can't wait to fix that code.

John Ellinwood
+2  A: 

Why would they be an anti-pattern? Staging tables are incredibly useful for decoupling a receiving service from a processing service. When two such services are decoupled you are much more resilient to processing errors and network errors as all messages are stored in the staging table.

Andrew Hare