views: 97
answers: 2
I'm a .NET programmer, and I'm developing a system that collects real-time production data and stores it in an Oracle database. The data will need to undergo extensive processing: essentially digital signal processing, looking for the mins and maxes of various fields in a data stream and categorizing "chunks" of time accordingly. Apparently, it's the kind of work we could conceivably (though not necessarily) use MatLab libraries to process.

What I need to know is where to locate the processing logic: inside the Oracle database, implemented in PL/SQL (or whatever is appropriate), or in an external .NET executable or service that could call MatLab libraries?

A related question: is it possible to call MatLab library functions from within Oracle? If so, that may be the most concise solution.

Sorry for the block of text. Any help would be greatly appreciated.

+3  A: 

Ideally, you'd benchmark both approaches and see what sort of performance you get out of each.

Personally, my bias would be toward doing the data processing in the database unless there is a compelling reason to do it elsewhere; databases are designed to process large chunks of data quickly. Your case, though, may be one where there is a compelling reason to go outside the database, because MatLab may well have superior libraries for digital signal processing.

If you are just looking for minimums and maximums, it's unlikely that MatLab would be any more efficient than Oracle, and certainly not enough to overcome the expense of moving a bunch of data out of the database for processing and then moving the results back in. If you are doing more sophisticated processing, such as Fast Fourier Transforms (FFTs), you're probably far better off using the delivered libraries in MatLab rather than trying to build FFT logic in PL/SQL.
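For the simple min/max-per-chunk case, here's a sketch of what that might look like in plain SQL; the readings table, its columns, and the one-minute chunk size are all placeholders:

-- Min and max per sensor per one-minute chunk of the timeline.
SELECT sensor_id,
       TRUNC(event_time, 'MI')  AS chunk_start,
       MIN(reading_value)       AS min_value,
       MAX(reading_value)       AS max_value
  FROM readings
 GROUP BY sensor_id, TRUNC(event_time, 'MI')
 ORDER BY sensor_id, chunk_start;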

If your Oracle database runs on an operating system that MatLab will also run on, and MatLab can expose its functions through a C DLL-style API (if not, I'm sure you could put a thin .NET layer on top of MatLab to create such a DLL), then you can call the MatLab functions as external procedures (extprocs) from Oracle.
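For illustration, a minimal sketch of the extproc plumbing; the library path, the dsp_scale function, and its signature are all hypothetical (real DSP work would more likely pass LOBs or collections, which takes more elaborate parameter mapping):

-- Register the wrapper shared library with Oracle.
CREATE OR REPLACE LIBRARY dsp_lib AS '/opt/dsp/libdspwrapper.so';
/
-- Expose a C function from that library as a PL/SQL-callable function.
CREATE OR REPLACE FUNCTION dsp_scale (
   p_value  BINARY_DOUBLE,
   p_factor BINARY_DOUBLE
) RETURN BINARY_DOUBLE
AS LANGUAGE C
   NAME "dsp_scale"
   LIBRARY dsp_lib
   PARAMETERS (p_value DOUBLE, p_factor DOUBLE, RETURN DOUBLE);
/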

Justin Cave
Good answer in general, and +1 for mentioning Oracle extprocs, a lesser-known feature.
dpbradley
Thanks for the reply! Between this answer and the one below, it's pretty clear that, like with most things, the best route depends a lot on the details of what we're trying to accomplish. We're not tied to MatLab, but I was hoping that the pre-made number-crunching functions might save some dev time. It may turn out that we can't even access the functions as extprocs, though, so it may be PL/SQL all the way.
Klay
If you want to scare yourself, Alberto Dell'Era won the NOCOUG SQL challenge with a solution that implements FFT using the MODEL clause: http://www.adellera.it/investigations/nocoug_challenge/index.html
APC
A: 

We have a very similar system. We collect data from manufacturing/analysis equipment, process the raw data with our algorithms, and store the results of our processing in a standard format in our Oracle database.

If you're using MatLab for your processing, it sounds like you're even more math-intensive than we are, yet we still chose to manage all the data processing in our external apps. Our schema basically looks something like this:

CREATE TABLE RawData
  (
   RawDataId            NUMBER(22) DEFAULT Null NOT NULL,   -- PK
   RawDataType          VARCHAR2(50) DEFAULT Null NOT NULL,
   ExposureToolId       NUMBER(22) DEFAULT Null NULL,
   EventTime            DATE DEFAULT Null NULL,
   -- ... (various columns describing event characteristics) ...
   Body                 BLOB DEFAULT Null NOT NULL,         -- raw payload
   Length               NUMBER DEFAULT 0 NULL
  )

CREATE TABLE Analysis
  (
   AnalysisId       NUMBER(22) DEFAULT Null NOT NULL,   -- PK
   RawDataId        NUMBER(22) DEFAULT Null NOT NULL,
   AnalysisName     VARCHAR2(255) DEFAULT Null NOT NULL,
   AnalysisType     VARCHAR2(255) DEFAULT Null NOT NULL,
   AnalyzeTime      DATE DEFAULT Null NULL,
   Status           CHAR(1) DEFAULT Null NULL,
   SentToServerDate DATE DEFAULT Null NULL,
   ServerId         NUMBER(22) DEFAULT Null NULL
  )

CREATE TABLE DataSet
  (
   DataSetId        NUMBER(22) DEFAULT Null NOT NULL,   -- PK
   AnalysisId       NUMBER(22) DEFAULT Null NOT NULL,
   DataSetName      DATE DEFAULT sysdate NULL
  )

CREATE TABLE Coefficient
  (
   DataSetId            NUMBER(22) DEFAULT Null NOT NULL,   -- PK (composite)
   IdentifierId         NUMBER(22) DEFAULT Null NOT NULL,   -- PK (composite)
   Coefficient          FLOAT DEFAULT 0 NULL
  )

Collected data from the tools goes into the RawData table, and at that time one or more header records are created in the Analysis table, indicating that the data needs processing and recording the name of the analysis and the server that will do the processing; the status is set to "Pending". The server is then sent a message to perform the work, and when it stores the results (in the Coefficient table, grouped into DataSets), the status is set to "Done" (or "Error" if we encountered a problem). When the server components start up, they query the Analysis table for any pending work; in this way we ensure that no work is lost in the event of a crash.
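In outline, the queue handling looks something like this (illustrative only: the single-character status codes, the analysis_seq sequence, and the bind variable names are just placeholders, since Status is a CHAR(1)):

-- Queue an analysis when raw data arrives ('P' = Pending).
INSERT INTO Analysis (AnalysisId, RawDataId, AnalysisName, AnalysisType,
                      Status, SentToServerDate, ServerId)
VALUES (analysis_seq.NEXTVAL, :raw_data_id, 'MinMaxScan', 'DSP',
        'P', SYSDATE, :server_id);

-- On startup, a server re-queries for any work it never finished.
SELECT AnalysisId, RawDataId
  FROM Analysis
 WHERE ServerId = :server_id
   AND Status = 'P';

-- After storing results in Coefficient, mark the analysis complete
-- ('D' = Done, 'E' = Error).
UPDATE Analysis
   SET Status = 'D',
       AnalyzeTime = SYSDATE
 WHERE AnalysisId = :analysis_id;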

This has been our model for over ten years, and it works really well. It's easy to deploy and maintain different versions of the server modules; in my opinion, it's easier to manage executables this way than it is to compile packages on the Oracle server, but that could just be my bias: the conversion of raw data into analyzed data just seems more "app-centric" than a database task.

Steve Broberg
This is the basic architecture we've proposed too. Data is generated by sensors, sent via a web service to a raw data table, then processed to clean up the data, remove spikes and outliers, cut the timeline into discrete phases, and rewrite the data to production tables. Part of the problem is that we haven't figured out exactly how to do the clean-up and processing yet. As an app programmer, I share your bias, but I'm trying to be objective and look for the best solution. Thanks for your help!
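For the spike/outlier step, something like this could probably be prototyped directly in Oracle with analytic functions; a rough sketch, where the raw_readings table and the 3-sigma threshold over a 21-row moving window are arbitrary placeholders:

-- Flag readings that deviate sharply from their local neighborhood.
SELECT event_time,
       reading_value,
       CASE
         WHEN ABS(reading_value
                  - AVG(reading_value) OVER (ORDER BY event_time
                        ROWS BETWEEN 10 PRECEDING AND 10 FOLLOWING))
              > 3 * STDDEV(reading_value) OVER (ORDER BY event_time
                        ROWS BETWEEN 10 PRECEDING AND 10 FOLLOWING)
         THEN 'Y' ELSE 'N'
       END AS is_spike
  FROM raw_readings;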
Klay