views:

512

answers:

4

All,

Looking for some guidance on an Oracle design decision I am currently trying to evaluate:

The problem

I have data in three separate schemas on the same oracle db server. I am looking to build an application that will show data from all three schemas, however the data that is shown will be based on real time sorting and prioritisation rules that is applied to the data globally (i.e.: based on the priority weightings applied I may pull back data from any one of the three schemas).

Tentative Solution

Create a VIEW in the DB which maintains logical links to the relevant columns in the three schemas, write a stored procedure which accepts parameterised priority weightings. The application subsequently calls the stored procedure to select the ‘prioritised’ row from the view and then queries the associated schema directly for additional data based on the row returned.

I have concerns over performance where the data is being sorted/ prioritised upon each query being performed but cannot see a way around this as the prioritisation rules will change often. We are talking of data sets in the region of 2-3 million rows per schema.

Does anyone have alternative suggestions on how to provide an aggregated and sorted view over the data?

+1  A: 

Querying from multiple schemas (or even multiple databases) is not really a big deal, even inside the same query. Just prepend the table name with the schema you are interested in, as in

SELECT SOMETHING
FROM
  SCHEMA1.SOME_TABLE ST1, SCHEMA2.SOME_TABLE ST2
WHERE ST1.PK_FIELD = ST2.PK_FIELD

If performance becomes a problem, then that is a big topic... optimal query plans, indexes, and your method of database connection can all come into play. One thing that comes to mind is that if it does not have to be realtime, then you could use materialized views (aka "snapshots") to cache the data in a single place. Then you could query that with reasonable performance.

Just set the snapshots to refresh at an interval appropriate to your needs.

JosephStyons
Thank-you for your responses. The pointer to materialised views is quite interesting. Performance is the main concern, if I am pulling back 1 million rows from each schema and then performing a sort on multiple columns then the query time will be significant – especially with around a 100 concurrent user community. You have certainly given me some food for thought.
A: 

http://www.orafaq.com/node/1922

Theo
A: 

It doesn't matter that the data is from 3 schemas, really. What's important to know is how frequently the data will change, how often the criteria will change, and how frequently it will be queried.

If there is a finite set of criteria (that is, the data will be viewed in a limited number of ways) which only change every few days and it will be queried like crazy, you should probably look at materialized views.

If the criteria is nearly infinite, then there's no point making materialized views since they won't likely be reused. The same holds true if the criteria itself changes extremely frequently, the data in a materialized view wouldn't help in this case either.

The other question that's unanswered is how often the source data is updated, and how important is it to have the newest information. Frequently updated source day can either mean a materialized view will get "stale" for some duration or you may be spending a lot of time refreshing the materialized views unnecessarily to keep the data "fresh".

Honestly, 2-3 million records isn't a lot for Oracle anymore, given sufficient hardware. I would probably benchmark simple dynamic queries first before attempting fancy (materialized) view.

Adam Hawkes
A: 

As others have said, querying a couple of million rows in Oracle is not really a problem, but then that depends on how often you are doing it - every tenth of a second may cause some load on the db server!

Without more details of your business requirements and a good model of your data its always difficult to provide good performance ideas. It usually comes down to coming up with a theory, then trying it against your database and accessing if it is "fast enough".

It may also be worth you taking a step back and asking yourself how accurate the results need to be. Does the business really need exact values for this query or are good estimates acceptable

Tom Kyte (of Ask Tom fame) always has some interesting ideas (and actual facts) in these areas. This article describes generating a proper dynamic search query - but Tom points out that when you query Google it never tries to get the exact number of hits for a query - it gives you a guess. If you can apply a good estimate then you can really improve query performance times

Chris Gill