tags:
views: 27
answers: 2

I have a table (2 million rows) in Informix v11.10, in a replicated (50+ node) environment.

Basic layout is like so:
ID (PK) (int)
division (int)
company (int)
feature1 char(20)
feature2 int
...
feature200 char(2)

There are several issues with the current layout. There are 200 "features" associated with each record, but at any given time only maybe 5-10 of them are non-default/non-null (a different subset for each record).

An update to all records for a company can mean updating 100k rows, which chokes replication and isn't easy to manage.

So I made a change to a table like so:
ID (int)
ID_TYPE (ID, division, or company)
Feature_name
Feature_value

And had another table with only:
ID (int)
division (int)
company (int)

So for, say, ID #1 there would be 10 rows in the table, the associated division might have a few rows, and the company might have a few. An ID-level row "overrides" any row with the same Feature_name at the division level, and a division-level row overrides the company level.

I created a function that, when you pass in an ID and a Feature_name, queries based on company, then on division, and then on ID, and returns the feature value using the override logic above (basically an ordered FOREACH loop).
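For concreteness, here is a minimal sketch of what such a lookup function might look like in Informix SPL. The table names (feature_tab for the feature rows, entity_tab for the ID/division/company table), the parameter names, and the column sizes are assumptions for illustration, not the actual schema:

```sql
-- Sketch only: feature_tab, entity_tab, and sizes are assumed names.
CREATE FUNCTION my_func(p_feature VARCHAR(40), p_id INT)
    RETURNING VARCHAR(255);

    DEFINE v_val               VARCHAR(255);
    DEFINE v_division, v_company INT;

    LET v_val = NULL;

    -- Look up the row's division and company once.
    SELECT division, company INTO v_division, v_company
      FROM entity_tab
     WHERE ID = p_id;

    -- Company-level default first; each later level overwrites it,
    -- so the most specific value (ID level) wins.
    FOREACH SELECT Feature_value INTO v_val
              FROM feature_tab
             WHERE ID = v_company AND ID_TYPE = 'company'
               AND Feature_name = p_feature
    END FOREACH;

    FOREACH SELECT Feature_value INTO v_val
              FROM feature_tab
             WHERE ID = v_division AND ID_TYPE = 'division'
               AND Feature_name = p_feature
    END FOREACH;

    FOREACH SELECT Feature_value INTO v_val
              FROM feature_tab
             WHERE ID = p_id AND ID_TYPE = 'ID'
               AND Feature_name = p_feature
    END FOREACH;

    RETURN v_val;
END FUNCTION;
```

Note that each call makes three separate probes of the feature table, which is what multiplies out so badly in the view below.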

Then I created a view looking like:

SELECT
    my_func('feature1', ID) AS feature1,
    my_func('feature2', ID) AS feature2,
    ...
    my_func('feature200', ID) AS feature200
  FROM table

Now the issue is that for each row I'm hitting the feature table 200 * 3 times (once per feature for each of ID, division, and company), which is just not going to work; it pegs the CPU. The new design holds around 20 million rows and takes up much less space.

Any thoughts? I feel like I'm missing use of a temp table somewhere that would keep it from needing to hit the 20 million row table 600 times.

A: 

You aren't hitting your table 200 * 3 times for each feature, but for each row of your view: the view makes 200 calls to my_func per row (one per feature), and each call performs 3 queries.

This raises the question: will you ever need to access all 200 features simultaneously? From what has been written in the question, it sounds as though any given ID is likely to use only a small subset of features; queries that are specific to particular features should probably call my_func directly (instead of via the view) for just those features.

On the other hand, where it is essential to retrieve all 200 features, basing the view on 200 calls to my_func guarantees 600 logical accesses per row retrieved. Instead, I suggest rewriting the view to access the feature table directly, grouping by ID, with each feature derived by a MAX(CASE WHEN ...) type structure. This would still result in up to 600 physical rows being read, but only a maximum of 3 logical reads, for each view row returned - I would expect this to perform significantly better.
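A sketch of that rewritten view, using the same assumed names as before (feature_tab, entity_tab - not the actual schema). One wrinkle: MAX on its own doesn't know that ID beats division beats company, so this version prefixes each value with a precedence digit before the MAX and strips it afterwards with SUBSTR:

```sql
-- Sketch only: names are assumed; extend the MAX(CASE ...) list to feature200.
CREATE VIEW feature_view (ID, feature1, feature2) AS
SELECT e.ID,
       SUBSTR(MAX(CASE WHEN f.Feature_name = 'feature1'
                       THEN f.prec || f.Feature_value END), 2),
       SUBSTR(MAX(CASE WHEN f.Feature_name = 'feature2'
                       THEN f.prec || f.Feature_value END), 2)
       -- ... one SUBSTR(MAX(CASE ...)) per feature ...
  FROM entity_tab e,
       (SELECT ID, ID_TYPE, Feature_name, Feature_value,
               -- '3' > '2' > '1', so MAX keeps the most specific level.
               CASE ID_TYPE WHEN 'ID'       THEN '3'
                            WHEN 'division' THEN '2'
                            ELSE                 '1' END AS prec
          FROM feature_tab) f
 WHERE (f.ID_TYPE = 'ID'       AND f.ID = e.ID)
    OR (f.ID_TYPE = 'division' AND f.ID = e.division)
    OR (f.ID_TYPE = 'company'  AND f.ID = e.company)
 GROUP BY e.ID;
```

One caveat: the OR-ed join predicate may defeat index use on some optimizers; if so, the same join can be written as a UNION ALL of three indexed joins (one per ID_TYPE) before the GROUP BY.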

Mark Bannister
Yes, unfortunately, due to many legacy programs hitting this table I will always need those 200 features visible.
So try rewriting the view to access the feature table directly (as suggested above), and compare the resultant performance.
Mark Bannister
A: 

My common sense tells me you should normalize into two separate tables.

Frank Computer