tags:

views:

155

answers:

3

I have a table that contains a history of costs by location. These are updated on a monthly basis. For example

Location1, $500, 01-JAN-2009
Location1, $650, 01-FEB-2009
Location1, $2000, 01-APR-2009

if I query for March 1, I want to return the value for Feb 1, since March 1 does not exist. I've written a query using an oracle analytic, but that takes too much time (it would be fine for a report, but we are using this to allow the user to see the data visually through the front and and switch dates, requerying takes too long as the table is something like 1 million rows). So, the next thought I had was to simply update the table with the missing data. In the case above, I'd simply add in a record identical to 01-FEB-2009 except set the date to 01-MAR-2009.

I was wondering if you all had thoughts on how to best do this. My plan had been to simply create a cursor for a location, fetch the first record, then fetch the next, and if the next record was not for the next month, insert a record for the missing month.

A little more information:

CREATE TABLE MAXIMO.FCIHIST_BY_MONTH
(
  LOCATION     VARCHAR2(8 BYTE),
  PARKALPHA    VARCHAR2(4 BYTE),
  LO2          VARCHAR2(6 BYTE),
  FLO3         VARCHAR2(1 BYTE),
  REGION       VARCHAR2(4 BYTE),
  AVG_DEFCOST  NUMBER,
  AVG_CRV      NUMBER,
  FCIDATE      DATE
)

And then the query I'm using (the system will pass in the date and the parkalpha). The table is approx 1 million rows, and, again, while it takes a reasonable amount of time for a report, it takes way too long for an interactive display

select location, avg_defcost, avg_crv, fcimonth, fciyear,fcidate from
(select location, avg_defcost, avg_crv, fcimonth, fciyear, fcidate,
max(fcidate) over (partition by location) my_max_date 
from FCIHIST_BY_MONTH 
where fcidate <='01-DEC-2008'
and parkalpha='SAAN'
)
where fcidate=my_max_date;
A: 

Filling in the missing data will (if you are careful) make the queries much simpler and run faster. I would also add a flag to the table to indicate that the data is missing data filled in so that if you need to remove it (or create a view without it) later you can.

I have filled in missing data and also filled in dummy data so that outer join were not necessary so as to improve query performance a number of times. It is not "clean" and "perfect" but I follow Leflar's #1 Law, "always go with what works."

You can create a job in Oracle that will automatically run at off-peak times to fill in the missing data. Take a look at: This question on stackoverflow about creating jobs.

Philip Schlump
A: 

The best way to do this is to create a PL/SQL stored procedure that works backwards from the present and runs queries that fail to return data. Each month that it fails to return data it inserts a row for the missing data.

create or replace PROCEDURE fill_in_missing_data IS
  cursor have_data_on_date is
    select locaiton, trunc(date_filed) have_date
    from the_table
    group by location, trunc(date_field)
    order by desc 1
  ;  
  a_date date;
  day_offset number;
  n_days_to_insert number;
BEGIN
   a_date := trunc(sysdate);
   for r1 in fill_in_missing_data loop
     if r1.have_date < a_date then
       -- insert dates in a loop
       n_days_to_insert := a_date - r1.have_date; -- Might be off by 1, need to test.
       for day_offset in 1 .. n_days_to_insert loop
         -- insert missing day
         insert into the_table ( location, the_date, amount ) 
            values ( r1.location, a_date-day_offset, 0 );
       end loop;
     end if; 
     a_date := r1.have_date;
     -- this is a little tricky - I am going to test this and update it in a few minutes
   end loop;
END;
Philip Schlump
A: 

What is your precise use case underlying this request?

In every system I have worked on, if there is supposed to be a record for MARCH and there isn't a record for MARCH the users would like to know that fact. Apart from anything they might want to investigate why the MARCH record is missing.

Now if this is basically a performance issue then you ought to tune the query. Or if it presentation issue - you want to generate a matrix of twelve rows and that is difficult if a doesn't have a record for some reason - then that is a different matter, with a variety of possible solutions.

But seriously, I think it is a bad practice for the database to invent replacements for missing records.

edit

I see from your recent comment on your question that is did turn out to be a performance issue - indexes fixed the problem. So I feel vindicated.

APC
I agree - unless you are certain that it is the right thing to do, to fill in the missing data, probably should not do it.
Philip Schlump
this data is generated by another application and really wasn't intended for this use. Its very reasonable for there to be 'missing' records, just problematic for this new use.
moleboy