
Hi,

(PostgreSQL 8.2, running on Windows XP)

I have lots of complex queries that each take several seconds to run. They are not actually "views", but they can be treated as such.

I decided to persist the result records from those "views" into tables that I call "aux" (auxiliary) tables.

I can guarantee that there is no data change after the "aux" tables are calculated.

Let's take one example:

Suppose I have a query "X" and I persist its results in "Table_X". The record set is something like this:

PERSON*     FIELD_A*     FIELD_B          FIELD_C
=======================================================
1           10           Value1           Value2
1           20           Value3           Value4
1           30           Value5           Value6
------------------------------------------------------
2           10           Value1           Value2
2           20           Value3           Value4
------------------------------------------------------
3           20           Value3           Value4
3           30           Value5           Value6
------------------------------------------------------
etc..

(*) Primary key: (person, field_a)
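
For reference, the table would look something like this (column types here are just illustrative, not my real schema):

    -- Hypothetical structure of one "aux" table.
    CREATE TABLE table_x (
        person   integer NOT NULL,
        field_a  integer NOT NULL,
        field_b  text,
        field_c  text,
        PRIMARY KEY (person, field_a)
    );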

As you can see, each "person" has its own subset of records in this table.

So I can quickly fetch a person's records with just "select * from table_x where person = <person>".

I will always fetch ONLY by <person>, and all my queries have the same "shape": "PERSON" plus some fields.

IMPORTANT: All "aux" tables can be read (obviously, with "old" data until I commit) by other transactions while I'm re-populating them. But I can guarantee that they are never updated by those transactions.

My current process is:

- START TRANSACTION;
  - DO LOTS OF OPERATIONS ON THE DATABASE: INSERT / UPDATE / DELETE ON SEVERAL TABLES.
  - AFTER THAT, RECALCULATE THE "AUX" TABLES:
  - LOOP THROUGH ALL MY "QUERIES" (CALL THE CURRENT ONE "X"):
    - LOOP THROUGH ALL "PERSON" VALUES (CALL THE CURRENT ONE <person>):
      - DELETE FROM <TABLE_X> WHERE PERSON = <person>;
      - INSERT INTO <TABLE_X> (PERSON, FIELD_A, FIELD_B, FIELD_C)
        SELECT <person>,
               FIELDS...
          FROM "LOTS OF TABLES"
          JOIN "COMPLEX SQL"...
         WHERE SOME_FIELD = <person>;
    - END LOOP "PERSON"
  - END LOOP "QUERIES"
- COMMIT;
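
To make the inner loop concrete, one pass for "Table_X" and person 42 would be roughly this ("complex_source" is a stand-in for the real multi-table join):

    -- One iteration of the inner loop, for table_x and person = 42.
    DELETE FROM table_x WHERE person = 42;
    INSERT INTO table_x (person, field_a, field_b, field_c)
    SELECT 42, s.field_a, s.field_b, s.field_c
      FROM complex_source s        -- stand-in for the real joins
     WHERE s.some_field = 42;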

Considerations:

Some of those tables have thousands of records, and often only a few records actually need to be updated / deleted / inserted compared with the record set already in the table.

Since deleting and re-inserting everything causes too much disk I/O (obviously), while only a few records really need to change, I'm trying to find an efficient way of doing this.

I tried doing the delete / update / insert in separate steps, driving each one directly from the "complex query", but it takes too much time, because the query gets executed 3 times (once for the delete, once for the update and once for the insert).
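
To illustrate, here is a sketch of those three steps, but with the expensive query materialized only ONCE into a temporary table first (all names are illustrative; "complex_source" again stands in for the real joins):

    -- Run the expensive query once, into a temp table.
    CREATE TEMP TABLE fresh_x AS
    SELECT s.person, s.field_a, s.field_b, s.field_c
      FROM complex_source s;

    -- 1) Delete rows that no longer exist in the fresh result.
    DELETE FROM table_x t
     WHERE NOT EXISTS (SELECT 1 FROM fresh_x f
                        WHERE f.person = t.person AND f.field_a = t.field_a);

    -- 2) Update rows whose non-key fields changed.
    UPDATE table_x t
       SET field_b = f.field_b, field_c = f.field_c
      FROM fresh_x f
     WHERE f.person = t.person AND f.field_a = t.field_a
       AND (t.field_b IS DISTINCT FROM f.field_b
         OR t.field_c IS DISTINCT FROM f.field_c);

    -- 3) Insert rows that are new in the fresh result.
    INSERT INTO table_x (person, field_a, field_b, field_c)
    SELECT f.person, f.field_a, f.field_b, f.field_c
      FROM fresh_x f
     WHERE NOT EXISTS (SELECT 1 FROM table_x t
                        WHERE t.person = f.person AND t.field_a = f.field_a);

    DROP TABLE fresh_x;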

Any suggestions?

+1  A: 

Before you do this, have you run EXPLAIN on your complicated query, and added indexes to improve it?
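
For example (placeholder names, just to show the idea):

    -- Inspect the plan and actual timing of the slow query.
    EXPLAIN ANALYZE
    SELECT s.person, s.field_a FROM complex_source s WHERE s.some_field = 42;

    -- If the plan shows a sequential scan on the filter column, try an index.
    CREATE INDEX idx_complex_source_some_field ON complex_source (some_field);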

If you must do this, forget all the looping crap; nothing you do is going to be more optimized than the database's internal C and assembly code. Just write a view and materialize it if you must, by selecting * from it into a table. In many, many cases, that'll be faster than looping, deleting and inserting.
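
A minimal sketch of that approach (hypothetical names):

    -- Define the complex query once as a view.
    CREATE VIEW view_x AS
    SELECT s.person, s.field_a, s.field_b, s.field_c
      FROM complex_source s;

    -- "Materialize" it in one set-based statement instead of row-by-row loops.
    DELETE FROM table_x;
    INSERT INTO table_x SELECT person, field_a, field_b, field_c FROM view_x;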

tpdi
Agreed, make the query as fast as possible before you go down this path. Materialized views can get complicated fast if you have to keep them up to date with the current data. See http://www.pgcon.org/2008/schedule/events/69.en.html for some good info.
AngerClown
+2  A: 

The two standard references for building your own materialized views with PostgreSQL are PostgreSQL/Materialized Views and Materialized Views That Really Work.

Greg Smith