views:

69

answers:

3

I need help in optimizing the following query. It is taking a long time to finish. It takes almost 213 seconds . because of some constraints, I can not add an index and have to live with existing ones.

INSERT INTO temp_table_1
( USER_ID, role_id, participant_code, status_id )
WITH A AS
 (SELECT USER_ID user_id,ROLE_ID, STATUS_ID,participant_code
  FROM    USER_ROLE WHERE   participant_code IS NOT NULL), --1
B AS
 (SELECT ROLE_ID
  FROM    CMP_ROLE
  WHERE   GROUP_ID = 3),
C AS (SELECT USER_ID FROM USER) --2

SELECT USER_ID,ROLE_ID,PARTICIPANT_CODE,MAX(STATUS_ID)
FROM A INNER JOIN B USING (ROLE_ID)
       INNER JOIN C USING (USER_ID)
GROUP BY USER_ID,role_id,participant_code ;

--1 = query when ran alone takes 100+ seconds

--2 = query when ran alone takes 19 seconds

DELETE temp_table_1
WHERE ROWID NOT IN
( SELECT a.ROWID
  FROM temp_table_1 a,
  USER_ROLE b
  WHERE a.status_id = b.status_id
  AND   ( b.ACTIVE IN ( 1 ) OR ( b.ACTIVE IN ( 0,3 ) 
  AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date ))
);

It seems like the person who wrote the query is trying to get everything into a temp table first and then deleting records from the temp table. whatever is left is the actual results.

Can't it be done such a way that there is no need for the delete? We just get the results needed since that will save time?

+2  A: 

Here is a query which naively combines the two queries above, so make sure that you check and compare the outputs from the two methods.

select 
  r.user_id, r.role_id, r.participant_code, max(status_id)
from 
  user_role r, 
  cmp_role c
where 
      r.role_id = c.role_id
  and r.active in (0,1,3)
  and r.participant_code is not null
  and sysdate between r.effective_from_date and r.effective_to_date
  and c.group_id = 3
group by 
  r.user_id, r.role_id, r.participant_code;

It is not necessary to use a temporary table and then delete records afterwards to get the required results. Although, there may have been a reason for its use, maybe performance?

Also, it looks like the query and join to the USER table is unnecessary as the USER_ID is available from USER_ROLES. I have omitted it from the query above. Hopefully that gives you a good start to improving it.

ar
unfortunatley, `USER` table has more `USER_ID`s than `USER_ROLES` table. I will get the explain plan shortly
deming
@deming - the join is an *inner join* so it will return only the rows which are in *both tables*. The join on USER is only required if your system doesn't enforce the foreign key.
APC
+1 This is definitely how I would approach the same problem.
APC
that works perfectly. though, I'd appreciate if you could break down how you went about solving this. apart from USER_ID being in both tables
deming
A: 

This should be semantically equivalent to the set left in the temp table after the delete in the existing code. Though I would agree with AR, that the User table is not needed unless it contains less user_id's than user_role. Otherwise it does not limit the set in any way. If User contains more user_id's than user_role, that will not change the resulting set. User_role is the main driver in this query with small limiting coming from the cmp_role table.

select a.user_id,
       a.role_id,
       a.participant_code,
       a.status_id
  from (select a.user_id,
               a.role_id,
               a.participant_code,
               max(status_id) status_id
          from user_role a,
               (select role_id
                  from cmp_role
                 where group_id = 3
               ) b
         where a.participant_code is not null
           and a.active in (0, 1, 3)
           and sysdate between a.effective_from_date and a.effective_to_date
           and a.role_id = b.role_id
         group by a.user_id,
                  a.role_id,
                  a.participant_code
       ) a
       user c
 where a.user_id = c.user_id;

If performance is still poor, then one would probably look at indexes on some of the fields that are used for limiting the data (user_role.role_id, user_role.participant_code, user_role.active, user_role.effective_from_date, user_role.effective_to_date).

Of course an explain plan or trace will be needed to get the full story on what Oracle is doing when it executes this query based on your data and structure.

Dougman
A: 

Let's point out some obvious things.

--1 = query when ran alone takes 100+ seconds

--2 = query when ran alone takes 19 seconds

A full table scan on a USER table should not take 19 seconds. A full table scan on a USER_ROLE table should not take over 100 seconds, even if it has tens of millions of rows. Of course, if you really do have twenty million users then those timings are slightly less unreasonable but still not acceptable.

You need to understand why it takes your system so long to do simple look-ups. An EXPLAIN PLAN will hope us understand the joins but that won't solve your core problem: why does it take so long to retrieve the USER_ROLE data? Is it a complicated view? Does it have hundreds of millions of queries? Do you have an index on PARTICIPANT_CODE which is not helpful to this query?

What about other queries using these tables? Are they problematic as well? If so, you need to investigate more. Either the system is taking too long to do something or it is waiting too on some resource. What you need to do is run a 10046 trace against this query and establish where the time is going. This trace will report on the wait events for your session. That will give you some decent information on which to proceed. It is much better than guessing.

Oracle has exposed the Wait Interface since 9i. Roger Schrag wrote a pretty decent introduction. Read it now. (If you're on 10g or higher you should also read his follow-up article).

APC