tags:

views:

76

answers:

2

I am going through some pl/sql code with no comments. Trying to make sense of it and optimizing it. Here is the sample:

INSERT INTO gtt1 --75711 rows
(USER_ID, role_id, participant_code, status_id )
SELECT
 r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM
  user_role r,
  cmp_role c
WHERE
  r.role_id = c.role_id
  AND r.participant_code IS NOT NULL
  AND c.group_id = 3
  GROUP BY
  r.user_id, r.role_id, r.participant_code;

Then

DELETE gtt1
WHERE ROWID IN (SELECT ROWID FROM gtt1
                MINUS
                SELECT a.ROWID FROM gtt1 a, UIV_CMP_USER_ROLE b
                WHERE a.status_id = b.status_id
                AND (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 ) 
                      AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
                     )
                )
                );

finally (this takes longest)

OPEN cv_1 FOR

SELECT c.role_id,
       c.subgroup,
       c.subgroup_description,
       COUNT(a.USER_ID) user_count
FROM   
    (SELECT b.user_id, b.role_id FROM gtt1 b, pt_user e
    --pt_user table has 73000 rows
        WHERE  e.user_id = RTRIM(b.user_id)
       ) a
RIGHT OUTER JOIN CMP_ROLE c ON a.role_id = c.role_id
WHERE c.group_id = v_group_id
GROUP BY c.role_id,c.subgroup,c.subgroup_description
ORDER BY c.subgroup;

Is there a way I can avoid the deletion from gtt1 and initially just get rows we want?

Running explain plan I notice some full table scans on this query:

SELECT 
   r.user_id, r.role_id, r.participant_code, MAX(status_id) 
  FROM 
    user_role r, 
    cmp_role c 
  WHERE 
    r.role_id = c.role_id 
    AND r.participant_code IS NOT NULL 
    AND c.group_id = 3 
    GROUP BY 
    r.user_id, r.role_id, r.participant_code 
    HAVING MAX(status_id) IN (SELECT b.status_id FROM UIV_CMP_USER_ROLE b 
                              WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 )  
                                     AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date 
                                    )) 
                             ) 

user_role = 803507 rows

cmp_role = 27 rows

user_role has 5 indexes:

idx 1 = role_id

idx 2 = last_updt_user_id

idx 3 = actv_id, participant_code, effective_from_Date, effective_to_date

idx 4 = user_id, role_id, effective_from_Date, effective_to_date

idx 5 = participant_code, user_id, roke_id, actv_cd

+1  A: 

Do you want to optimize the query itself or just the time it takes to give the results? Second would need some data about the numbers of rows and so on...

poeschlorn
just the time it takes for the results. I will provide more details on # of rows for each table used
Mehur
A: 

The INSERT and DELETE appear to me to be equivalent to this:

INSERT INTO gtt1 --75711 rows
(USER_ID, role_id, participant_code, status_id )
SELECT
 r.user_id, r.role_id, r.participant_code, MAX(status_id)
FROM
  user_role r,
  cmp_role c
WHERE
  r.role_id = c.role_id
  AND r.participant_code IS NOT NULL
  AND c.group_id = 3
  GROUP BY
  r.user_id, r.role_id, r.participant_code
  HAVING MAX(status_id) IN (SELECT b.status_id FROM UIV_CMP_USER_ROLE b
                            WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 ) 
                                   AND SYSDATE BETWEEN b.effective_from_date
                                               AND b.effective_to_date
                                  ))
                           );

Whether that is more efficient I have no idea - I'd need to know a lot more about the tables, indexes and data.

You could then go further and turn the GTT into a subquery, for example:

WITH gtt1 AS 
  (SELECT
   r.user_id, r.role_id, r.participant_code, MAX(status_id)
  FROM
    user_role r,
    cmp_role c
  WHERE
    r.role_id = c.role_id
    AND r.participant_code IS NOT NULL
    AND c.group_id = 3
    GROUP BY
    r.user_id, r.role_id, r.participant_code
    HAVING MAX(status_id) IN (SELECT b.status_id FROM UIV_CMP_USER_ROLE b
                              WHERE (b.ACTIVE = 1 OR ( b.ACTIVE IN ( 0,3 ) 
                                     AND SYSDATE BETWEEN b.effective_from_date AND b.effective_to_date
                                    ))
                             )
  )
SELECT c.role_id,
       c.subgroup,
       c.subgroup_description,
       COUNT(a.USER_ID) user_count
FROM   
    (SELECT b.user_id, b.role_id FROM gtt1 b, pt_user e
    --pt_user table has 73000 rows
        WHERE  e.user_id = RTRIM(b.user_id)
       ) a
RIGHT OUTER JOIN CMP_ROLE c ON a.role_id = c.role_id
WHERE c.group_id = v_group_id
GROUP BY c.role_id,c.subgroup,c.subgroup_description
ORDER BY c.subgroup;

Again, I have no idea whether this is more or less efficient than the current code.

Tony Andrews
Tony, the combined `WITH AS` query works perfect but I still need to shave off some time. I'm providing the rows in all the tables. could you please see if it can be optimized somewhere?
Mehur
I'm not surprised that the query involves some full scans: cmp_role only has 27 rows, so a full scan makes sense, and a full scan on user_role MAY also make sense if many of its rows will match the criteria that participant_code is not null and group_id = 3 (how many user_role rows do satisfy these criteria BTW?)
Tony Andrews
39392 rows are returned back when running that query and it takes 19 seconds.
Mehur
How many rows are returned if you take off the HAVING clause?
Tony Andrews
75711 in 32 seconds.
Mehur
OK, well that's a fair number and may be why Oracle thinks not worth using any index. I see you have started another question about tuning the query, so probably not worth pursuing it here any further.
Tony Andrews
@tony, I have not found solution to optimizing this query. Can you suggest some ways of combining this in one join. I can compare the results to see if they match original query
Mehur
Not really, without understanding the data a whole lot more. I do find the whole MAX(status_id) part puzzling - what is significant about the maximum status_id value?
Tony Andrews