views:

88

answers:

3

I have a table like so:

call_activity (
    call_id TEXT, 
    activity_type TEXT,
    activity_time TIMESTAMP,
    PRIMARY KEY(call_id, activity_type, activity_time)
)

activity_type may be one of about 9 different strings:

 'started'
 'completed' (about 5 variations on this)
 'other' (these are the states that I want to display)

A call has a series of events, starting with 'started' and culminating in a completed event (one of 5 or so possible ones). I need a view of this with two columns: The first column must be the activity time of a call's 'started' event, the second column must be the most recent event for that call. This view must only have calls for which there is no completed event.

I have a set of nested joins on it, but they're slow as hell. I need a reasonably optimal view of this. Can anyone help me?

A: 

To do this the db has to at least find all the started calls and find if any completed activities exist for it. Assuming that not-completed is a small set, fetching the most recent activity can be done as a subquery. Here's a query that does this:

SELECT c_started.call_id, c_started.activity_id AS started_time,
        (SELECT MAX(c_recent.activity_time)
            FROM call_activity AS c_recent
            WHERE c_recent.call_id = c_started.call_id) AS recent_activity
    FROM call_activity AS c_started
        LEFT JOIN call_activity AS c_completed
            ON c_started.call_id = c_completed.call_id
                AND c_completed.activity_type IN ('completed 1' 'completed 2', ...)
    WHERE c_started.activity_type = 'started'
        AND c_completed.call_id IS NULL;

If you can add indexes, the first choice would be a partial index on call_id where activity_type is in the completed events (the same check as in the join condition). Another would be an index on activity_type, possibly partial with only 'started' events to speed up the initial scan. Finally a call_id, activity_time index would speed up the subquery if you have lots of events per call. You can also get that if you reorder activity_type and activity_time in the primary key.

To make this fast, I would create an active_calls table with only a call_id column and add an insert trigger on call_activity to insert into active_calls if 'started' is inserted and delete if 'completed' is inserted.

Ants Aasma
A: 

Change datatypes for your IDs, and try something like this (add 'started' to the list of others filter if you would like to include 'started' in the latest non-completed activity as well):

SELECT      ca_s.activity_time AS timestamp_started,
            ca_o.activity_time AS timestamp_other
FROM        call_activity ca_s
LEFT  JOIN  call_activity ca_o
        ON  ca_s.call_id = ca_o.call_id
        AND ca_o.activity_type IN ('other-1', 'other2-2', ...)
LEFT  JOIN  call_activity ca_c
        ON  ca_s.call_id = ca_c.call_id
        AND ca_s.activity_type IN ('completed-1', 'completed-2', ...)
WHERE       ca_s.activity_type = 'started'
        AND ca_c.call_id IS NULL --// no complete events
van
plus, index on call_id would help in any case
van
A: 

A solution without joins, using the CASE statement and grouping

    select call_id , 
       min(case when activity_type = 'started' then activity_time
                else null 
           end) as timestamp_started,
      max(activity_time) as timestamp_other
from call_activity
group by call_id
having 
       sum(case when activity_type = 'completed-1' then 1
                when activity_type = 'completed-2' then 1
                else 0
           end) = 0
Ferran