views:

30

answers:

2

i have a contact_relationship table that stores the reported strength of a relationship between one contact and another at a given point in time.

mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field            | Type      | Null | Key | Default           | Extra                       |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id  | int(11)   | YES  |     | NULL              |                             |
| contact_id       | int(11)   | YES  | MUL | NULL              |                             |
| other_contact_id | int(11)   | YES  |     | NULL              |                             |
| strength         | int(11)   | YES  |     | NULL              |                             |
| recorded         | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+

now i want to get a list of two-way relationships between contacts (meaning there are two rows, one with contact a specifying a relationship strength with contact b and another with contact b specifying a strength for contact a -- the strength of the two-way relationship is the smaller of those two strength values).

this is the query i've come up with but it is pretty slow:

select 
    mrcr1.contact_id, 
    mrcr1.other_contact_id, 
    case when (mrcr1.strength < mrcr2.strength) then 
        mrcr1.strength 
    else 
        mrcr2.strength 
    end strength 
from ( 
    select 
        cr1.* 
    from ( 
        select 
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id,
            other_contact_id 
    ) as cr2 
    inner join contact_relationship cr1 on 
        cr1.contact_id = cr2.contact_id 
        and cr1.other_contact_id = cr2.other_contact_id 
        and cr1.recorded = cr2.max_recorded 
) as mrcr1, 
( 
    select 
        cr3.* 
    from ( 
        select 
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id,
            other_contact_id 
    ) as cr4 
    inner join contact_relationship cr3 on 
        cr3.contact_id = cr4.contact_id 
        and cr3.other_contact_id = cr4.other_contact_id 
        and cr3.recorded = cr4.max_recorded 
) as mrcr2 
where 
    mrcr1.contact_id = mrcr2.other_contact_id 
    and mrcr1.other_contact_id = mrcr2.contact_id 
    and mrcr1.contact_id != mrcr1.other_contact_id 
    and mrcr2.contact_id != mrcr2.other_contact_id 
    and mrcr1.contact_id <= mrcr1.other_contact_id; 

anyone have any recommendations of how to speed it up?

note that because a user may specify the strength of his relationship with a particular user more than once, you must only grab the most recent record for each pair of contacts.

update: here is the result of explaining the query...

+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table                | type  | possible_keys                                                                          | key                          | key_len | ref                                 | rows  | Extra                          |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
|  1 | PRIMARY     | <derived2>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36029 | Using where                    |
|  1 | PRIMARY     | <derived4>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36029 | Using where; Using join buffer |
|  4 | DERIVED     | <derived5>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36021 |                                |
|  4 | DERIVED     | cr3                  | ref   | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10      | cr4.contact_id,cr4.other_contact_id |     1 | Using where                    |
|  5 | DERIVED     | contact_relationship | index | NULL                                                                                   | contact_relationship_index_3 | 14      | NULL                                | 37973 | Using index                    |
|  2 | DERIVED     | <derived3>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36021 |                                |
|  2 | DERIVED     | cr1                  | ref   | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10      | cr2.contact_id,cr2.other_contact_id |     1 | Using where                    |
|  3 | DERIVED     | contact_relationship | index | NULL                                                                                   | contact_relationship_index_3 | 14      | NULL                                | 37973 | Using index                    |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
A: 

You are losing a lot lot lot of time selecting the most recent record. 2 options :

1- Change the way you are stocking data, and have a table with only recent record, and an other table more like historical record.

2- Use analytic request to select the most recent record, if your DBMS allows you to do this. Something like

Select first_value(strength) over(partition by contact_id, other_contact_id order by recorded desc)
from contact_relationship

Once you have the good record line, I think your query will go a lot faster.

Scorpi0
you got me thinking i could use a temp table to pull out all the newest relationship rows (see below).thanks!
emh
A: 

Scorpi0's answer got me to thinking maybe I could use a temp table...

create temporary table mrcr1 (
    contact_id int, 
    other_contact_id int, 
    strength int, 
    index mrcr1_index_1 (
        contact_id, 
        other_contact_id
    )
) replace as 
    select 
        cr1.contact_id, 
        cr1.other_contact_id, 
        cr1.strength from ( 
            select 
                contact_id, 
                other_contact_id, 
                max(recorded) as max_recorded 
            from 
                contact_relationship 
            group by 
                contact_id, other_contact_id 
        ) as cr2 
        inner join 
            contact_relationship cr1 on 
                cr1.contact_id = cr2.contact_id 
                and cr1.other_contact_id = cr2.other_contact_id 
                and cr1.recorded = cr2.max_recorded;

which i had to do twice (second time into a temp table named mrcr2) because mysql has a limitation where you can't alias the same temp table twice in one query.

with my two temp tables created my query then becomes:

select 
    mrcr1.contact_id, 
    mrcr1.other_contact_id, 
    case when (mrcr1.strength < mrcr2.strength) then 
        mrcr1.strength 
    else 
        mrcr2.strength 
    end strength 
from 
    mrcr1,
    mrcr2 
where 
    mrcr1.contact_id = mrcr2.other_contact_id 
    and mrcr1.other_contact_id = mrcr2.contact_id 
    and mrcr1.contact_id != mrcr1.other_contact_id 
    and mrcr2.contact_id != mrcr2.other_contact_id 
    and mrcr1.contact_id <= mrcr1.other_contact_id; 
emh
unfortunately, in production, i don't have access to create temp tables :(
emh
temp table are not really a good idea. Argue with your superior that you have to change your structure. One table for production and current query, and one log table which insert new rows as the production table is update.
Scorpi0
could you explain why temp tables are a bad idea?
emh
It is hard to log what happens in temp tables, and so in a process, if something goes wrong (and it will), it will be hard to know what happens wrong if table are always create and then drop. temp table can be use for temporary solution, but not as a targeted solution..
Scorpi0