How might I optimize this query? The schema:

mysql> show columns from transactionlog;
+---------------+-------------------------------------------+------+-----+---------+----------------+
| Field         | Type                                      | Null | Key | Default | Extra          |
+---------------+-------------------------------------------+------+-----+---------+----------------+
| id            | int(11)                                   | NO   | PRI | NULL    | auto_increment |
| transactionid | varchar(10)                               | NO   | MUL | NULL    |                |
| queryid       | tinyint(4)                                | NO   |     | NULL    |                |
| tableid       | varchar(30)                               | NO   | MUL | NULL    |                |
| tupleid       | int(11)                                   | NO   |     | NULL    |                |
| querytype     | enum('select','insert','delete','update') | NO   |     | NULL    |                |
| schemaname    | varchar(20)                               | YES  |     | NULL    |                |
| partition     | tinyint(3) unsigned                       | YES  |     | NULL    |                |
+---------------+-------------------------------------------+------+-----+---------+----------------+
8 rows in set (0.04 sec)

The query:

select concat(weight, ' ', ids, '\n')
from (
  select
    tableid,
    tupleid,
    group_concat(id separator ' ') as ids,
    (
      select count(distinct transactionid)
      from transactionlog
      where transactionid in (
        select transactionid
        from transactionlog
        where (tableid, tupleid, querytype) =
              (t.tableid, t.tupleid, 'update')
        group by transactionid
        having count(*) > 0
      )
    ) weight
  from transactionlog t
  group by tableid, tupleid
  having weight > 0 and count(*) > 1
) u;
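(Aside: the innermost HAVING COUNT(*) > 0 is always satisfied, so as far as I can tell the whole correlated weight expression collapses to a direct count of the distinct transactions that updated the tuple, roughly:)

select count(distinct transactionid)
from transactionlog
where tableid = t.tableid
  and tupleid = t.tupleid
  and querytype = 'update'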

This is the output of EXPLAIN and mk-visual-explain for the original query:

+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
| id | select_type        | table          | type  | possible_keys | key           | key_len | ref       | rows | Extra                                        |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
|  1 | PRIMARY            | <derived2>     | ALL   | NULL          | NULL          | NULL    | NULL      |   13 |                                              |
|  2 | DERIVED            | t              | ALL   | NULL          | NULL          | NULL    | NULL      |   68 | Using filesort                               |
|  3 | DEPENDENT SUBQUERY | transactionlog | index | NULL          | transactionid | 12      | NULL      |   68 | Using where; Using index                     |
|  4 | DEPENDENT SUBQUERY | transactionlog | ref   | tableid       | tableid       | 36      | func,func |    2 | Using where; Using temporary; Using filesort |
+----+--------------------+----------------+-------+---------------+---------------+---------+-----------+------+----------------------------------------------+
Table scan
rows           13
+- DERIVED
   table          derived(t,transactionlog,temporary(transactionlog))
   +- DEPENDENT SUBQUERY
      +- DEPENDENT SUBQUERY
      |  +- Filesort
      |  |  +- TEMPORARY
      |  |     table          temporary(transactionlog)
      |  |     +- Filter with WHERE
      |  |        +- Bookmark lookup
      |  |           +- Table
      |  |           |  table          transactionlog
      |  |           |  possible_keys  tableid
      |  |           +- Index lookup
      |  |              key            transactionlog->tableid
      |  |              possible_keys  tableid
      |  |              key_len        36
      |  |              ref            func,func
      |  |              rows           2
      |  +- Filter with WHERE
      |     +- Index scan
      |        key            transactionlog->transactionid
      |        key_len        12
      |        rows           68
      +- Filesort
         +- Table scan
            rows           68
            +- Table
               table          t

That is a lot of work. I can write the equivalent logic in Python in a single pass over the table:

# single pass over the log; relies on the rows arriving sorted so that each
# (tableid, tupleid) group, and each transaction within it, is contiguous
results = query("""
  select tableid, tupleid, transactionid, id, querytype
  from transactionlog_2warehouse
  order by tableid, tupleid, transactionid
""")
_tab = _tup = _txn = None
ids = []
weight = 0
saw_upd = False
for tab, tup, txn, id, qt in results:
  if (_tab, _tup) != (tab, tup):
    if len(ids) > 1 and weight > 0:   # emit the previous group
      print weight, ids
    weight = 0
    ids = []
    _txn = None
    _tab, _tup = tab, tup
  if _txn != txn:                     # new transaction within the group
    saw_upd = False
    _txn = txn
  if qt == 'update' and not saw_upd:  # count each updating transaction once
    weight += 1
    saw_upd = True
  ids += [id]
if len(ids) > 1 and weight > 0:       # emit the final group
  print weight, ids
Is it possible to achieve the Python single-pass performance using pure SQL? Thanks in advance!

A: 

Use:

  SELECT CONCAT(x.weight, ' ', GROUP_CONCAT(t.id SEPARATOR ' '), '\n')
    FROM TRANSACTIONLOG t
    JOIN (SELECT tl.tableid,
                 tl.tupleid,
                 COUNT(DISTINCT tl.transactionid) AS weight
            FROM TRANSACTIONLOG tl
           WHERE tl.querytype = 'update'
        GROUP BY tl.tableid, tl.tupleid) x ON x.tableid = t.tableid
                                          AND x.tupleid = t.tupleid
                                          AND x.weight > 0
GROUP BY t.tableid, t.tupleid, x.weight
  HAVING COUNT(*) > 1
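
If the derived table in this rewrite still forces a full scan, a covering index may help. The index name and column order below are an assumption, not part of the original answer; the existing tableid index (its key_len of 36 suggests it already spans tableid and tupleid) should continue to serve the outer GROUP BY and the join back to x:

  -- hypothetical covering index for the derived table: equality filter on
  -- querytype, then group by (tableid, tupleid), then count distinct transactionid
  ALTER TABLE transactionlog
    ADD INDEX idx_update_weight (querytype, tableid, tupleid, transactionid);

Also note that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so the per-group id lists may need that variable raised.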