I've come across a query that is taking "too long". The query has 50+ left joins across 10 or so tables. To give a brief overview of the database model: each joined table stores data for a particular data type (e.g. date_fields, integer_fields, text_fields) and has a column for the value, a "datafield" id, and a ticket id. The query is built programmatically from an association table between a "ticket" and its "data fields".

The join statements look something like the following:

...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)

Running EXPLAIN on the query shows the following:

id  select_type  table  type  possible_keys                   key             key_len  ref    rows  Extra
1   SIMPLE       t      ref   idx_dataset_id                  idx_dataset_id  5        const  2871  Using where; Using temporary; Using filesort
1   SIMPLE       t001   ref   idx_ticket_id,idx_datafield_id  idx_ticket_id   5        t.id   5
...
1   SIMPLE       t056   ref   idx_ticket_id,idx_datafield_id  idx_ticket_id   5        t.id   8

What direction can I take to tune this query? All the indexes seem to be in place. Perhaps the number of rows examined in the t table (tickets), 2871, should be reduced. How many left joins are too many? Should each datafield table be joined only once and then queried for the data that is required?

+5  A: 

You're using a variation of the terrible antipattern called Entity-Attribute-Value. You're storing attributes on separate rows, so if you want to reconstruct something that looks like a conventional row of data, you need to make one join per attribute.

It's not surprising this creates a query with 50 joins. This is far too many for most databases to run efficiently (you haven't identified which database you're using). Eventually you'll want a few more attributes and you might exceed some architectural limit of the database on the number of joins it can do.

The solution is: don't reconstruct the row in SQL.

Instead, query the attributes as multiple rows rather than trying to combine them onto a single row.

SELECT ... FROM tickets t
INNER JOIN ticket_text_fields f ON t.id=f.ticket_id
WHERE f.textfield_id IN (7, 8, 9, ...)
UNION ALL
SELECT ... FROM tickets t
INNER JOIN ticket_date_fields d ON t.id=d.ticket_id
WHERE d.datafield_id IN (434, 435, 436, ...)

Then write a function in your application that loops over the resulting rowset and collects the attributes one by one into an object in application space, so you can use it as if it were a single entity.
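
To make the collection step concrete, here is a minimal sketch in Python. It makes several assumptions that are not in the question: an in-memory SQLite database stands in for the unidentified database, the value column is named value, and the helper names load_tickets and demo are hypothetical. The SELECT lists that are elided above are filled in with one plausible choice (ticket id, a type tag, the field id, and the value), so treat this as an illustration rather than the exact query.

```python
import sqlite3
from collections import defaultdict


def load_tickets(conn, text_ids, date_ids):
    """Fetch attributes as rows, then fold them into one dict per ticket."""
    # One "?" placeholder per requested field id.
    text_marks = ",".join("?" * len(text_ids))
    date_marks = ",".join("?" * len(date_ids))
    sql = f"""
        SELECT t.id, 'text' AS kind, f.textfield_id AS field_id, f.value
        FROM tickets t
        INNER JOIN ticket_text_fields f ON t.id = f.ticket_id
        WHERE f.textfield_id IN ({text_marks})
        UNION ALL
        SELECT t.id, 'date', d.datafield_id, d.value
        FROM tickets t
        INNER JOIN ticket_date_fields d ON t.id = d.ticket_id
        WHERE d.datafield_id IN ({date_marks})
    """
    tickets = defaultdict(dict)
    params = [*text_ids, *date_ids]
    # Each result row carries exactly one attribute of one ticket.
    for ticket_id, kind, field_id, value in conn.execute(sql, params):
        tickets[ticket_id][(kind, field_id)] = value
    return dict(tickets)


def demo():
    # Assumed toy schema; the real tables likely have more columns.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tickets (id INTEGER PRIMARY KEY);
        CREATE TABLE ticket_text_fields (
            ticket_id INTEGER, textfield_id INTEGER, value TEXT);
        CREATE TABLE ticket_date_fields (
            ticket_id INTEGER, datafield_id INTEGER, value TEXT);
        INSERT INTO tickets VALUES (1), (2);
        INSERT INTO ticket_text_fields VALUES
            (1, 7, 'printer jam'), (2, 7, 'login fails');
        INSERT INTO ticket_date_fields VALUES (1, 434, '2009-03-01');
    """)
    return load_tickets(conn, [7], [434])


if __name__ == "__main__":
    for ticket_id, attrs in demo().items():
        print(ticket_id, attrs)
```

Note that with INNER JOIN a ticket that has none of the requested attributes does not appear in the result at all, unlike the original LEFT JOIN query. If those tickets matter, fetch the ticket ids separately and seed the dictionary with them first.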

Bill Karwin
Great response Bill! I couldn't think of a good solution without changing the schema so I kept reloading this question to see what others would say. I like your solution.
Plaudit Design - Web Design
Agreed. Thanks for the info and solution!
Dan G