I'm developing an RSS feed reader that uses a Bayesian filter to filter out boring blog posts.
The Stream table is meant to act as a FIFO buffer from which the webapp will consume 'entries'. I use it to store the temporary relationships between entries, users and Bayesian filter classifications.
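For reference, a rough sketch of the tables involved (only the columns referenced in the query below are definite; the rest is approximate):

create table entries (
    id              integer primary key,
    subscription_id integer not null
    -- plus the usual title/content/date columns
);

create table subscriptions_users (
    subscription_id integer not null,
    user_id         integer not null
);

create table metadata (
    -- entries a user has already read
    entry_id integer not null,
    user_id  integer not null
);

create table stream (
    -- the FIFO buffer described above
    entry_id integer not null,
    user_id  integer not null
);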
After a user marks an entry as read, it will be added to the metadata table (so that the user isn't presented with material they have already read) and deleted from the Stream table. Every three minutes, a background process repopulates the Stream table with new entries (i.e. whenever the daemon adds new entries after checking the RSS feeds for updates).
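The mark-as-read step is roughly the following (just a sketch; the literal entry id is a placeholder):

begin;
-- remember that user 1 has read entry 42
insert into metadata (entry_id, user_id) values (42, 1);
-- and drop it from the buffer
delete from stream where entry_id = 42 and user_id = 1;
commit;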
Problem: The query I came up with is hella slow. More importantly, the Stream table only needs to hold one hundred unread entries at a time; capping it there would reduce duplication, make processing faster and give me some flexibility with how I display the entries.
The query (takes about 9 seconds on 3600 items with no indexes):
insert into stream (entry_id, user_id)
select entries.id, subscriptions_users.user_id
from entries
inner join subscriptions_users
    on subscriptions_users.subscription_id = entries.subscription_id
where subscriptions_users.user_id = 1
  and entries.id not in (select entry_id
                         from metadata
                         where metadata.user_id = 1)
  and entries.id not in (select entry_id
                         from stream
                         where user_id = 1);
The query explained: insert into the stream all entries from a user's subscription list (subscriptions_users) that the user has not read (i.e. that do not exist in metadata) and that are not already in the stream.
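As noted, there are no indexes yet; the obvious candidates would presumably be something like the following, though I haven't measured whether they help:

create index entries_subscription_idx on entries (subscription_id);
create index subs_users_idx on subscriptions_users (user_id, subscription_id);
create index metadata_user_entry_idx on metadata (user_id, entry_id);
create index stream_user_entry_idx on stream (user_id, entry_id);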
Attempted solution: adding limit 100 to the end speeds up the query considerably, but since there's no order by, repeated executions keep adding a different arbitrary set of 100 entries that do not already exist in the table, with each successive query taking longer and longer (the not-in subquery against stream has more rows to check every time).
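That is, the same query as above with the limit tacked onto the end:

insert into stream (entry_id, user_id)
select entries.id, subscriptions_users.user_id
from entries
inner join subscriptions_users
    on subscriptions_users.subscription_id = entries.subscription_id
where subscriptions_users.user_id = 1
  and entries.id not in (select entry_id from metadata where metadata.user_id = 1)
  and entries.id not in (select entry_id from stream where user_id = 1)
limit 100;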
This is close but not quite what I wanted to do.
Does anyone have any advice (NoSQL?) or know a more efficient way of composing the query?