ansaurus

Question

SQL Alternative to performing an INNER JOIN on a single table

Answer 1

+1 A:

I'd need a little more info to diagnose the speed issue, but to remove the duplicates, add this to the WHERE clause:

AND F.source<S.source
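
To see why this works, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout (token, source, count), the sample rows, and the shape of the self-join are assumptions, since the original query is not shown in this thread; only the aliases F and S and the extra condition come from the line above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TokenFrequency (token TEXT, source INTEGER, count INTEGER)")
conn.executemany(
    "INSERT INTO TokenFrequency VALUES (?, ?, ?)",
    [("cat", 1, 3), ("cat", 2, 5), ("cat", 3, 1), ("dog", 1, 2)],  # made-up rows
)

# Hypothetical self-join; only the WHERE condition varies between the two runs.
JOIN = """
    SELECT F.token, F.source, S.source, F.count + S.count
    FROM TokenFrequency F
    INNER JOIN TokenFrequency S ON F.token = S.token
    WHERE {condition}
    ORDER BY F.token, F.source, S.source
"""

# F.source <> S.source drops self-pairs but keeps both (1, 2) and (2, 1).
mirrored = conn.execute(JOIN.format(condition="F.source <> S.source")).fetchall()

# F.source < S.source keeps exactly one row per unordered pair of sources.
deduped = conn.execute(JOIN.format(condition="F.source < S.source")).fetchall()

print(len(mirrored), len(deduped))
for row in deduped:
    print(row)
```

With these sample rows the `<>` version returns six rows and the `<` version returns three, one per unordered pair of sources.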

KM 2009-08-07 21:05:25

Ah so simple. This worked perfectly for eliminating the duplicates. Thanks

cruzja 2009-08-10 13:54:29

Answer 2

+2 A:

Try this:

SELECT token, GROUP_CONCAT(source), SUM(count)
FROM TokenFrequency
GROUP BY token;

This should run a lot faster and also eliminate the duplicates. But the sources will be returned in a comma-separated list, so you'll have to explode that in your application.
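
A rough sketch of the "explode it in your application" step, again using Python's sqlite3 module as a stand-in for MySQL (SQLite also has GROUP_CONCAT with a comma as the default separator). The sample rows and the `totals` dictionary are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TokenFrequency (token TEXT, source INTEGER, count INTEGER)")
conn.executemany(
    "INSERT INTO TokenFrequency VALUES (?, ?, ?)",
    [("cat", 1, 3), ("cat", 2, 5), ("cat", 3, 1), ("dog", 1, 2)],  # made-up rows
)

rows = conn.execute(
    "SELECT token, GROUP_CONCAT(source), SUM(count) "
    "FROM TokenFrequency GROUP BY token ORDER BY token"
).fetchall()

# Split the comma-separated source list back into integers on the client side.
# The concatenation order is not guaranteed, so sort before using it.
totals = {}
for token, sources_csv, total in rows:
    sources = sorted(int(s) for s in sources_csv.split(","))
    totals[token] = (sources, total)

print(totals)
```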

You might also try creating a compound index over the columns token, source, count (in that order) and analyze with EXPLAIN to see if MySQL is smart enough to use it as a covering index for this query.

Update: I seem to have misunderstood your question. You don't want the sum of counts per token; you want the sum of counts for every pair of sources for a given token.

I believe the inner join is the best solution for this. An important guideline for SQL is that if you need to calculate an expression with respect to two different rows, then you need to do a join.

However, one optimization technique that I mentioned above is to use a covering index so that all the columns you need are included in an index data structure. The benefit is that all your lookups are O(log n), and the query doesn't need to do a second I/O to read the physical row to get other columns.

In this case, you should create the covering index over columns token, source, count as I mentioned above. Also try to allocate enough cache space so that the index can be cached in memory.
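
As a sketch of how to check this, the snippet below builds the compound index and inspects the query plan. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in, so the plan text is SQLite's, not MySQL's; in MySQL you would run EXPLAIN on the query and look for "Using index" in the Extra column. The index name and the shape of the self-join are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TokenFrequency (token TEXT, source INTEGER, count INTEGER)")

# Compound index over token, source, count (in that order); the name is made up.
conn.execute(
    "CREATE INDEX idx_token_source_count ON TokenFrequency (token, source, count)"
)

# Ask SQLite how it would run a hypothetical pairwise self-join.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT F.token, F.source, S.source, F.count + S.count
    FROM TokenFrequency F
    INNER JOIN TokenFrequency S ON F.token = S.token
    WHERE F.source < S.source
""").fetchall()

# The fourth column of each plan row is the human-readable detail string.
details = [row[3] for row in plan]
uses_covering_index = any("COVERING INDEX" in d for d in details)

for d in details:
    print(d)
print("covering index used:", uses_covering_index)
```

Whether the covering index actually beats a plain index on token still depends on your data, so measure both on the real table.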

Bill Karwin 2009-08-07 21:06:33

+1 for the right approach; but such an index would be almost as big as the whole record, do you think it would be faster than just indexing on token?

Javier 2009-08-07 21:36:21

Depends on the number of rows and other system-specific factors. The only way to be sure is to try it with *your* database and measure the performance.

Bill Karwin 2009-08-07 21:38:11

cruzja 2009-08-10 13:46:33

Apologies for misunderstanding your question. See update above.

Bill Karwin 2009-08-10 17:37:31

Thanks for the update and the tips. I will work on using a covering index and update my results.

cruzja 2009-08-10 17:58:11

Answer 3

+1 A:

If token isn't indexed, it certainly should be.

Carl Manaster 2009-08-07 21:17:42
