I am trying to build a new table from the rows of an existing table whose values are NOT contained in another table (the query below checks for the opposite case, i.e. values that are contained). Here is my table structure:

mysql> explain t1;
+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| id        | int(11)             | YES  |     | NULL    |       | 
| point     | bigint(20) unsigned | NO   | MUL | 0       |       | 
+-----------+---------------------+------+-----+---------+-------+

mysql> explain whitelist;
+-------------+---------------------+------+-----+---------+----------------+
| Field       | Type                | Null | Key | Default | Extra          |
+-------------+---------------------+------+-----+---------+----------------+
| id          | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment | 
| x           | bigint(20) unsigned | YES  |     | NULL    |                | 
| y           | bigint(20) unsigned | YES  |     | NULL    |                | 
| geonetwork  | linestring          | NO   | MUL | NULL    |                | 
+-------------+---------------------+------+-----+---------+----------------+

My query looks like this:

SELECT point 
  FROM t1 
 WHERE EXISTS(SELECT source 
                FROM whitelist 
               WHERE MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))));

Explain:

+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
| id | select_type        | table              | type  | possible_keys     | key       | key_len | ref  | rows | Extra                    |
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+
|  1 | PRIMARY            | t1                 | index | NULL              | point     | 8       | NULL | 1001 | Using where; Using index | 
|  2 | DEPENDENT SUBQUERY | whitelist          | ALL   | _geonetwork       | NULL      | NULL    | NULL | 3257 | Using where              | 
+----+--------------------+--------------------+-------+-------------------+-----------+---------+------+------+--------------------------+

The query takes 6 seconds to execute for 1000 records in t1, which is unacceptable for me. How can I rewrite this query using joins (or a faster approach, if one exists) when I don't have a column to join on? In the worst case, even a stored procedure is acceptable. My goal is ultimately to create a new table containing entries from t1. Any suggestions?

A: 

This seems like a case where denormalizing t1 might be beneficial. Adding a GeomFrmTxt column that holds the value of GeomFromText(CONCAT('POINT(', t1.point, ' 0)')) could speed up the query you already have.
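A minimal sketch of that idea, assuming the MySQL 5.x function names used in the question (newer versions spell it ST_GeomFromText). The column name GeomFrmTxt comes from the suggestion above; the POINT type and the one-off UPDATE are assumptions about how it would be populated, and whether this actually helps depends on how the optimizer treats the spatial index on whitelist.geonetwork:

```sql
-- Assumed column definition: store the precomputed geometry next to the raw value.
ALTER TABLE t1 ADD COLUMN GeomFrmTxt POINT NULL;

-- Populate it once, instead of rebuilding the geometry on every query.
UPDATE t1
   SET GeomFrmTxt = GeomFromText(CONCAT('POINT(', point, ' 0)'));

-- Same check as the original query, but against the stored geometry.
SELECT point
  FROM t1
 WHERE EXISTS (SELECT 1
                 FROM whitelist
                WHERE MBRContains(whitelist.geonetwork, t1.GeomFrmTxt));
```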

Snekse
@Snekse: My problem is that this approach needs two passes: one to add the column to t1, and another to filter the table along with its extra data. My actual table contains a billion records, so adding more data to it will slow things down terribly. I want to fully utilize the indexes and do this in one pass.
Legend
I guess what I meant by my comment was to populate that column when each row is created, not at query time. Overall you'll save processing time on your servers because you only have to calculate GeomFromText once, not every time you run this query. Maybe you could store the column I suggested and get rid of the point column :-) Probably not an option, I'm sure.
Snekse
A: 

Unless the query optimizer is failing, a WHERE EXISTS construct should result in the same plan as a join with a GROUP BY clause. Look at optimizing MBRContains(geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)'))); that's probably where your query is spending all its time. I don't have a suggestion for that, but here's your query written with a JOIN:

Select t1.point
from t1
join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
group by t1.point
;

or to get the points in t1 not in whitelist:

Select t1.point
from t1
left join whitelist on MBRContains(whitelist.geonetwork, GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
where whitelist.id is null
;
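
Since the stated goal is a new table of the t1 entries that are not covered by the whitelist, the anti-join can be wrapped in CREATE TABLE ... SELECT. This is only a sketch: the table name t1_not_whitelisted is hypothetical, and it does nothing to make the MBRContains test itself cheaper:

```sql
-- Hypothetical target table name; holds the points that no whitelist row contains.
CREATE TABLE t1_not_whitelisted AS
SELECT t1.point
  FROM t1
  LEFT JOIN whitelist
    ON MBRContains(whitelist.geonetwork,
                   GeomFromText(CONCAT('POINT(', t1.point, ' 0)')))
 WHERE whitelist.id IS NULL;
```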

Simon
@Simon: It's giving me about the same execution time, except that the Extra column in the EXPLAIN output now shows: `Range checked for each record (index mp: 0x2); Not exists`
Legend