views:

491

answers:

6

Given the following table:

Table events
id
start_time
end_time

Is there a way to quickly search for a constant?

E.g.

SELECT *
FROM events
WHERE start_time<='2009-02-18 16:27:12' 
AND     end_time>='2009-02-18 16:27:12'

I am using MySQL. Having an index on either field still has to check a range. Moreover an index on both fields will not make a difference (only the first will be used).

I can add fields / indexes to the table (so adding an indexed constructed field containing the info of both fields would be acceptable).

P.S. The need for this came from this question: http://stackoverflow.com/questions/557425/optimize-sql-that-uses-between-clause

+1  A: 

There is no efficient way to do exactly this query in MySQL.

If your ranges do not overlap, though, you can just use start_time <= const along with ORDER BY start_time DESC LIMIT 1 and further checking for end_time >= const.

You'll need to do it in a function, as MySQL for some reason doesn't use INDEX RANGE SCAN for ORDER BY in a subquery if the range condition is taken from a superquery.

CREATE UNIQUE INDEX ux_b_start ON b (start_date);

CREATE FUNCTION `fn_get_last_b`(event_date TIMESTAMP) RETURNS int(11)
BEGIN
  DECLARE id INT;
  SELECT b.id
  INTO id
  FROM b
  FORCE INDEX (ux_b_start)
  WHERE b.start_time <= event_date
  ORDER BY
    b.start_time DESC
  LIMIT 1;
  RETURN id;
END;

SELECT COUNT(*) FROM a;

1000


SELECT COUNT(*) FROM b;

200000

SELECT *
FROM (
  SELECT fn_get_last_b(a.event_time) AS bid,
         a.*
  FROM a
) ao, b FORCE INDEX (PRIMARY)
WHERE b.id = ao.bid
  AND b.end_time >= ao.event_time

1000 rows fetched in 0,0143s (0,1279s)
Quassnoi
Your "start_time <= const along with ORDER BY start_time DESC LIMIT 1" is a very good idea. It gives good performance as the start_date key seems to be used quite efficiently. The rest of your answer should be posted in the other question I posted!
daremon
It is posted there too :)
Quassnoi
A: 

There's not much you can do within the one table. If optimizing these queries 1) is necessary 2) must be done at the SQL level, then you'll need to make a derived table:

Table event_times
id
event_id
mark_time

and add a record to it for every unit of time spanned by each event. Then you just

SELECT *
FROM events
LEFT JOIN event_times ON event_id = events.id
WHERE mark_time = '2009-02-18 16:27:12'

You can make this table a good bit less ridiculous by how you define 'unit of time', i.e. if you limit the resolution of mark_time to minutes or hours rather than seconds.

chaos
A: 

I don't have much experience with MySQL, but on MS SQL Server adding an index on both columns allowed for an index seek and return times on a 1M row table went from 1-2 seconds to millisecond response times.

It seems that you're seeing different results. I wonder if a constraint is making the difference. I have a check constraint to enforce that start_time < end_time.

Tom H.
MS SQL uses an "index combine" in this case. It selects both ranges using two indexes and finds an intersection using a hash join. If you put a constant that has both lots of start_times and lots of end_times satisfying the appropriate condition, it will be the most inefficient case.
Quassnoi
Or if you create a multicolumn index, it will use and index scan to check for both conditions. In this case, the bigger is the constant, the slower will be the query.
Quassnoi
P. S. That wasn't me who downvoted you :)
Quassnoi
If it has start_time > @time then it shouldn't need a full index scan, so the time of the constant (I assume that's what you mean by "bigger") shouldn't matter. As for whoever down-voted a reason might be nice seeing as how nothing I stated was incorrect, it was tested, and I explained it was for MS
Tom H.
Well i sure would like to see people's silly reasons for downvoting my answer too. Especially considering that it too is practically flawless. But clearly this place is full of people who shoot before they think.
Zuu
If there is an index on (start_time, end_time), then the query will do a RANGE SCAN on this index from @time to MIN(start_time), skipping the end_time's that don't fit. If @time is deep in past, the RANGE SCAN will scan only few records, if @time = NOW(), the RANGE SCAN will scan the whole million.
Quassnoi
A: 

You've basically got a query with 2 distinctly separate range conditions. You are using >=, to MySQL this is always a range scan. There is documentation here to optimize range scans.

The bottom line is that MySQL performs an additional check to filter out rows that satisfy the range condition, and then satisfies the rest of the WHERE clause, which in your case is another range condition.

mluebke
+4  A: 

There is one caveat to my solution:

1) The caveat to this solution is that you must be using the MyISAM engine for the events table. If you cannot use MyISAM then this solution wont work because only MyISAM is supported for Spatial Indexes.

So, assuming that the above isn't an issue for you, the following should work and give you good performance:

This solution makes use of MySQL's support for Spatial Data (see documentation here). While spatial data types can be added to a variety of storage engines, only MyISAM is supported for Spatial R-Tree Indexes (see documentation here) which are needed in order to get the performance needed. One other limitation is that spatial data types only work with numerical data so you cannot use this technique with string based range queries.

I wont go into the details of the theory behind how spatial types work and how the spatial index is useful but you should look at Jeremy Cole's explanation here in regards to how to use spatial data types and indexes for GeoIP lookups. Also look at the comments as they raise some useful points and alternative if you need raw performance and can give up some accuracy.

The basic premise is that we can take the start/end and use the two of them to create four distinct points, one for each corner of a rectangle centered around 0,0 on a xy grid, and then do a quick lookup into the spatial index to determine if the particular point in time we care about is within the rectangle or not. As mentioned previously, see Jeremy Cole's explanation for a more thorough overview of how this works.

In your particular case we will need to do the following:

1) Alter the table to be a MyISAM table (note you shouldn't do this unless you are fully aware of the consequences of such a change like the lack of transactions and the table locking behavior that are associated with MyISAM).

alter table events engine = MyISAM;

2) Next we add the new column that will hold the spatial data. We will use the polygon data type as we need to be able to hold a full rectangle.

alter table events add column time_poly polygon NOT NULL;

3) Next we populate the new column with the data (please keep in mind that any processes that update or insert into table events will need to get modified to make sure they are populating the new column also). Since the start and end ranges are times, we will need to convert them to numbers with the unix_timestamp function (see documentation here for how it works).

update events set time_poly := LINESTRINGFROMWKB(LINESTRING(
    POINT(unix_timestamp(start_time), -1),
    POINT(unix_timestamp(end_time), -1),
    POINT(unix_timestamp(end_time), 1),
    POINT(unix_timestamp(start_time), 1),
    POINT(unix_timestamp(start_time), -1)
  ));

4) Next we add the spatial index to the table (as mentioned previously, this will only work for a MyISAM table and will produce the error "ERROR 1464 (HY000): The used table type doesn't support SPATIAL indexes").

alter table events add SPATIAL KEY `IXs_time_poly` (`time_poly`);

5) Next you will need to use the following select in order to make use of the spatial index when querying the data.

SELECT * 
FROM events force index (IXs_time_poly)
WHERE MBRCONTAINS(events.time_poly, POINTFROMWKB(POINT(unix_timestamp('2009-02-18 16:27:12'), 0)));

The force index is there to make 100% sure that MySQL will use the index for the lookup. If everything went well running an explain on the above select should show something similar to the following:

mysql> explain SELECT *
    -> FROM events force index (IXs_time_poly)
    -> on MBRCONTAINS(events.time_poly, POINTFROMWKB(POINT(unix_timestamp('2009-02-18 16:27:12'), 0)));
+----+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key           | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
|  1 | SIMPLE      | B     | range | IXs_time_poly | IXs_time_poly | 32      | NULL |    1 | Using where | 
+----+-------------+-------+-------+---------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)

Please refer to Jeremy Cole's analysis for details about the performance benefits of this method as compared with a between clause.

Let me know if you have any questions.

Thanks,

-Dipin

Dipin
Very interesting solution, the performance numbers in the linked article are impressive.
Chad Birch
Your explanation is a treasure and the links provided very useful. I did try to read about r-tree indexes but got confused and gave up. Thank you.
daremon
A: 

I was going to ask a similar question on optimizing searches for events (items with a start & stop time), and I'm already using a different approach, so I'll throw it out there.

Basically, if you know that your events are never larger than a given duration, you can search for a bounded range that's larger than the max duration, then add restrictions to get rid of the extra stuff that matched. So, to get times that intersect with the search time:

SELECT *
FROM events
WHERE 
   ( start_time BETWEEN ( 'search_start' - INTERVAL 2 DAY ) and 'search_end' )
   AND end_time >= 'search_start'

... you'll want to have an index on start_time.

(Note -- my table has millions of events spread over 4 years, with no record more than 24hrs ... I have no idea how this performs relative to the spatial search approach, as I'm going to have to go try that myself.)

Joe