Let's say I have a very large MySQL table with a timestamp field. I want to filter out some of the rows so that the result set is small enough to print.

Assume the timestamps increase with row order and are about one minute apart on average. (They are not necessarily exactly one minute apart, e.g. 2010-06-07 03:55:14, 2010-06-07 03:56:23, 2010-06-07 03:57:01, 2010-06-07 03:57:51, 2010-06-07 03:59:21 ...)

As I mentioned, I want to filter out some of the records. I do not have a specific rule for doing that, but I was thinking of filtering the rows by timestamp interval. After filtering, I want a result set whose timestamps are a certain number of minutes apart on average (e.g. 2010-06-07 03:20:14, 2010-06-07 03:29:23, 2010-06-07 03:38:01, 2010-06-07 03:49:51, 2010-06-07 03:59:21 ...)

Last but not least, the operation should not take an excessive amount of time; I need it to be almost as fast as a normal SELECT.

Do you have any suggestions?

+1  A: 

I wasn't able to come up with a query that would do this off the top of my head, but here's what I was thinking:

  1. If you have a lot of entries within a single minute, figure out a way to collapse the results so that there is at most one entry for any given minute (DISTINCT, DATE_FORMAT maybe?).

  2. Limit the number of results by using modulus on the minute value, something like this (if you'd like an entry from every 10 minutes):

WHERE MOD(MINUTE(tstamp_column), 10) = 0
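
The two steps above can be modeled outside the database to see which rows would survive. What follows is a minimal Python sketch, not MySQL code; the function names and the sample timestamps are illustrative assumptions. `thin_by_minute_modulus` mirrors the MOD(MINUTE(...)) test, while `thin_by_bucket` is a different technique (first row per fixed-width time bucket), shown only for comparison because the modulus test returns nothing for an interval in which no row happens to fall on a matching minute.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

def thin_by_minute_modulus(timestamps, every=10):
    """Keep at most one timestamp per minute, and only minutes divisible by `every`."""
    seen_minutes = set()
    kept = []
    for ts in timestamps:
        # Step 1: collapse to one entry per calendar minute.
        minute_key = ts.replace(second=0, microsecond=0)
        if minute_key in seen_minutes:
            continue
        seen_minutes.add(minute_key)
        # Step 2: same test as MOD(MINUTE(ts), every) = 0.
        if ts.minute % every == 0:
            kept.append(ts)
    return kept

def thin_by_bucket(timestamps, every=10):
    """Keep the first timestamp in each `every`-minute bucket (input assumed sorted)."""
    kept = []
    last_bucket = None
    for ts in timestamps:
        bucket = int((ts - EPOCH).total_seconds()) // (every * 60)
        if bucket != last_bucket:
            kept.append(ts)
            last_bucket = bucket
    return kept

# Hypothetical sample data, roughly one row per minute.
rows = [
    datetime(2010, 6, 7, 3, 55, 14),
    datetime(2010, 6, 7, 3, 56, 23),
    datetime(2010, 6, 7, 3, 57, 1),
    datetime(2010, 6, 7, 3, 57, 51),
    datetime(2010, 6, 7, 3, 59, 21),
    datetime(2010, 6, 7, 4, 0, 5),
    datetime(2010, 6, 7, 4, 0, 40),
    datetime(2010, 6, 7, 4, 1, 10),
    datetime(2010, 6, 7, 4, 10, 2),
]

by_modulus = thin_by_minute_modulus(rows, every=10)
by_bucket = thin_by_bucket(rows, every=10)
```

With this sample, the modulus version keeps only the rows at 04:00 and 04:10, whereas the bucket version also keeps one row from the 03:50 to 03:59 window.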

Lauri Lehtinen
I guess those are the only possible approaches.
celalo
+1  A: 

If your goal is to filter records, presumably what you really want is a small percentage of the records, but not just the first 10 or 100. In that case, why not select them randomly? The MySQL RAND() function returns a floating-point number n such that 0 <= n < 1.0. Convert your desired percentage to a fraction and use it like this:

SELECT * FROM your_table
WHERE RAND() < 0.001

If you want repeatable results (for testing), you can pass a seed argument, e.g. RAND(42), so that the function returns the same sequence of numbers on every run.
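
The sampling idea and the effect of a seed can be illustrated in Python. This is only a sketch of the logic under stated assumptions: `sample_rows` is a hypothetical helper, the rows are stand-in integers, and Python's generator is not the one MySQL uses, so the selected rows will differ from what RAND() picks.

```python
import random

def sample_rows(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Mirrors the shape of `WHERE RAND() < fraction`; passing a fixed
    `seed` makes the selection repeatable across runs.
    """
    rng = random.Random(seed)
    # rng.random() returns a float n with 0.0 <= n < 1.0, like RAND().
    return [row for row in rows if rng.random() < fraction]

# Hypothetical table of 100,000 rows, represented by their row numbers.
rows = list(range(100_000))

sample_a = sample_rows(rows, 0.001, seed=42)
sample_b = sample_rows(rows, 0.001, seed=42)
```

Because both calls use the same seed, `sample_a` and `sample_b` contain the same rows. The sample size is only approximately `fraction` times the row count, since each row is accepted or rejected independently.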

Craig Trader