tags:

views:

344

answers:

4

I am currently trying to optimize a few queries and scripts, and I wonder if there is a way to cut off a certain percentage of a MySQL Result Set?

for example, I have a Query like this:

SELECT TrackingID, OpenTime, FirstPoint
FROM PointsTable

I need the "middle" 74%, ordered by the difference between FirstPointGMT and OpenTimeGMT, which means: I need to cut off the top and bottom 13%.

At the moment, a Script will just get all Rows, calculate and order by duration and then cut them off, but I'd like to do it in an SQL Query if possible to make the script clearer.

I know about the LIMIT Clause, but that does not seem to support relative values like "13 percent".

Can anyone tell if a) if that is actually possible only with SQL and b) how it could be achieved?

MySQL is 5.0.45.

PS: And yes, I know that 13% seems like a very strange number, but that's all part of a bigger process.

+1  A: 

Cutting off a certain percent would mean that the query still have to be evaluated in full, since the amount of results must be known to calculate the percentage.

However, you could save some bandwidth/cpu by not retreiving the unwanted results from the database, but that depends on your database driver.

truppo
For now it's pure MySQL. I am not worried about optimizing CPU time, I just want to move a long, ugly Block of code from a PHP Script into a SQL Script.
Michael Stum
A: 

You won't be able to do it in a single query. It would take at least 3, along these lines:

select sql_calc_found_rows fields from table where conditions limit 0;
select @cnt := found_rows();
select fields from table where conditions limit @cnt*0.13,@cnt-@cnt*0.13

The extra 'sql calc found rows parameter' causes MySQL to determine how many rows would be retrieved if the limit clause was not pleasant, which you then retrieve using the found_rows() function, and use that to re-run the query with proper numerical limits to get your 74%.

+2  A: 

You will have to use two queries for this.

First:

SELECT count(*) as c 
FROM PointsTable;

Then find the following in php:

a = c * 13 / 100
b = c * 74 / 100

Second:

SELECT TrackingID, OpenTime, FirstPoint 
FROM PointsTable
LIMIT a,b;

Remember to replace the a,b by its value.

I don't know of a way to do this in a single query.

Niyaz
+4  A: 

You need to LIMIT the data you are returning by what percentile the returned rows in the resultset are at.

Try this:

SELECT TrackingID, OpenTime, FirstPoint
FROM PointsTable
ORDER BY FirstPointGMT - OpenTimeGMT
LIMIT (13*((SELECT COUNT(*) FROM PointsTable) +1) / 100), 
((SELECT COUNT(*) FROM PointsTable)
 -
 (26*((SELECT COUNT(*) FROM PointsTable) +1) / 100)
)

Explanation of the above

The only tricky thing here is the Limit clause, and its syntax:

> LIMIT start_row,number_of_rows_to_return

Our start_row needs to be the first element after the first record 13/100ths (13% of the way) into the result set. To do that, we use this formula:

rownumber_or_number_of_rows_in_resultset_Npercent_in = 
( Npercent * ( number_of_rows_in_resultset + 1) / 100)

which is (13*((SELECT COUNT(*) FROM PointsTable) +1) / 100).

Our number_of_rows_to_return has to be the total number of rows, minus 26% of the rows (13% from the bottom and 13% from the top), or

(total_number_of_rows - number_of_rows_26percent_of_the_way_in)

Thus the actual calculation is:

( (SELECT COUNT(*) FROM PointsTable)
  - 
  (26*((SELECT COUNT(*) FROM PointsTable) +1) / 100) 
)


EDIT: After looking over Niyaz's response, I realized that the query could be rewritten as:

SELECT TrackingID, OpenTime, FirstPoint
FROM PointsTable
ORDER BY FirstPointGMT - OpenTimeGMT
LIMIT (13*((SELECT COUNT(*) FROM PointsTable) +1) / 100), 
 (74*((SELECT COUNT(*) FROM PointsTable) +1) / 100)

This is cleaner, as it eliminates a third nested query, and should be functionally equivalent. Limit rows, start at the rows 13% from top, and show 74% of the total rows out of the table from that point on.

sheepsimulator