ansaurus

Question

How to get the top N values of each column in MySql

Answer 1

A:

SELECT MAX(column1), MAX(column2), MAX(column3) FROM table LIMIT n

where n = number of times

however, you can only use LIMIT for the whole query, so you will need to pull second column separately.

dusoft 2009-12-16 11:27:46

Answer 2

A:

There might be a nicer way, but you could just nest SELECT statements.

Something like:

SELECT (SELECT column1 FROM table ORDER BY column1 LIMIT 5) a,
       (SELECT column2 FROM table ORDER BY column2 LIMIT 5) b,
       (SELECT column3 FROM table ORDER BY column3 LIMIT 5) c

add "FROM dual" if you are using Oracle.

verhogen 2009-12-16 11:32:52

This is a solution, but is there a shorter way of writing it to avoid having 50+ nested select queries, is this going to be a heavy load to calculate?

Peter 2009-12-16 11:37:19

-1 Subqueries can't return more than one row

Andomar 2009-12-16 11:38:40

@Andomar yes they can. I just tested this. however you need to alias them, which I missed in this post. fixed now.

verhogen 2009-12-16 20:35:56

@Peter i can't think of a better way given your spec. also, yes i think this would be pretty inefficient.

verhogen 2009-12-16 20:43:26

@vigilant: I don't know what brand of RDBMS you tested this in, but in MySQL (the brand used by the OP) it returns `ERROR 1242 (21000): Subquery returns more than 1 row`

Bill Karwin 2009-12-17 16:56:59

Answer 3

A:

Assuming the greatest five values in each column could occur on distinct rows, you have to do this in 50 queries.

Explanation: the expressions in the select-list of an SQL query must refer to the same row. So you could get the top five values of column1 this way:

SELECT column1, column2, column3 FROM table ORDER BY column1 DESC LIMIT 5;

But of course the values of column2 and column3 wouldn't necessarily be the greatest values in those columns, they'd be whatever values happen to be on the same row with the greatest values in column1.

The only way to get values from different rows of the table into a single select-list is to do a self-join:

SET @i1 = 0, @i2 = 0, @i3 = 0, @i4 = 0, @i5 = 0;
SELECT * 
FROM (SELECT @i1:=@i1+1 AS i, column1 FROM table ORDER BY column1 DESC LIMIT 5) t1 ON (t1.i = t5.i)
JOIN (SELECT @i2:=@i2+1 AS i, column2 FROM table ORDER BY column2 DESC LIMIT 5) t2 ON (t1.i = t5.i)
JOIN (SELECT @i3:=@i3+1 AS i, column3 FROM table ORDER BY column3 DESC LIMIT 5) t3 ON (t1.i = t5.i)
JOIN (SELECT @i4:=@i4+1 AS i, column4 FROM table ORDER BY column4 DESC LIMIT 5) t4 ON (t1.i = t5.i)
JOIN (SELECT @i5:=@i5+1 AS i, column5 FROM table ORDER BY column5 DESC LIMIT 5) t5 ON (t1.i = t5.i);

But this isn't practical or efficient for 50 columns.

I also have to wonder about your database design. If you need the top five from 50 columns, could it be that they're all storing the same kind of information? You could be practicing the Multi-Column Attributes antipattern. If so, you should put all 50 columns into a single column in a child table.

Bill Karwin 2009-12-17 17:11:51

Please provide a comment if you downvote.

Bill Karwin 2009-12-18 22:00:19

Answer 4

+1 A:

This is NOT something you want to do in a single query. Just break it out, one query for each column. Under the right circumstances (aka with the right indexes and right column types), MySQL can actually optimize these queries by short circuiting so that it never has to scan the whole table, it just pulls out the top 5 values and done.

SELECT column1 FROM table ORDER BY column1 DESC LIMIT 5
SELECT column2 FROM table ORDER BY column2 DESC LIMIT 5
etc

If you try to smoosh them all together into one giant, kludgy query you'll only manage to convince the optimizer to just give up and rescan the whole table 50 times and then use 50 temp tables and probably some file sorting for good measure. So unless your table has about 10 rows in it (which it obviously doesn't), 50 separate queries will always be faster.

Rob Van Dam 2009-12-23 18:35:40

ansaurus

tags:

views:

answers:

How to get the top N values of each column in MySql

related questions