ansaurus

Question

PostgreSQL: Selecting N result in group by query

Answer 1

A:

Assuming your ssims table has a unique id column which I have called id in my example you could do the following:

select avg(c3), count(c3), std from ssims where id in 
   (select id from ssims where obraz = 'lena' LIMIT 20)
   group by std order by std;

Tendayi Mawushe 2010-01-02 12:19:53

Yes, it has such column, but this doesn't work. There is an error because std must be also selected in the subquery while gouping. Error message:ERROR: column "ssims.id" must appear in the GROUP BY clause or be used in an aggregate function

Gacek 2010-01-02 13:08:52

You can remove the group by statement from the inner select and apply to the outer select as I have now done. Depending on the results you need you could also just add id to the group by statement in the inner select.

Tendayi Mawushe 2010-01-02 13:25:26

nope, it will select only 20 elements, not 20 elements for each std value

Gacek 2010-01-02 13:31:29

Ah yes I misunderstood your question.

Tendayi Mawushe 2010-01-03 16:56:17

Answer 2

A:

If you're on 8.4, you should be able to do that with a window function. (not sure what the std part is, but I'm sure you can add that back) Something like this (untested, so you may need to adjust some things):

SELECT std,avg(c3), count(c3)
FROM (
 SELECT std, c3, row_number() OVER (
  PARTITION BY std ORDER BY random())
 ) foo
WHERE row_number <= 20
GROUP BY std
ORDER BY std

If you don't care that you actually get a random subset, you can remove the ORDER BY random() part, and it'll give you a "almost-random" one.

Magnus Hagander 2010-01-02 15:50:56

It still returns an error (no std column), even in 8.4

Gacek 2010-01-03 15:43:18

Uh, I see now that the query is missing a "FROM ssims". It needs to have that before the ") foo" part. That's what I meant with "you may need to adjust some things"

Magnus Hagander 2010-01-03 21:39:37

Answer 3

+5 A:

In PostgreSQL 8.3:

SELECT  a[1] AS avg_std, a[2] AS cnt_std, std
FROM    (
        SELECT  (
                SELECT  ARRAY[AVG(c3) , COUNT(*)]
                FROM    (
                        SELECT  c3
                        FROM    ssims si
                        WHERE   obraz = 'lena'
                                AND si.std = so.std
                        ORDER BY
                                id
                        LIMIT 20
                        ) q
                ) a
        FROM    (
                SELECT  DISTINCT std
                FROM    ssims
                WHERE   obraz = 'lena'
                ) so
        ) q

This will count both AVG and COUNT in a single index scan for each std.

Create a composite index on (obraz, std, id) for this to work fast.

In PostgreSQL 8.4:

SELECT  AVG(c3), COUNT(*), std
FROM    (
        SELECT  std, c3, ROW_NUMBER() OVER (PARTITION BY std ORDER BY id) AS rn
        FROM    ssims
        WHERE   obraz = 'lena'
        ) q
WHERE   rn <= 20
GROUP BY
        std

Quassnoi 2010-01-02 15:59:54

The 8.4 version Works pretty as a charm! Thank you!

Gacek 2010-01-03 15:42:28

ansaurus

tags:

views:

answers:

PostgreSQL: Selecting N result in group by query

related questions