views:

71

answers:

3

Hi, I have following SQL query that selects some results from my table:

select  avg(c3), count(c3), std  
from ssims where obraz = 'lena' group by std order by std

But I have various number of tests performed for different values of std, so it returns me something like that:

0.906176136363636;44;5
0.881669302325582;43;10
0.855873409090909;44;15
0.829195813953488;43;20
0.802071590909091;44;25
0.774523720930233;43;30
0.747213636363636;44;35
0.720115581395349;43;40
0.694712954545455;44;45
0.668683255813953;43;50

What I would like to do is to select average of constant (i.e. 20) number of results for every std value. So after such query, the second column would be 20 for each row.

How to do it? I tried limits and top, but without success

A: 

Assuming your ssims table has a unique id column which I have called id in my example you could do the following:

select avg(c3), count(c3), std from ssims where id in 
   (select id from ssims where obraz = 'lena' LIMIT 20)
   group by std order by std;
Tendayi Mawushe
Yes, it has such column, but this doesn't work. There is an error because std must be also selected in the subquery while gouping. Error message:ERROR: column "ssims.id" must appear in the GROUP BY clause or be used in an aggregate function
Gacek
You can remove the group by statement from the inner select and apply to the outer select as I have now done. Depending on the results you need you could also just add id to the group by statement in the inner select.
Tendayi Mawushe
nope, it will select only 20 elements, not 20 elements for each std value
Gacek
Ah yes I misunderstood your question.
Tendayi Mawushe
A: 

If you're on 8.4, you should be able to do that with a window function. (not sure what the std part is, but I'm sure you can add that back) Something like this (untested, so you may need to adjust some things):

SELECT std,avg(c3), count(c3)
FROM (
 SELECT std, c3, row_number() OVER (
  PARTITION BY std ORDER BY random())
 ) foo
WHERE row_number <= 20
GROUP BY std
ORDER BY std

If you don't care that you actually get a random subset, you can remove the ORDER BY random() part, and it'll give you a "almost-random" one.

Magnus Hagander
It still returns an error (no std column), even in 8.4
Gacek
Uh, I see now that the query is missing a "FROM ssims". It needs to have that before the ") foo" part. That's what I meant with "you may need to adjust some things"
Magnus Hagander
+5  A: 

In PostgreSQL 8.3:

SELECT  a[1] AS avg_std, a[2] AS cnt_std, std
FROM    (
        SELECT  (
                SELECT  ARRAY[AVG(c3) , COUNT(*)]
                FROM    (
                        SELECT  c3
                        FROM    ssims si
                        WHERE   obraz = 'lena'
                                AND si.std = so.std
                        ORDER BY
                                id
                        LIMIT 20
                        ) q
                ) a
        FROM    (
                SELECT  DISTINCT std
                FROM    ssims
                WHERE   obraz = 'lena'
                ) so
        ) q

This will count both AVG and COUNT in a single index scan for each std.

Create a composite index on (obraz, std, id) for this to work fast.

In PostgreSQL 8.4:

SELECT  AVG(c3), COUNT(*), std
FROM    (
        SELECT  std, c3, ROW_NUMBER() OVER (PARTITION BY std ORDER BY id) AS rn
        FROM    ssims
        WHERE   obraz = 'lena'
        ) q
WHERE   rn <= 20
GROUP BY
        std
Quassnoi
The 8.4 version Works pretty as a charm! Thank you!
Gacek