tags:

views:

2323

answers:

3

The problem I'm trying to solve is that I have a table like this:

a and b refer to point on a different table. distance is the distance between the points.

| id | a_id | b_id | distance | delete |
| 1  |  1   |   1  |  1       |   0    |
| 2  |  1   |   2  |  0.2345  |   0    |
| 3  |  1   |   3  |  100     |   0    |
| 4  |  2   |   1  |  1343.2  |   0    |
| 5  |  2   |   2  |  0.45    |   0    |
| 6  |  2   |   3  |  110     |   0    |
....

The important column I'm looking is a_id. If I wanted to keep the closet b for each a, I could do something like this:

update mytable set delete = 1 from (select a_id, min(distance) as dist from table group by a_id) as x where a_gid = a_gid and distance > dist;
delete from mytable where delete = 1;

Which would give me a result table like this:

| id | a_id | b_id | distance | delete |
| 1  |  1   |   1  |  1       |   0    |
| 5  |  2   |   2  |  0.45    |   0    |
....

i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.

However I want to keep the 10 closest points for each a_gid. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.

min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.

I'm using PostgeSQL.

A: 

Does PostgreSQL have the analytic function rank()? If so try:

select a_id, b_id, distance
from
( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
  from mytable
) where rnk <= 10;
Tony Andrews
Postgres doesn't have this functon, but you're on the right track and your suggestion is helping me find the answer.
Rory
+2  A: 

I love postgres, so it took it as a challenge the second I saw this question.

So, for the table:

    Table "pg_temp_29.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 value  | integer |

With the values:

 SELECT value FROM foo ORDER BY value;
 value 
-------
     0
     1
     2
     3
     4
     5
     6
     7
     8
     9
    14
    20
    32
(13 rows)

You can do a:

SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X

Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.

This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:

SELECT a_id, distance FROM mytable
 WHERE id IN
  (SELECT id FROM mytable WHERE t1.a_id = t2.a_id
   ORDER BY distance LIMIT 10);
 ORDER BY a_id, distance;

 a_id | distance 
------+----------
    1 |   0.2345
    1 |        1
    1 |      100
    2 |     0.45
    2 |      110
    2 |   1342.2
Elijah
Dang, I should have given you an example from the dataset you provided.
Elijah
That won't work, because I want the lowest value for each value of a_id.
Rory
If you remove the DESC (for descending) you will get a list of the lowest values: SELECT value FROM foo ORDER BY value LIMIT 1 OFFSET 1;Anyways, my hope was to demonstrate the principal with a simpler dataset.
Elijah
+2  A: 

Try this:

SELECT  *
FROM    (
    SELECT a_id, (
     SELECT b_id
     FROM mytable mib
     WHERE mib.a_id = ma.a_id
     ORDER BY
      dist DESC
     LIMIT 1 OFFSET s
     ) AS b_id
    FROM (
     SELECT DISTINCT a_id
     FROM mytable mia
     ) ma, generate_series (1, 10) s
    ) ab
WHERE   b_id IS NOT NULL

Checked on PostgreSQL 8.3

Quassnoi