I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.

Now, I can do this numerically if the table has a primary key (e.g. by taking the rows whose keys are up to 5 less or 5 more than the target row's key).

So, to select the row with primary key 7 and the nearby rows: select primary_key from table where primary_key >= 2 order by primary_key limit 11;

2
3
4
5
6
-=7=-
8
9
10
11
12

But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.

The primary key output of such a select might look more random, and thus be less susceptible to arithmetic lookup (since some rows would be filtered out, e.g. with a where active=1):

10
12
14
15
30
-=34=-
80
83
100
113
125
126
A: 

You could do this using row_number() (available as of PostgreSQL 8.4). I'm not familiar with PostgreSQL, so the syntax may not be exactly right, but hopefully it illustrates the idea:

SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
      FROM table
      WHERE active=1) t
WHERE r BETWEEN 25 AND 35 -- row 30 plus the 5 rows on each side

This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.

sgriffinusa
+3  A: 

There are many ways to do it if you run two queries from a programming language, but here's one way to do it in a single SQL query:

(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC

This would return the 5 rows above, the target row, and 5 rows below.

wuputah
Simple and effective, and it works in so many situations; this is what I have used.
Tchalvak
A: 

If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:

select (
  select count(*) from employees b
  where b.name < a.name
) as idx, name
from employees a
order by name

Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:

with numbered_emps as (
  select (
    select count(*)
    from employees b
    where b.name < a.name
  ) as idx, name
  from employees a
  order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion

What could be simpler!

I'd imagine the row-number based solutions will be faster, though.

Tom Anderson
+1  A: 

Here's another way to do it, with the analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause, but we can't, so you need to use subqueries or CTEs instead. Here's an example that will work with the pagila sample database.

WITH base AS (
    SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag, 
      lead(customer_id, 5) OVER (ORDER BY customer_id) lead, 
      c.*
    FROM customer c
    WHERE c.active = 1
    AND c.last_name LIKE 'B%'
) 
SELECT base.* FROM base 
JOIN (
  -- Select the center row, coalesce so it still works if there aren't 
  -- 5 rows in front or behind
  SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead 
  FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead

The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.
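
One way around that is to look up the center row's number in the same query rather than hard-coding it. The following is a minimal sketch of that idea, not a tested PostgreSQL solution: it runs on Python's built-in sqlite3 module (an assumption, and it needs SQLite 3.25 or later for window functions), and the table `items`, its sample data, and the helper `window_around` are hypothetical.

```python
import sqlite3

# Hypothetical table "items"; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, active INTEGER)")

# Active keys are the sample output from the question; inactive ones are made up.
active_ids = [10, 12, 14, 15, 30, 34, 80, 83, 100, 113, 125, 126]
inactive_ids = [11, 31, 33, 35, 81]
conn.executemany("INSERT INTO items VALUES (?, 1)", [(i,) for i in active_ids])
conn.executemany("INSERT INTO items VALUES (?, 0)", [(i,) for i in inactive_ids])


def window_around(target_id, k=5):
    """Number the filtered rows, then keep those within k positions of the target."""
    sql = """
        WITH numbered AS (
            SELECT ROW_NUMBER() OVER (ORDER BY id) AS r, id
            FROM items
            WHERE active = 1
        )
        SELECT n.id
        FROM numbered AS n
        JOIN numbered AS target ON target.id = ?
        WHERE n.r BETWEEN target.r - ? AND target.r + ?
        ORDER BY n.r
    """
    return [row[0] for row in conn.execute(sql, (target_id, k, k))]


print(window_around(34))
```

Because the join finds the target's own row number, the window follows the target wherever it lands in the filtered set. If the target id is filtered out (for example, inactive), the join matches nothing and the result is empty.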

Scott Bailey