Suppose I have a column of heights -- how can I select all and only those height values that are neither in the top 30% of values nor the bottom 30% of values?

UPDATE:

I'd like the answer for PostgreSQL (or, failing that, MySQL -- I'm using Rails).

A: 

For SQL Server 2005+:

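SELECT
    *
FROM
    MyTable M
EXCEPT
SELECT
    *
FROM
    (
    -- ORDER BY is not allowed directly on a UNION branch,
    -- so each TOP ... ORDER BY sits in its own derived table
    SELECT * FROM
        (SELECT TOP 30 PERCENT
            *
        FROM
            MyTable M
        ORDER BY
            Height) bottom30
    UNION ALL
    SELECT * FROM
        (SELECT TOP 30 PERCENT
            *
        FROM
            MyTable M
        ORDER BY
            Height DESC) top30
    ) foo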
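To check the EXCEPT idea without a SQL Server instance, here is a minimal sketch that runs the same shape of query in SQLite through Python's sqlite3 module. SQLite has no TOP ... PERCENT, so the 30% row count is computed first and passed to LIMIT. Rounding it up is my assumption about how fractional percentages should be handled. The table MyTable, its id column and the sample heights are hypothetical.

```python
import sqlite3


def middle_except(heights):
    """All rows EXCEPT (bottom 30% UNION ALL top 30%) by Height, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Height REAL)")
    conn.executemany("INSERT INTO MyTable (Height) VALUES (?)",
                     [(h,) for h in heights])
    n = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
    k = (n * 3 + 9) // 10  # ceil(30% of n) using integer arithmetic only
    rows = conn.execute(
        """
        SELECT id, Height FROM MyTable
        EXCEPT
        SELECT * FROM (
            -- each branch is a derived table so it can carry ORDER BY/LIMIT
            SELECT * FROM (SELECT id, Height FROM MyTable
                           ORDER BY Height LIMIT ?)
            UNION ALL
            SELECT * FROM (SELECT id, Height FROM MyTable
                           ORDER BY Height DESC LIMIT ?)
        )
        """,
        (k, k),
    ).fetchall()
    conn.close()
    # EXCEPT does not promise an order, so sort the surviving heights here
    return sorted(height for _id, height in rows)


print(middle_except(range(1, 11)))  # [4.0, 5.0, 6.0, 7.0]
```

Including id in the compared columns keeps rows with equal heights distinct. Note that EXCEPT also collapses fully duplicate rows.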
gbn
+1  A: 

For SQL Server 2005+, you should use the NTILE() function for this.

SELECT *
FROM   (
         -- NTILE(10) splits the rows into deciles; deciles 4-7 are the middle 40%
         SELECT ntile(10) over(order by AddressId) as Decile, *
         FROM   (
                SELECT top 100 *
                FROM   Person.Address
           ) t
       ) t
where Decile between 4 and 7
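Here is a runnable sketch of the NTILE approach applied to heights rather than AddressId, using SQLite (3.25 or later, which added window functions) as a stand-in for SQL Server. It uses ten buckets and keeps deciles 4 to 7, because NTILE(3) would return only the middle third rather than everything outside the top and bottom 30%. MyTable and the sample data are hypothetical.

```python
import sqlite3


def middle_deciles(heights):
    """Rows whose NTILE(10) decile (by Height) is 4-7, i.e. the middle 40%."""
    conn = sqlite3.connect(":memory:")  # NTILE needs SQLite >= 3.25
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Height REAL)")
    conn.executemany("INSERT INTO MyTable (Height) VALUES (?)",
                     [(h,) for h in heights])
    rows = conn.execute(
        """
        SELECT Height
        FROM (SELECT Height,
                     NTILE(10) OVER (ORDER BY Height) AS decile
              FROM MyTable) AS t
        WHERE decile BETWEEN 4 AND 7  -- drop deciles 1-3 and 8-10
        ORDER BY Height
        """
    ).fetchall()
    conn.close()
    return [height for (height,) in rows]


print(middle_deciles(range(1, 21)))  # 7.0 through 14.0
```

When the row count is not a multiple of 10, NTILE makes the first buckets one row larger, so the split is only approximately 30/40/30.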
Mladen Prajdic
Is the "select top 100 *" subquery a leftover from testing?
Andomar
Yes. The TOP 100 is just there to demonstrate which percentile rows are returned.
Mladen Prajdic
+2  A: 
WITH cte AS (
 SELECT *, NTILE(100) OVER (ORDER BY Height) AS bucket
 FROM MyTable)
-- buckets 1-30 are the bottom 30%, 71-100 the top 30%
SELECT * FROM cte WHERE bucket BETWEEN 31 AND 70
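The lower bound of the BETWEEN matters because bucket 30 is still part of the bottom 30%. This small sketch, in SQLite via Python's sqlite3 (window functions need SQLite 3.25+), puts 100 distinct heights into a hypothetical MyTable, so every NTILE(100) bucket holds exactly one row, and counts what each range returns.

```python
import sqlite3


def rows_in_buckets(lo, hi, n_rows=100):
    """Count rows whose NTILE(100) bucket (by Height) is between lo and hi."""
    conn = sqlite3.connect(":memory:")  # NTILE needs SQLite >= 3.25
    conn.execute("CREATE TABLE MyTable (Height REAL)")
    conn.executemany("INSERT INTO MyTable VALUES (?)",
                     [(float(h),) for h in range(1, n_rows + 1)])
    (total,) = conn.execute(
        """
        WITH cte AS (
            SELECT Height, NTILE(100) OVER (ORDER BY Height) AS bucket
            FROM MyTable)
        SELECT COUNT(*) FROM cte WHERE bucket BETWEEN ? AND ?
        """,
        (lo, hi),
    ).fetchone()
    conn.close()
    return total


print(rows_in_buckets(30, 70))  # 41: bucket 30 belongs to the bottom 30%
print(rows_in_buckets(31, 70))  # 40: exactly the middle 40%
```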
Remus Rusanu
Is this SQL Server specific?
Andomar
Oracle also has the NTILE function; no clue about Postgres or MySQL. WITH is SQL Server only, but you don't really need it anyway.
Mladen Prajdic
WITH is a standard construct; Oracle supports it too, and PostgreSQL 8.4 will (along with window functions).
araqnid
I seem to recall someone saying SQL Server only supports window functions where you specify PARTITION BY?
araqnid
You can specify PARTITION BY, but it is optional.
Remus Rusanu
A: 

You're asking for PostgreSQL, and that doesn't support NTILE or TOP X PERCENT.

Without either of those, I can think of a query like this to retrieve the middle rows:

select *
from MyTable
where Height not in (
    -- each branch needs parentheses to carry its own ORDER BY/LIMIT
    (select Height from MyTable order by Height desc
     limit ((select count(*) from MyTable)*0.3))
    union
    (select Height from MyTable order by Height
     limit ((select count(*) from MyTable)*0.3))
)
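One way to sidestep the question of whether LIMIT accepts a subquery is to run the count first (for example from the Rails side) and bind the result as an ordinary parameter. This sketch does that in SQLite through Python's sqlite3 module. MyTable, the sample data and the choice to round 30% down are assumptions. It also shows a property of the NOT IN form: it compares values, so every row tied with a boundary height is dropped.

```python
import sqlite3


def middle_not_in(heights):
    """Two-step version: count first, then pass the 30% row count to LIMIT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Height REAL)")
    conn.executemany("INSERT INTO MyTable (Height) VALUES (?)",
                     [(h,) for h in heights])
    n = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
    k = n * 3 // 10  # floor(30% of n); a plain value by the time LIMIT sees it
    rows = conn.execute(
        """
        SELECT Height FROM MyTable
        WHERE Height NOT IN (
            SELECT Height FROM (SELECT Height FROM MyTable
                                ORDER BY Height LIMIT ?)
            UNION
            SELECT Height FROM (SELECT Height FROM MyTable
                                ORDER BY Height DESC LIMIT ?)
        )
        ORDER BY Height
        """,
        (k, k),
    ).fetchall()
    conn.close()
    return [height for (height,) in rows]


print(middle_not_in(range(1, 11)))                     # [4.0, 5.0, 6.0, 7.0]
print(middle_not_in([1, 2, 3, 3, 3, 4, 5, 6, 7, 8]))  # [4.0, 5.0]
```

In the second call only one 3 falls in the bottom three rows, but NOT IN removes all three of them. If Height can be NULL, NOT IN against a list containing NULL matches nothing, so filter NULLs out of the subqueries.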

I'm not sure whether PostgreSQL supports a LIMIT calculated by a subquery, and I don't have a PostgreSQL database nearby to try it.

Andomar
+1  A: 

Hi,

Postgres only accepts constants in the LIMIT clause, so the solution above does not work.

Your SELECT would look something like this:

SELECT *
  FROM (SELECT T.HEIGHT,
               -- this tells us the "ranking" of each row
               -- by counting all the heights that are smaller than
               -- the height in that row
               -- (HEIGHT_RANK rather than RANK, which MySQL 8 reserves)
               (SELECT COUNT(*) + 1
                  FROM <table> T1
                 WHERE T1.HEIGHT < T.HEIGHT
               ) AS HEIGHT_RANK,
               -- this tells us the count of rows in the table
               (SELECT COUNT(*)
                  FROM <table> T1
               ) AS REC_COUNT
          FROM <table> T
       ) T
 -- now just list the rows whose ranking is above the bottom 30%
 -- and not in the top 30%
 WHERE T.HEIGHT_RANK > (T.REC_COUNT*0.30)
   AND T.HEIGHT_RANK <= (T.REC_COUNT*0.70)
 -- ORDER BY goes on the outer query; SQL Server rejects it in a derived table
 ORDER BY T.HEIGHT
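The count-based ranking can be tried in SQLite through Python's sqlite3 module. In this sketch a row survives when its rank is strictly above 30% of the row count and at most 70% of it. A plain BETWEEN would also keep the last row of the bottom 30% whenever 30% of the count is a whole number. The table MyTable and the aliases height_rank and rec_count are hypothetical. Because SQLite computes `rec_count * 0.30` in floating point, the comparison is rewritten in integers.

```python
import sqlite3


def middle_by_count_rank(heights):
    """Rank = 1 + number of smaller heights; keep 30% < rank/n <= 70%."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Height REAL)")
    conn.executemany("INSERT INTO MyTable (Height) VALUES (?)",
                     [(h,) for h in heights])
    rows = conn.execute(
        """
        SELECT X.Height
        FROM (SELECT T.Height,
                     (SELECT COUNT(*) + 1 FROM MyTable T1
                       WHERE T1.Height < T.Height) AS height_rank,
                     (SELECT COUNT(*) FROM MyTable) AS rec_count
              FROM MyTable T) AS X
        -- integer form of: rank > 0.30 * n AND rank <= 0.70 * n
        WHERE X.height_rank * 10 > X.rec_count * 3
          AND X.height_rank * 10 <= X.rec_count * 7
        ORDER BY X.Height
        """
    ).fetchall()
    conn.close()
    return [height for (height,) in rows]


print(middle_by_count_rank(range(1, 11)))  # [4.0, 5.0, 6.0, 7.0]
```

The correlated subquery runs once per row, so this is quadratic in table size. It is fine for modest tables, but window functions scale better where they are available.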

This should work in any database that accepts subselects (subqueries).

This does not handle ties (equal heights), but that could be done by also comparing the primary key:

SELECT COUNT(*) + 1
  FROM <table> T1 
 WHERE (T1.HEIGHT < T.HEIGHT)
    OR (T1.HEIGHT = T.HEIGHT and T1.PK_FIELD < T.PK_FIELD)
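To see what the primary-key tie-break changes, here is a sketch in SQLite via Python's sqlite3, with the hypothetical id column standing in for PK_FIELD. Only the fixed tie-break fragment is interpolated into the SQL, never user input. Without it, tied heights share one rank and can drop out together. With it, every row gets a distinct rank, so the middle 40% has exactly 40% of the rows.

```python
import sqlite3


def middle_by_count_rank(heights, break_ties):
    """Count-based rank; optionally break ties on the primary key (id)."""
    # fixed SQL fragment, not user input, so interpolating it is safe here
    tie_break = "OR (T1.Height = T.Height AND T1.id < T.id)" if break_ties else ""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, Height REAL)")
    conn.executemany("INSERT INTO MyTable (Height) VALUES (?)",
                     [(h,) for h in heights])
    rows = conn.execute(
        f"""
        SELECT X.Height
        FROM (SELECT T.Height,
                     (SELECT COUNT(*) + 1 FROM MyTable T1
                       WHERE T1.Height < T.Height {tie_break}) AS height_rank,
                     (SELECT COUNT(*) FROM MyTable) AS rec_count
              FROM MyTable T) AS X
        WHERE X.height_rank * 10 > X.rec_count * 3
          AND X.height_rank * 10 <= X.rec_count * 7
        ORDER BY X.Height
        """
    ).fetchall()
    conn.close()
    return [height for (height,) in rows]


ties = [1, 2, 3, 3, 3, 4, 5, 6, 7, 8]
print(middle_by_count_rank(ties, break_ties=False))  # [4.0, 5.0]
print(middle_by_count_rank(ties, break_ties=True))   # [3.0, 3.0, 4.0, 5.0]
```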

Regards.

Christian Almeida