Given a table A of people, their native language, and other columns C3 .. C10 represented by ...

Table A

PERSON   LANGUAGE   ...
bob      english
john     english
vlad     russian
olga     russian
jose     spanish

How do I construct a query which selects all columns of one row for each distinct language?

Desired Result

PERSON   LANGUAGE   ...
bob      english
vlad     russian
jose     spanish

It doesn't matter to me which row of each distinct language makes it into the result. In the result above, I chose the row with the lowest row number for each language.

A: 

I'd use the RANK() function in a subselect and then just pull the row where rank = 1.

select person, language
from
( 
    select person, language, rank() over(order by language) as rank
    from TableA
    group by person, language
)
where rank = 1
Harper Shelby
+1  A: 

My Oracle is a bit rusty, but I think this would work:

SELECT * FROM TableA
WHERE ROWID IN ( SELECT MAX(ROWID) FROM TableA GROUP BY Language )
Eric Petroelje
Can you GROUP BY a column that's not in the SELECT list?
tekBlues
@tekBlues - sure, why not? :)
Eric Petroelje
@Eric Petroelje: unfortunately this won't work in Oracle. The ROWNUM pseudo-column is assigned as rows are selected. So suppose the inline query returns the values (2, 4, 5): the outer query will never return any row, because the first row of a select always has ROWNUM = 1, and that does not satisfy ROWNUM IN (2, 4, 5).
Vincent Malgrat
@Vincent - My mistake, I should have used ROWID rather than ROWNUM. Corrected my answer to reflect that.
Eric Petroelje
A: 

If it doesn't matter which row to select, I'd go with a simple group by:

WITH DATA AS (
   SELECT 'bob' person, 'english' language FROM dual UNION ALL
   SELECT 'john', 'english' FROM dual UNION ALL
   SELECT 'vlad', 'russian' FROM dual UNION ALL
   SELECT 'olga', 'russian' FROM dual UNION ALL
   SELECT 'jose', 'spanish' FROM dual
)
SELECT MIN(person) person, language FROM DATA GROUP BY LANGUAGE;

PERSON LANGUAGE
------ --------
bob    english
jose   spanish
olga   russian
Vincent Malgrat
The problem with this is that there are several other columns in his table, and using MIN() on all of them wouldn't give you a "real" row from the database (just a weirdish composite row).
Eric Petroelje
@Eric: You're right Eric, I didn't understand the requirements completely (the ... part).
Vincent Malgrat
+2  A: 

Eric Petroelje almost has it right:

SELECT * FROM TableA
WHERE ROWID IN ( SELECT MAX(ROWID) FROM TableA GROUP BY Language )

Note: using ROWID (row unique id), not ROWNUM (which gives the row number within the result set)

Joe
+1 good catch! - I've been out of the Oracle world for too long now.
Eric Petroelje
It'll work but it's a self join - the analytic function will do the same job but more efficiently.
Jeffrey Kemp
A: 

For efficiency's sake you want to hit the data only once, as Harper does. However, you don't want to use rank(), because it will give you ties, and you want to group by language rather than order by language. From there you want to add an order by clause to distinguish between rows, but you don't want to actually sort the data. To achieve this I would use "order by null", e.g.:

count(*) over (group by language order by null)
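
Written out as a full query, and using the PARTITION BY syntax and ROW_NUMBER() suggested in the comments below, the idea might look like the following sketch. It assumes the table is called TableA, as in the other answers; the alias rn is made up for illustration.

```sql
-- Sketch only: TableA is the table name used elsewhere in this thread;
-- rn is an illustrative alias, not something from the question.
SELECT *
FROM (
    SELECT a.*,
           -- Number the rows within each language. ORDER BY NULL says
           -- we do not care which row in a language gets number 1.
           ROW_NUMBER() OVER (PARTITION BY language ORDER BY NULL) AS rn
    FROM   TableA a
)
WHERE rn = 1;
```

Note that the outer SELECT * also returns the helper column rn; list the wanted columns explicitly if that is a problem.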

Scott Swank
Scott, the `group by` won't work here; the syntax for an analytic query is `partition by`. You could also use `row_number()` instead of `count(*)` (more readable, maybe?)
Vincent Malgrat
That's what I get for typing off the top of my head. And you're right on both counts.
Scott Swank
A: 

This will be more efficient than the ROWID self-join, and it gives you control over the ordering used to pick a value:

SELECT DISTINCT
       FIRST_VALUE(person)
          OVER(PARTITION BY language
               ORDER BY person)
      ,language
FROM   tableA;

If you really don't care which person is picked for each language, you can omit the ORDER BY clause:

SELECT DISTINCT
       FIRST_VALUE(person)
          OVER(PARTITION BY language)
      ,language
FROM   tableA;
Jeffrey Kemp
can this be adapted to select the first value of several other columns as well? I alluded to these as "other columns C3 .. C10 represented by ..." in my question. Thanks!
Ian Cohen
yes. SELECT DISTINCT FIRST_VALUE(colA) OVER (PARTITION BY x), FIRST_VALUE(colB) OVER (PARTITION BY x), ... FROM tableA
Jeffrey Kemp
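
Spelled out for the extra columns from the question, that one-liner might look like the sketch below. The names c3 and c4 stand in for the asker's "C3 .. C10" and are assumptions, as is the table name tableA. Every FIRST_VALUE here uses the same PARTITION BY and ORDER BY, so the values should all come from the same row as long as person is unique within a language; without an ORDER BY, the choice of row is left to the database.

```sql
-- Sketch only: c3 and c4 are placeholder names for the question's
-- "other columns C3 .. C10"; tableA is the name used in this thread.
SELECT DISTINCT
       FIRST_VALUE(person) OVER (PARTITION BY language ORDER BY person) AS person
      ,language
      ,FIRST_VALUE(c3)     OVER (PARTITION BY language ORDER BY person) AS c3
      ,FIRST_VALUE(c4)     OVER (PARTITION BY language ORDER BY person) AS c4
FROM   tableA;
```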