Is there a neat SQL query that returns only the first occurrence of each row among rows that share the same value in the first column? That is, if I have rows like

blah something
blah somethingelse
foo blah
bar blah
foo hello

the query should give me the first, third and fourth rows (because the first row is the first occurrence of "blah" in the first column, the third row is the first occurrence of "foo" in the first column, and the fourth row is the first occurrence of "bar" in the first column).

I'm using the H2 database engine, if that matters.

Update: sorry about the unclear table definition; here is a better one. The "blah", "foo" etc. denote the value of the first column in each row.

blah [rest of columns of first row]
blah [rest of columns of second row]
foo  [-""- third row]
bar  [-""- fourth row]
foo  [-""- fifth row]
+1  A: 

I think this does what you want, but I'm not 100% sure. (It is written for MS SQL Server, not H2.)

create table #t
(
PKCol int identity(1,1),
Col1 varchar(200)
)

Insert Into #t
Values ('blah something')
Insert Into #t
Values ('blah something else')
Insert Into #t
Values ('foo blah')
Insert Into #t
Values ('bar blah')
Insert Into #t
Values ('foo hello')


Select t.*
From #t t
Join (
     Select min(PKCol) as 'IDToSelect'
     From #t
     Group By Left(Col1, CharIndex(space(1), col1))
)q on t.PKCol = q.IDToSelect

drop table #t
Barry
+2  A: 

If you meant the alphabetically first value of column 2 for each value of column 1, here is some SQL to get those rows:

create table #tmp (
    c1 char(20),
    c2 char(20)
)
insert #tmp values ('blah','something')
insert #tmp values ('blah','somethingelse')
insert #tmp values ('foo','ahhhh')
insert #tmp values ('foo','blah')
insert #tmp values ('bar','blah')
insert #tmp values ('foo','hello')

select c1, min(c2) c2 from #tmp
group by c1
Jonathan
+2  A: 

An analytic (window function) query could do the trick.

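Select *
from (
    -- row_number() numbers the rows inside each c1 group; rank() takes no
    -- argument and would give tied rows the same rank.
    -- Ordering by c1 within a c1 partition leaves the pick to the engine;
    -- order by another column to define which row counts as "first".
    Select row_number() over (partition by c1 order by c1) as myRank, t.*
    from myTable t ) ranked
where myRank = 1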

But in H2 this is only listed as a priority 2 roadmap item for version 1.3.x, so it may not be available there yet:

http://www.h2database.com/html/roadmap.html?highlight=RANK&search=rank#firstFound

Scorpi0
+1  A: 

If you are interested in the fastest possible query: It's relatively important to have an index on the first column of the table. That way the query processor can scan the values from that index. Then, the fastest solution is probably to use an 'outer' query to get the distinct c1 values, plus an 'inner' or nested query to get one of the possible values of the second column:

drop table if exists test;
create table test(c1 char(20), c2 char(20));
create index idx_c1 on test(c1);

-- insert some data (H2 specific)
insert into test select 'bl' || (x/1000), x from system_range(1, 100000); 

-- the fastest query (64 ms)
select c1, (select i.c2 from test i where i.c1=o.c1 limit 1) from test o group by c1;

-- the shortest query (385 ms)
select c1, min(c2) c2 from test group by c1;
Thomas Mueller
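
The answers above pick either an arbitrary row per group or the alphabetically smallest one. If "first" should mean "first inserted", the table needs a column that records insertion order. The following is a minimal sketch of that idea, not a tested H2 solution: it uses Python's built-in sqlite3 module instead of H2 so that it runs anywhere, and the `id` column, the `test` table layout and the `first_occurrences` helper are assumptions made for illustration. It reuses the join on `min(...)` shape from the first answer.

```python
import sqlite3

# Sample rows from the question, split into first column and the rest.
ROWS = [
    ("blah", "something"),
    ("blah", "somethingelse"),
    ("foo", "blah"),
    ("bar", "blah"),
    ("foo", "hello"),
]


def first_occurrences(rows):
    """Return the first-inserted row for each distinct c1 value."""
    conn = sqlite3.connect(":memory:")
    # Assumption: `id` is an auto-assigned key that reflects insertion order.
    conn.execute("create table test(id integer primary key, c1 text, c2 text)")
    conn.executemany("insert into test(c1, c2) values (?, ?)", rows)
    conn.execute("create index idx_c1 on test(c1)")
    query = """
        select t.c1, t.c2
        from test t
        join (select min(id) as first_id from test group by c1) q
          on t.id = q.first_id
        order by t.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_occurrences(ROWS):
        print(row)
```

The inner query finds the smallest `id` per `c1` value, and the join pulls back the complete row for each of those ids, so all remaining columns come from the same row. The same SQL shape should carry over to H2 if the table has an identity or other ordering column, but that has not been verified here.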