First of all, I am using Oracle 10g Express.

So there are three columns I want to select:

[domain_name] [index_path] [collection_name]

Now there are two columns that I want to be unique (as a group):

[domain_name] [index_path]

And then I want to select the row based on which one has the most recent value in another column, [gen_timestamp].

So my issue is how do I basically:

SELECT domain_name, index_path, MIN(collection_name) collection_name
FROM TABLENAMEHERE
GROUP BY domain_name, index_path;

but instead of selecting the min collection_name, select the row where [gen_timestamp] is the most recent.


To clarify a few questions I could see people asking:

Do you need a unique value of domain_name, AND a unique value of index_path, or a unique COMBINATION of the two?

unique COMBINATION of the two.

So there are multiple rows of the same [domain_name] [index_path]?

Yes.


This is the code that I am working with now but it doesn't quite work:

select domain_name, index_path, collection_name
  from my_table outr
       inner join 
         (select domain_name, index_path, collection_name, 
                 max(gen_timestamp) 
                    over (partition by domain_name, index_path) gen_timestamp
            from my_table) innr
 where outr.domain_name = innr.domain_name
   and outr.index_path  = innr.index_path
   and outr.collection_name = innr.collection_name
   and outr.gen_timestamp   = innr.gen_timestamp
+2  A: 

This risks duplicates in the event of duplicate gen_timestamp values:

 SELECT x.domain_name, 
        x.index_path, 
        x.collection_name
   FROM YOUR_TABLE x
   JOIN (SELECT t.domain_name,
                t.index_path,
                MAX(t.gen_timestamp) AS max_ts
           FROM YOUR_TABLE t
       GROUP BY t.domain_name, t.index_path) y ON y.domain_name = x.domain_name
                                              AND y.index_path = x.index_path
                                              AND y.max_ts = x.gen_timestamp
ORDER BY x.domain_name, x.index_path

Using ROW_NUMBER (9i+), no risk of duplicates:

WITH summary AS (
  SELECT t.domain_name,
         t.index_path,
         t.collection_name,
         ROW_NUMBER() OVER(PARTITION BY t.domain_name,
                                        t.index_path
                               ORDER BY t.gen_timestamp DESC) AS rank
    FROM YOUR_TABLE t)
  SELECT s.domain_name,
         s.index_path,
         s.collection_name
    FROM summary s
   WHERE s.rank = 1
ORDER BY domain_name, index_path
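
If two rows of the same domain/path pair share the latest gen_timestamp, ROW_NUMBER still returns a single row, but which of the tied rows it picks is arbitrary. A minimal sketch of one way to make the choice repeatable, assuming collection_name is an acceptable tie-breaker (that choice is an assumption, not something stated in the question; the alias rn is just an illustrative name):

```sql
WITH summary AS (
  SELECT t.domain_name,
         t.index_path,
         t.collection_name,
         -- newest timestamp first; ties broken by collection_name (assumed tie-breaker)
         ROW_NUMBER() OVER(PARTITION BY t.domain_name,
                                        t.index_path
                               ORDER BY t.gen_timestamp DESC,
                                        t.collection_name) AS rn
    FROM YOUR_TABLE t)
  SELECT s.domain_name,
         s.index_path,
         s.collection_name
    FROM summary s
   WHERE s.rn = 1
ORDER BY s.domain_name, s.index_path
```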
OMG Ponies
That selects the actual timestamp, while I want to select the collection name that the timestamp refers to. Something like what I just edited into the question.
Jacob Nelson
@jacobnlsn: So you want the `collection_name` value associated with the highest `gen_timestamp` per domain/path pair--correct?
OMG Ponies
@OMG Ponies: I want the collection_name, domain_name and index_path values associated with the highest gen_timestamp per domain/path pair. So you were very close.
Jacob Nelson
@jacobnlsn: Understood, updated answer.
OMG Ponies
@OMG Ponies: You are a hero, the 2nd query works amazingly!
Jacob Nelson
A: 
select distinct domain_name, 
                index_path, 
                first_value(collection_name) over (partition by domain_name, index_path order by gen_timestamp desc) collection_name
from Your_Table
Allan
Pretty sure you need PARTITION BY in the analytic, or it'll just be the first collection_name with the highest timestamp value...
OMG Ponies
@OMG Ponies: You're right, of course.
Allan
+1  A: 

There is an aggregate function available since version 9 that does exactly what you are asking for. Unfortunately I haven't seen this one mentioned in the responses in your two threads yet.

A table to demonstrate your problem:

SQL> create table tablenamehere (domain_name,index_path,collection_name,gen_timestamp)
  2  as
  3  select 'A', 'Z', 'a collection name', systimestamp from dual union all
  4  select 'A', 'Z', 'b collection name', systimestamp - 1 from dual union all
  5  select 'A', 'Y', 'c collection name', systimestamp from dual union all
  6  select 'B', 'X', 'd collection name', systimestamp - 2 from dual union all
  7  select 'B', 'X', 'e collection name', systimestamp - 4 from dual union all
  8  select 'B', 'X', 'f collection name', systimestamp from dual
  9  /

Table created.

And here is your query, which shows min(collection_name). For B/X it returns "d collection name", but you want it to return "f collection name":

SQL> SELECT domain_name, index_path, MIN(collection_name) collection_name
  2  FROM TABLENAMEHERE
  3  GROUP BY domain_name, index_path
  4  /

D I COLLECTION_NAME
- - -----------------
A Y c collection name
A Z a collection name
B X d collection name

3 rows selected.

No need to apply analytic functions to all your rows and filter on those results: you are doing an aggregation and the LAST function does your job exactly. Here is a link to the documentation: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions071.htm#sthref1495

SQL> select domain_name
  2       , index_path
  3       , max(collection_name) keep (dense_rank last order by gen_timestamp) collection_name
  4    from tablenamehere
  5   group by domain_name
  6       , index_path
  7  /

D I COLLECTION_NAME
- - -----------------
A Y c collection name
A Z a collection name
B X f collection name

3 rows selected.
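
The max() in front of keep only matters when several rows of a group tie on the last gen_timestamp; it then picks the highest collection_name among the tied rows. As a hedged variation on the query above (not part of the original answer), the same result can be written with FIRST and a descending sort. The NULLS LAST clause is an added assumption: it only matters if gen_timestamp can be null, in which case rows without a timestamp are not treated as the most recent.

```sql
-- Sketch: the same aggregation expressed with FIRST instead of LAST.
-- NULLS LAST is an assumption about how missing timestamps should be handled.
select domain_name
     , index_path
     , max(collection_name) keep (dense_rank first order by gen_timestamp desc nulls last) collection_name
  from tablenamehere
 group by domain_name
     , index_path
```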

Regards, Rob.

Rob van Wijk