tags:

views:

89

answers:

1

The query is the following:

with t
as (
  select 450 id, null txt , 3488 id_usr from dual union all
  select 449   , null     , 3488        from dual union all
  select  79   , 'A'      , 3488        from dual union all
  select  78   , 'X'      , 3488        from dual 
)
select id
     , txt
     , id_usr
     , first_value(txt ignore nulls) over (partition by id_usr order by id desc) first_one
  from t

And returns:

ID  TXT     D_USR   FIRST_ONE
450         3488    
449         3488    
79  A       3488    A
78  X       3488    A

This was the expected:

ID  TXT     ID_USR  FIRST_ONE
450         3488    A
449         3488    A
79  A       3488    A
78  X       3488    A

What's wrong and why?

+4  A: 

Default RANGE / ROWS for FIRST_VALUE (as for any other analytical function) is BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

If you add IGNORE NULLS, then NULL values are not taken into account when building the range.

The RANGE becomes BETWEEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCEPT FOR THE NULL ROWS (it's not a valid OVER clause).

Since your txt's that are NULL have high id's, they are selected first, and their ranges are empty, since there are no non-NULL rows between them and UNBOUNDED PRECEDING

You should change either ORDER BY or RANGE clause of your query.

Changing ORDER BY puts the rows with NULL id's to the end of the window so that a non-NULL value (if any) will be always selected first, and the RANGE will guaranteedly start from that value:

with t
as (
  select 450 id, null txt , 3488 id_usr from dual union all
  select 449   , null     , 3488        from dual union all
  select  79   , 'A'      , 3488        from dual union all
  select  78   , 'X'      , 3488        from dual 
)
select id
     , txt
     , id_usr
     , first_value(txt) over (partition by id_usr order by NVL2(TXT, NULL, id) DESC) first_one
  from t

Changing RANGE redefines range to include all non-NULL rows in the partition:

with t
as (
  select 450 id, null txt , 3488 id_usr from dual union all
  select 449   , null     , 3488        from dual union all
  select  79   , 'A'      , 3488        from dual union all
  select  78   , 'X'      , 3488        from dual 
)
select id
     , txt
     , id_usr
     , first_value(txt IGNORE NULLS) over (partition by id_usr order by id DESC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_one
  from t
Quassnoi
Do you mean that the "ignore nulls" clause is not for determinig the "first_value" but it's for the windowing?
FerranB
@FerranB: not exactly. It selects the first non-`NULL` value from the range sorted by `id`. If you replace `450` with `1` in your example, it will select 'A' for it.
Quassnoi
I'll better reword it, it's confusing now.
Quassnoi
I think the point about RANGE is the more fundamental issue.
Dave Costa
@Dave: they both matter. It would select `NULL` as the `FIRST_VALUE` if not for the `IGNORE NULLS`.
Quassnoi
"When you specify IGNORE NULLS, the function, well, ignores NULLs." Yes, but only to determine the FIRST_VALUE not the windowing. This have no sense for this question. The problem is that, as you say, the windowing started on current row not on all the partition rows.
FerranB
`@FerranB`: with `IGNORE NULLS` set and `RANGE / ROWS` omitted, the window starts on the first non-`NULL` row and ends on the current row.
Quassnoi