tags:

views:

65

answers:

3

I have database with few thousends jobs that have a column with value "active" or "inactive". About 5% of all jobs are "inactive". Jobs are added on daily basis (in numbers of hundrends). When existing job is marked as "inactive" we maintain the ratio to accurate (by deleting "inactive" jobs if there are too many of them).

I need to sort all jobs in such way that "inactive" jobs are evenly distributed. Optimally one in every 20 "active" jobs should be an "inactive" one (results are paginated by 20).

How can I achieve that? The database is Postgresql.

A: 
drop sequence rownum1 ;
create temp sequence rownum1;
drop sequence rownum2 ;
create temp sequence rownum2;

select * from(
select job_name, rownum1*20 as myorder from jobs where job_status =0
union
select job_name, rownum2 as myorder from jobs where job_status =1
)
order by myorder desc

This query will make them evenly distributed if they have a proportion of 1 to 20 inactive to active jobs. If you have more than this optimal ratio, you will have a lot of inactive jobs in the top of the query. You need to drop and recreate the rownums sequence everytime you run this query

kurast
A: 

I agree with kurast on the general approach, but a couple of points might be helpful:

  1. I think you need to use the nextval function to get the next value from the sequences, as in

x

SELECT * FROM
  (SELECT *, nextval('rownum1') AS SORT_ORDER
       FROM JOBS
       WHERE JOB_STATUS = 'active'
   UNION ALL
   SELECT *, nextval('rownum2') * 19 AS SORT_ORDER
       FROM JOBS
       WHERE JOB_STATUS = 'inactive')
ORDER BY SORT_ORDER, JOB_STATUS;
  1. I suggest using a UNION ALL in a query like this because you're sure that the tables returned by the subselects will not contain the first rows, so a plain UNION (which effectively does a DISTINCT on the combined data) is not necessary and is likely to run slowly.

I think that the above query will give you one inactive job on each page of 20 jobs, positioned at the end, but you'll have to play with it to be sure.

Note that in Oracle you could use ROWNUM instead of having to mess around with sequences. (Hint, hint, PostgreSQL team?)

Share and enjoy.

Bob Jarvis
A: 

I would try a user-defined function for this.

create table jobs (
    id serial primary key,
    job_status smallint not null default 1,
    job_name text default 'FIXME',
    inserted timestamptz default now()
);
insert into jobs ( job_status,inserted ) 
select case when random()<=0.05 then 0 else 1 end, localtimestamp
from generate_series(1,1000) x(x);

create or replace function get_jobs(p_limit int,p_offset int)
returns setof jobs as $$
declare
    v_limit1 integer;
    v_offset2 integer;
    rec record;
begin
    v_limit1 := p_limit - 1;
    v_offset2 := p_offset / 20;
    for rec in select * from jobs where job_status=1 
        order by inserted desc limit v_limit1 offset p_offset
    loop
        return next rec;
    end loop;
    return query select * from jobs where job_status=0 offset v_offset2 limit 1;
    return;
end;
$$ language plpgsql;

This will list exactly one "inactive" job on last position of the page. The code might require some polishing, but you get the idea.

Hope this helps.

PS. @kurast: your solution does not work in PostgreSQL 8.4.1. It is not even syntactically correct. After correcting syntax, it does not work too. see below.

select * from(
    select *, nextval('rownum1')*20 as myorder from jobs where job_status =0
union
    select *, nextval('rownum2') as myorder from jobs where job_status =1
) subq
order by myorder desc;
filiprem