tags:

views:

616

answers:

3

I have fairly long and complex SQL query that is run against PostgreSQL 8.3. Part of the query involves filtering on a range of dates ending with today, like this:

where ...
  and sp1.price_date between current_date::date - '1 year'::interval and current_date::date
  and sp4.price_date between current_date::date - '2 weeks'::interval and current_date::date
  and sp5.price_date = (select sp6.price_date 
                          from stock_prices sp6 
                         where sp6.stock_id = s.stock_id 
                           and sp6.price_date < current_date::date 
                      order by sp6.price_date desc 
                         limit 1)
  ...

This query takes almost 5 minutes to run (the first time) and about 1.5 minutes the second time. From looking at the EXPLAIN ANALYZE output it seems that current_date is the problem. So I tried replacing it with a hardcoded date, like this:

where ...
  and sp1.price_date between '2009-09-30'::date - '1 year'::interval and '2009-09-30'::date
  and sp4.price_date between '2009-09-30'::date - '2 weeks'::interval and '2009-09-30'::date
  and sp5.price_date = (select sp6.price_date 
                          from stock_prices sp6 
                         where sp6.stock_id = s.stock_id 
                           and sp6.price_date < '2009-09-30'::date 
                      order by sp6.price_date desc 
                         limit 1)
  ...

The query then ran in half a second! That's great, except that the date occurs in a total of 10 places in the query and, of course, I don't want the user to have to manually change it in 10 places. In MS SQL Server I would simply declare a variable with the value of the current date and use that, but apparently that's not possible in plain SQL in Postgres.

What can I do to make this query run fast while automatically using the current date?

+2  A: 

First of all, post EXPLAIN ANALYZE on both variants so we can see. First step in figuring our why one is slower than the other. Might be useful to see the entire query as well.

The first variant should be optimizable.

As to not having your user change your query at multiple places consider writing a stored procedure, or if/when your first variant is optimized, a view.

Edit: Noticed that your current__date - '...'::interval would return a timestamp w/o timezone. I assume you want to cast to date instead: (current_date - '2 weeks'::interval)::date

Sorry, I did do "EXPLAIN ANALYZE", not "EXPLAIN" (edited). I'm not sure how else to optimise it, though.I tried writing a function that "RETURNS SETOF RECORD" to do the query and using a parameter in that function, but the function didn't return after 10 minutes, at which point I canceled it.
Evgeny
@Evgeny: Running explain analyze on itself doesn't help. You have to read the output of it and find the problem.
depesz
@Evgeny: Perhaps I was a bit unclear. Please post the results of the explain analyze here.
+1  A: 

EDIT: the following was tested but it ran even slower than the original query!. The lesson about this is possibly that the performance hit is incurred by all the typecasting (::date, ::interval etc.). Maybe these explicit casts can be somehow replaced by something else, and also some of the expression such as 'D.RightNow::date - '1 year'::interval' be pre-calculated.

-- original reply--
You could insert the current date in an otherwise empty table and join on that table...

In other words, assuming such a table is created and named tblNow, the query(ies) with date-related filters could become something like this:

UPDATE tblNow SET RightNow = TIMEOFDAY();  
-- note: above could use CURRENT_DATE  or whatever date function matches the
-- semantics of the date fields in other tables.

-- and now the original query can become

from ...
join tblNow AS D ON 1=1 -- added join
                   -- then all current_date::date below changed to D.RightNow

where ...
  and sp1.price_date between D.RightNow::date - '1 year'::interval and D.RightNow::date
  and sp4.price_date between D.RightNow::date - '2 weeks'::interval and D.RightNow::date
  and sp5.price_date = (select sp6.price_date 
                          from stock_prices sp6 
                         where sp6.stock_id = s.stock_id 
                           and sp6.price_date < D.RightNow::date 
                      order by sp6.price_date desc 
                         limit 1)
  ...

Effectively requiring no editing to the query each time we wish to run it for the current moment. I'm not familiar with postgreSQL's but this approach would be a natural workaround for any limitation about using variables in SELECT statements.

mjv
How would this help him? Instead of changing the query, he has to add/update rows in a separate table?
@etlerant : the OP mentions having to change the date in 10+ places in the query(ies), this approach would allow 1) a single location and 2) No changes to the query at all, if an UPDATE query using timeofday() precedes the work-related queries.
mjv
Thanks, I tried this, but it didn't help - the query ran for 10 minutes before I gave up and canceled it.
Evgeny
@Evgeny. Sorry I took you on the wrong path. I edited my reply accordingly to record this insight (of being so slow). Maybe the slowness is tied to the numerous typecasts, and also maybe we can help by storing in tblNow's only record additional fields where some of the date expression could be pre-calculated.
mjv
A: 

To be able to tell anything about the problem you have to do explain analyze of both queries, and compare outputs. If you don't know what to do - just put them (the explain analyze outputs) on explain.depesz.com, and tell us the links to plans.

depesz