There must be a better way of writing this query.

I want to select all the data between a pair of dates. Ideally the first and last rows of the result set would be the ones specified in the WHERE clause. If those exact rows don't exist, I want the rows immediately preceding and following the requested range.

An example:

If my data is:

...
135321, 20090311 10:15:00
135321, 20090311 10:45:00
135321, 20090311 11:00:00
135321, 20090311 11:15:00
135321, 20090311 11:30:00
135321, 20090311 12:30:00
...

And the query is:

    SELECT * 
    FROM data_bahf 
    WHERE param_id = 135321 
    AND datetime >= '20090311 10:30:00' 
    AND datetime <= '20090311 12:00:00'

I want the returned data to include the row at 10:15 and the row at 12:30, not just the rows that strictly satisfy the WHERE clause.

This is the best I've come up with.

SELECT * FROM (
    SELECT * 
    FROM data_bahf 
    WHERE param_id = 135321 
    AND datetime > '20090311 10:30:00' 
    AND datetime < '20090311 12:00:00'

    UNION

    (
        SELECT * FROM data_bahf 
        WHERE param_id = 135321 
        AND datetime <= '20090311 10:30:00' 
        ORDER BY datetime desc
        LIMIT 1
    )

    UNION

    (
        SELECT * FROM data_bahf 
        WHERE param_id = 135321 
        AND datetime >= '20090311 12:00:00'
        ORDER BY datetime asc
        LIMIT 1
    )
) 
AS A
ORDER BY datetime

(Ignore the use of SELECT * for now)

EDIT: I have indexes on param_id, datetime, and (param_id, datetime)

+3  A: 

I'd say this:

SELECT 
  o.* 
FROM 
  data_bahf o
WHERE 
  o.param_id = 135321 
  AND o.datetime BETWEEN
  ISNULL(
    (
      SELECT   MAX(datetime) 
      FROM     data_bahf i
      WHERE    i.param_id = 135321 AND i.datetime <= '20090311 10:30:00'
    ),
    '0001-01-01 00:00:00'
  )
  AND
  ISNULL(
    (
      SELECT   MIN(datetime) 
      FROM     data_bahf i
      WHERE    i.param_id = 135321 AND i.datetime >= '20090311 12:00:00'
    ),
    '9999-12-31 23:59:59'
  )

EDIT: Fallback added.
When there is no row matching the sub-query, it results in a NULL value, which must be caught by ISNULL(); otherwise the BETWEEN predicate evaluates to NULL and the main query returns no rows at all.
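
(A quick one-off illustration of that failure mode, sketched with COALESCE since it works in both PostgreSQL and SQL Server:)

-- NULL as a BETWEEN bound makes the whole predicate NULL rather than TRUE,
-- so every row is filtered out; a fallback value restores a real bound.
SELECT 1 WHERE 5 BETWEEN NULL AND 10;                -- returns no rows
SELECT 1 WHERE 5 BETWEEN COALESCE(NULL, 0) AND 10;   -- returns one row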

Tomalak
It does look better - but it's 10 times slower!
Johan
Do you have an index on your datetime field? (I take it for granted that you have one on your param_id field.)
Tomalak
I do, and the query is using it. The times are 10ms for the original query, and 200ms for the new one. In absolute terms this isn't much, but I do have many of these queries to run. There are only about 200k rows.
Johan
I'm afraid I'm no PostgreSQL expert. For 200k rows the performance *should* be very snappy (I just tested it on SQL Server on a similar amount of live data and it's blazingly fast). Can you post the query execution plans (EXPLAIN) of both variants?
Tomalak
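
(For reference: the plans can be captured by prefixing each variant with EXPLAIN ANALYZE. A sketch using the plain range query as an example:)

EXPLAIN ANALYZE
SELECT *
FROM   data_bahf
WHERE  param_id = 135321
       AND datetime >= '20090311 10:30:00'
       AND datetime <= '20090311 12:00:00';
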
Postgres cannot cache dependent inner queries; it will re-evaluate them on each join. That's why your query is slow. To make it fast, you'll need to either pass the constant into each of the inner queries, or rewrite the correlated subqueries as uncorrelated ones.
Quassnoi
And I thought it would be a clever move to eliminate the constant in the inner query to make sure you only need to change it in one place... I'll change it to constants in the inner queries.
Tomalak
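
(For readers following along: a sketch reconstructing the correlated variant being discussed. It is not the exact original code, and COALESCE is used because ISNULL() is not available in PostgreSQL.)

SELECT  o.*
FROM    data_bahf o
WHERE   o.param_id = 135321
        AND o.datetime BETWEEN
        COALESCE(
        (
        SELECT  MAX(i.datetime)
        FROM    data_bahf i
        WHERE   i.param_id = o.param_id   -- correlated: re-evaluated for every outer row
                AND i.datetime <= '20090311 10:30:00'
        ), '0001-01-01 00:00:00')
        AND
        COALESCE(
        (
        SELECT  MIN(i.datetime)
        FROM    data_bahf i
        WHERE   i.param_id = o.param_id   -- correlated: re-evaluated for every outer row
                AND i.datetime >= '20090311 12:00:00'
        ), '9999-12-31 23:59:59')

Passing the constant (param_id = 135321) into the inner queries instead lets each sub-query be evaluated once, which is the change that brought the runtime down.
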
After making the changes, it now runs at <8ms. Looks better and runs better! Thanks.
Johan
Actually, it is Quassnoi's expertise that made this possible. I had the right idea, but he knew the missing bits. I think it's fair to accept his answer instead of mine.
Tomalak
I don't really mind which I accept, so... done.
Johan
+2  A: 

First, make sure that you have a composite index on (param_id, datetime)
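
For example (the index name is arbitrary; the asker notes above that this index already exists):

-- Composite index covering both the equality filter and the datetime range/aggregate
CREATE INDEX ix_data_bahf_param_datetime ON data_bahf (param_id, datetime);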

Second, query like this:

SELECT  *
FROM    data_bahf
WHERE   param_id = 135321
        AND datetime BETWEEN
        COALESCE(
        (
        SELECT  MAX(datetime)
        FROM    data_bahf
        WHERE   param_id = 135321
              AND datetime <= '2009-03-11 10:30:00'
        ), '0001-01-01')
        AND 
        COALESCE(
        (
        SELECT  MIN(datetime)
        FROM    data_bahf
        WHERE   param_id = 135321
              AND datetime >= '2009-03-11 12:00:00'
        ), '9999-01-01')

Just checked, it runs in 1.215 ms for a sample table of 200,000 rows

Quassnoi
Can you say why having a composite index is beneficial? (no critique, I'm just wondering)
Tomalak
Because if you don't have one, you'll need to filter out all non-matching param_id's while searching for MAX and MIN.
Quassnoi
When you already have an index on "param_id" (which I took as read for this question), I would expect the optimizer to apply it before doing MAX() or MIN() on the remaining rows. So I would think that two separate indexes would yield the same performance. (Or not?)
Tomalak
No, they won't. The optimizer can use either the index on param_id, in which case it will need to scan ALL rows with this param_id to find the MIN datetime, or the index on datetime, in which case it will need to scan all rows in descending order until it finds the first matching param_id.
Quassnoi
When you have a composite index, you only need to find one single leaf entry to the left of a given (param_id, datetime) pair.
Quassnoi
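
(A sketch of the lookup being described: with the composite index in place, the MIN/MAX sub-queries, or their equivalent ORDER BY ... LIMIT 1 form, can be answered from one end of the matching index range instead of a scan of every row for that param_id.)

-- Both of these typically cost a single index probe with an index on (param_id, datetime):
SELECT MAX(datetime)
FROM   data_bahf
WHERE  param_id = 135321
       AND datetime <= '20090311 10:30:00';

SELECT datetime
FROM   data_bahf
WHERE  param_id = 135321
       AND datetime <= '20090311 10:30:00'
ORDER BY datetime DESC
LIMIT 1;
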
Thanks for sharing the insight! :-) +1
Tomalak
Thanks for the help. I already had the composite index, but wasn't sure whether I was over-indexing, as it never seemed to be used in the query plan.
Johan