ansaurus

Question

Answer 1

+1 A:

It looks like Postgres overestimates how many stations are there in vicinity of a city 5182. It thinks there is 1220 but there's only 23.

You can two queries to force getting the stations first, like this (not tested, may need tweeking):

start transaction;
create temporary table s(id int);
insert into s
  select id from
    climate.city c,
    climate.station s
  where
    c.id = 5182 AND
    earth_distance(
      ll_to_earth(c.latitude_decimal,c.longitude_decimal),
      ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <= 30 AND
    s.elevation BETWEEN 0 AND 3000 AND
    s.applicable = TRUE;
analyze s;

SELECT
    extract(YEAR FROM m.taken) AS year,
    avg(m.amount) AS amount
  FROM
    climate.station_category sc,
    climate.measurement m,
    s
  WHERE
    sc.category_id = 1 AND
    sc.taken_start >= '1900-01-01'::date AND
    sc.taken_end <= '1996-12-31'::date AND
    m.station_id = sc.station_id AND
    m.taken BETWEEN sc.taken_start AND sc.taken_end AND
    m.category_id = sc.category_id AND
    sc.station_id = s.id
  GROUP BY
    extract(YEAR FROM m.taken)
  ORDER BY
    extract(YEAR FROM m.taken);
rollback;

You can also set enable_seqscan=off for this query. This will force Postgres to avoid sequential scans at all cost.

Tometzky 2010-05-27 18:36:44

On second thought I've rewritten a query to 2 queries using temporary table. This way Postgres cannot overestimate stations. Please try this and tell if it works better.

Tometzky 2010-05-28 06:27:34

@Tometzky: Stations are gathered in a sub-select. The sub-select was tweaked to find the bounding rectangle for the radius. This allows PostgreSQL to eliminate stations using an index, thereby needing to check only those stations that fall within the minimum bounding rectangle to see if they are within the given radius. The large performance increase came from aligning the physical model with the logical model using `CLUSTER`ed indexes.

Dave Jarvis 2010-06-26 20:16:22

Answer 2

A:

The problem was that the station ID was not sequentially distributed in the measurement tables. The solution:

CREATE UNIQUE INDEX measurement_001_stc_index
  ON climate.measurement_001
  USING btree
  (station_id, taken, category_id);
ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;

By forcing a CLUSTER on the columns, the station IDs were aligned physically on disk with the table's natural order. This gave a performance increase of an order of magnitude.

Dave Jarvis 2010-05-27 21:39:20

ansaurus

tags:

views:

answers:

Random Page Cost and Planning

related questions