views:

49

answers:

3

Hi.

I'm using Django ORM and postgresql.

ORM creates a query:

SELECT 
    (date_part('month', stat_date)) AS "stat_date", 
    "direct_keywordstat"."banner_id", 
    SUM("direct_keywordstat"."total") AS "total", 
    SUM("direct_keywordstat"."clicks") AS "clicks", 
    SUM("direct_keywordstat"."shows") AS "shows" 

FROM "direct_keywordstat" 

LEFT OUTER JOIN "direct_banner" ON ("direct_keywordstat"."banner_id" = "direct_banner"."banner_ptr_id") 
LEFT OUTER JOIN "platforms_banner" ON ("direct_banner"."banner_ptr_id" = "platforms_banner"."id") 

WHERE (
     "direct_keywordstat".stat_date BETWEEN E'2009-08-25' AND E'2010-08-25' AND
    "direct_keywordstat"."keyword_id" IN (

        SELECT U0."id" 
        FROM "direct_keyword" U0 
        INNER JOIN "direct_banner" U1 ON (U0."banner_id" = U1."banner_ptr_id") 
        INNER JOIN "platforms_banner" U2 ON (U1."banner_ptr_id" = U2."id") 
        INNER JOIN "platforms_campaign" U3 ON (U2."campaign_id" = U3."id")
        INNER JOIN "direct_campaign" U4 ON (U3."id" = U4."campaign_ptr_id")
        WHERE (
            U0."deleted" = E'False' AND 
            U0."low_ctr" = E'False' AND 
            U4."status_active" = E'True' AND 
            U0."banner_id" IN (

                SELECT U0."banner_ptr_id" 
                FROM "direct_banner" U0 
                INNER JOIN "platforms_banner" U1 
                ON (U0."banner_ptr_id" = U1."id") 
                WHERE (
                    U0."status_show" = E'True' AND 
                    U1."campaign_id" = E'174' )
            )
        )
    )
) 

GROUP BY 
    "direct_keywordstat"."banner_id", 
    (date_part('month', stat_date)), 
    "platforms_banner"."title", date_trunc('month', stat_date) 

ORDER BY "platforms_banner"."title" ASC, "stat_date" ASC

Problem is, direct_keywordstat contains 3mln+ records, so the query executes in ~15 seconds.

I've tried creating indexes like

CREATE INDEX direct_keywordstat_stat_date on direct_keywordstat using btree(stat_date);

But EXPLAIN ANALYZE show that index is not used.

Table schema:

\d direct_keywordstat

                                   Table "public.direct_keywordstat"
   Column    |          Type          |                            Modifiers                            
-------------+------------------------+-----------------------------------------------------------------
 id          | integer                | not null default nextval('direct_keywordstat_id_seq'::regclass)
 keyword_id  | integer                | not null
 banner_id   | integer                | not null
 campaign_id | integer                | not null
 stat_date   | date                   | not null
 region_id   | integer                | not null
 place_type  | character varying(30)  | 
 place_name  | character varying(100) | 
 clicks      | integer                | not null default 0
 shows       | integer                | not null default 0
 total       | numeric(19,6)          | not null

How can i create useful index?

Or, maybe, there's a chance to optimize this query other way?

Thing is, if WHERE looks like

"direct_keywordstat".clicks BETWEEN 10 AND 3000000

query executes in 0.8 seconds.

A: 

We need the explain analyze.

Joshua D. Drake
A: 
                                                                                                                       QUERY PLAN                                                                                                                       
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=727967.61..847401.71 rows=2514402 width=67) (actual time=22010.522..23408.262 rows=5 loops=1)
   ->  Sort  (cost=727967.61..734253.62 rows=2514402 width=67) (actual time=21742.365..23134.748 rows=198978 loops=1)
         Sort Key: platforms_banner.title, (date_part('month'::text, (direct_keywordstat.stat_date)::timestamp without time zone)), direct_keywordstat.banner_id, (date_trunc('month'::text, (direct_keywordstat.stat_date)::timestamp with time zone))
         Sort Method:  external merge  Disk: 20600kB
         ->  Hash Join  (cost=1034.02..164165.25 rows=2514402 width=67) (actual time=5159.538..14942.441 rows=198978 loops=1)
               Hash Cond: (direct_keywordstat.keyword_id = u0.id)
               ->  Hash Left Join  (cost=365.78..117471.99 rows=2514402 width=71) (actual time=26.672..13101.294 rows=2523151 loops=1)
                     Hash Cond: (direct_keywordstat.banner_id = direct_banner.banner_ptr_id)
                     ->  Seq Scan on direct_keywordstat  (cost=0.00..76247.17 rows=2514402 width=25) (actual time=8.892..9386.010 rows=2523151 loops=1)
                           Filter: ((stat_date >= '2009-08-25'::date) AND (stat_date <= '2010-08-25'::date))
                     ->  Hash  (cost=324.86..324.86 rows=3274 width=50) (actual time=17.754..17.754 rows=2851 loops=1)
                           ->  Hash Left Join  (cost=209.15..324.86 rows=3274 width=50) (actual time=10.845..15.385 rows=2851 loops=1)
                                 Hash Cond: (direct_banner.banner_ptr_id = platforms_banner.id)
                                 ->  Seq Scan on direct_banner  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.004..1.196 rows=2851 loops=1)
                                 ->  Hash  (cost=173.51..173.51 rows=2851 width=50) (actual time=10.683..10.683 rows=2851 loops=1)
                                       ->  Seq Scan on platforms_banner  (cost=0.00..173.51 rows=2851 width=50) (actual time=0.004..3.576 rows=2851 loops=1)
               ->  Hash  (cost=641.44..641.44 rows=2144 width=4) (actual time=30.420..30.420 rows=106 loops=1)
                     ->  HashAggregate  (cost=620.00..641.44 rows=2144 width=4) (actual time=30.162..30.288 rows=106 loops=1)
                           ->  Hash Join  (cost=407.17..614.64 rows=2144 width=4) (actual time=16.152..30.031 rows=106 loops=1)
                                 Hash Cond: (u0.banner_id = u1.banner_ptr_id)
                                 ->  Nested Loop  (cost=76.80..238.50 rows=6488 width=16) (actual time=8.670..22.343 rows=106 loops=1)
                                       ->  HashAggregate  (cost=76.80..76.87 rows=7 width=8) (actual time=0.045..0.047 rows=1 loops=1)
                                             ->  Nested Loop  (cost=0.00..76.79 rows=7 width=8) (actual time=0.033..0.036 rows=1 loops=1)
                                                   ->  Index Scan using platforms_banner_campaign_id on platforms_banner u1  (cost=0.00..22.82 rows=7 width=4) (actual time=0.019..0.020 rows=1 loops=1)
                                                         Index Cond: (campaign_id = 174)
                                                   ->  Index Scan using direct_banner_pkey on direct_banner u0  (cost=0.00..7.70 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
                                                         Index Cond: (u0.banner_ptr_id = u1.id)
                                                         Filter: u0.status_show
                                       ->  Index Scan using direct_keyword_banner_id on direct_keyword u0  (cost=0.00..23.03 rows=5 width=8) (actual time=8.620..22.127 rows=106 loops=1)
                                             Index Cond: (u0.banner_id = u0.banner_ptr_id)
                                             Filter: ((NOT u0.deleted) AND (NOT u0.low_ctr))
                                 ->  Hash  (cost=316.84..316.84 rows=1082 width=8) (actual time=7.458..7.458 rows=403 loops=1)
                                       ->  Hash Join  (cost=227.00..316.84 rows=1082 width=8) (actual time=3.584..7.149 rows=403 loops=1)
                                             Hash Cond: (u1.banner_ptr_id = u2.id)
                                             ->  Seq Scan on direct_banner u1  (cost=0.00..66.74 rows=3274 width=4) (actual time=0.002..1.570 rows=2851 loops=1)
                                             ->  Hash  (cost=213.48..213.48 rows=1082 width=4) (actual time=3.521..3.521 rows=403 loops=1)
                                                   ->  Hash Join  (cost=23.88..213.48 rows=1082 width=4) (actual time=0.715..3.268 rows=403 loops=1)
                                                         Hash Cond: (u2.campaign_id = u3.id)
                                                         ->  Seq Scan on platforms_banner u2  (cost=0.00..173.51 rows=2851 width=8) (actual time=0.001..1.272 rows=2851 loops=1)
                                                         ->  Hash  (cost=22.95..22.95 rows=74 width=8) (actual time=0.345..0.345 rows=37 loops=1)
                                                               ->  Hash Join  (cost=11.84..22.95 rows=74 width=8) (actual time=0.133..0.320 rows=37 loops=1)
                                                                 Hash Cond: (u3.id = u4.campaign_ptr_id)
                                                                 ->  Seq Scan on platforms_campaign u3  (cost=0.00..8.91 rows=391 width=4) (actual time=0.006..0.098 rows=196 loops=1)
                                                                 ->  Hash  (cost=10.91..10.91 rows=74 width=4) (actual time=0.117..0.117 rows=37 loops=1)
                                                                       ->  Seq Scan on direct_campaign u4  (cost=0.00..10.91 rows=74 width=4) (actual time=0.004..0.097 rows=37 loops=1)
                                                                             Filter: status_active

Total runtime: 23436.715 ms (47 rows)

Here it is

gistart
A: 

Do you have indexes on these columns:

direct_banner.banner_ptr_id
direct_keywordstat.banner_id
direct_keywordstat.stat_date

Both columns in direct_keywordstat could be combined in a single index, just check

This is also a problem:

Sort Method: external merge Disk: 20600kB

Check your settings for work_mem, you need at least 20MB for this query.

Frank Heikens
"Do you have indexes on these columns" -yep"Check your settings for work_mem" - it was default 1MB. Setting 20 or 40MB didn't help.
gistart
How many records to you have BETWEEN E'2009-08-25' AND E'2010-08-25' ? And drop the subqueries in the WHERE, push them to the JOIN's as well, just like the other joins.
Frank Heikens
thanks Frank!i realized, that the query was all wrong) it was possible to add more filtering conditions in WHERE.the project is not mine, so it took time to find that kind of mistake.
gistart