views:

43

answers:

3

Hi,

I use this query to select products from my database. Performing this query takes ~0.220ms. If I remove the 3 lines where the number_percentage() function is used everything is fast (~0.07ms or less).

SELECT
 DISTINCT ON (p.id) p.id AS product_id,
 pv.price,
 number_percentage(pv.price,t.percentage) AS price_vat,
 number_percentage(pv.price,pd.size) AS price_discount,
 number_percentage(number_percentage(pv.price,t.percentage),pd.size) AS price_discount_vat

FROM
 product_variant AS pv
INNER JOIN
 product AS p ON p.id=pv.product_id
LEFT JOIN
 product_discount AS pd ON pd.id=pv.product_discount_id
LEFT JOIN
 tax AS t ON t.id=pv.tax_id

ORDER BY
 p.id ASC,
 price_discount ASC

LIMIT 15;

This is the number_percentage() function:

CREATE OR REPLACE FUNCTION number_percentage(n real, p int) RETURNS real AS $$
BEGIN
 IF p IS NULL OR p = 0 THEN
  RETURN n;
 END IF;
 RETURN n * (1 + p / 100);
END;
$$ LANGUAGE plpgsql STABLE;

EXPLAIN ANALYZE (with ORDER BY price_discount):

Limit  (cost=4371.05..4372.85 rows=15 width=16) (actual time=308.732..308.911 rows=15 loops=1)
  ->  Unique  (cost=4371.05..4389.59 rows=154 width=16) (actual time=308.724..308.843 rows=15 loops=1)
        ->  Sort  (cost=4371.05..4380.32 rows=3709 width=16) (actual time=308.715..308.757 rows=20 loops=1)
              Sort Key: p.id, (number_percentage(pv.price, pd.size))
              Sort Method:  quicksort  Memory: 467kB
              ->  Hash Left Join  (cost=98.12..4151.16 rows=3709 width=16) (actual time=7.473..287.571 rows=4817 loops=1)
                    Hash Cond: (pv.product_discount_id = pd.id)
                    ->  Hash Left Join  (cost=83.62..288.57 rows=3709 width=16) (actual time=7.363..69.976 rows=4817 loops=1)
                          Hash Cond: (pv.tax_id = t.id)
                          ->  Hash Join  (cost=25.47..174.79 rows=3709 width=16) (actual time=7.333..47.134 rows=4817 loops=1)
                                Hash Cond: (pv.product_id = p.id)
                                ->  Seq Scan on product_variant pv  (cost=0.00..94.17 rows=4817 width=16) (actual time=0.019..10.970 rows=4817 loops=1)
                                ->  Hash  (cost=23.54..23.54 rows=154 width=4) (actual time=7.288..7.288 rows=1501 loops=1)
                                      ->  Seq Scan on product p  (cost=0.00..23.54 rows=154 width=4) (actual time=0.013..3.591 rows=1501 loops=1)
                          ->  Hash  (cost=31.40..31.40 rows=2140 width=8) (actual time=0.013..0.013 rows=1 loops=1)
                                ->  Seq Scan on tax t  (cost=0.00..31.40 rows=2140 width=8) (actual time=0.005..0.006 rows=1 loops=1)
                    ->  Hash  (cost=12.00..12.00 rows=200 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                          ->  Seq Scan on product_discount pd  (cost=0.00..12.00 rows=200 width=8) (actual time=0.002..0.002 rows=0 loops=1)
Total runtime: 309.404 ms

EXPLAIN ANALYZE (without ORDER BY price_discount):

Limit  (cost=4371.05..4372.85 rows=15 width=16) (actual time=285.012..285.187 rows=15 loops=1)
  ->  Unique  (cost=4371.05..4389.59 rows=154 width=16) (actual time=285.004..285.122 rows=15 loops=1)
        ->  Sort  (cost=4371.05..4380.32 rows=3709 width=16) (actual time=284.995..285.036 rows=20 loops=1)
              Sort Key: p.id
              Sort Method:  quicksort  Memory: 467kB
              ->  Hash Left Join  (cost=98.12..4151.16 rows=3709 width=16) (actual time=6.553..270.930 rows=4817 loops=1)
                    Hash Cond: (pv.product_discount_id = pd.id)
                    ->  Hash Left Join  (cost=83.62..288.57 rows=3709 width=16) (actual time=5.720..64.176 rows=4817 loops=1)
                          Hash Cond: (pv.tax_id = t.id)
                          ->  Hash Join  (cost=25.47..174.79 rows=3709 width=16) (actual time=5.693..42.642 rows=4817 loops=1)
                                Hash Cond: (pv.product_id = p.id)
                                ->  Seq Scan on product_variant pv  (cost=0.00..94.17 rows=4817 width=16) (actual time=0.019..10.173 rows=4817 loops=1)
                                ->  Hash  (cost=23.54..23.54 rows=154 width=4) (actual time=5.651..5.651 rows=1501 loops=1)
                                      ->  Seq Scan on product p  (cost=0.00..23.54 rows=154 width=4) (actual time=0.013..2.810 rows=1501 loops=1)
                          ->  Hash  (cost=31.40..31.40 rows=2140 width=8) (actual time=0.012..0.012 rows=1 loops=1)
                                ->  Seq Scan on tax t  (cost=0.00..31.40 rows=2140 width=8) (actual time=0.005..0.005 rows=1 loops=1)
                    ->  Hash  (cost=12.00..12.00 rows=200 width=8) (actual time=0.006..0.006 rows=0 loops=1)
                          ->  Seq Scan on product_discount pd  (cost=0.00..12.00 rows=200 width=8) (actual time=0.001..0.001 rows=0 loops=1)
Total runtime: 285.719 ms

Why is it so slow? The function is simple so I don't understand this behavior.

A: 

Wild guess: The ORDER BY price_discount is the problem. The database has to process all records before it can do any sort operation and spit out the requested 15 results.

Could you show us the results of EXPLAIN ANALYZE with and without the ORDER BY ?

Frank Heikens
I updated the answer, but ORDER BY is not the problem. It's those 3 number_percentage() functions. If they are removed the query is fast.
Paul
Could also show use the EXPLAIN ANALAYZE (including ANALYZE!) of the query without the function call? And did you index the columns pv.product_discount_id, pd.id, pv.tax_id, t.id, pd.id and pv.product_discount_id ?
Frank Heikens
I updated the answer again (with EXPLAIN ANALYZE).
Paul
A: 

If you run explain query in pg admin 3, you'll be able to see how the query is broken down. Then you can

sometimes adding an index will solve your problem, other-times try rewriting the query. You could try using a CTE, but as your function appears to be a problem reading this may help.

Mr Shoubs
A: 

Change the function from STABLE to IMMUTABLE. My tests on 8.4.4 dropped the explain from about 0.14ms to 0.04 ms per call. Since it never hits the database and is really a true function (i.e. for each input value x, there is one and only one possible f(x)), it really should be IMMUTABLE. This should allow the planner to evaluate it only once for each pair of (n,p).

Also, there's a bug in your code. This line:

RETURN n * (1 + p / 100);

should be this:

RETURN n * (1 + p / 100.0);

As written, since p is an integer, p / 100 will be evaulated as integer division, giving an incorrect result (at least from my assumption of what this function is supposed to be doing).

Matthew Wood
Looking at it again, you really can't remove the ORDER BY price_discount since you are using DISTINCT ON (p.id). If you remove that, "first" record per p.id is undefined and you won't necessarily always get back the same result on different runs of the query. You must have a secondary ORDER BY like that in order to ensure you get the same record back each time.
Matthew Wood
Thank you! You fixed me a bug, but performance is still very bad.
Paul
Can you update your question with an explain analyze with both the bug fix and the IMMUTABLE change in place?
Matthew Wood