Hello there,

I have a web application that performs searches on a PostgreSQL view with about 33,000 rows. When I run a search alone, it takes about 1-2 seconds to get a result, which I first thought was OK. But then I downloaded the Web Application Stress Tool from Microsoft to put some load on my web app, starting with 10 concurrent "users". While the test is running, a search takes much longer: I have to wait about 10-20 seconds for a result, which I think is unacceptable. Because I'm new to the whole database (PostgreSQL) thing, I have read a lot over the past 3-4 days, but I can't make the search perform faster. I changed some of the config settings like work_mem, shared_buffers and so on, but it doesn't get better.

So my question is: can anybody give me tips on what to change in my config or on my server to get better performance with more than ten concurrent users?
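
This is roughly what I changed in postgresql.conf (the values below are just examples of the kind of settings I mean, not necessarily the exact values I used):

shared_buffers = 512MB        # example value; the default is much smaller
work_mem = 16MB               # example value; memory per sort/hash operation
effective_cache_size = 2GB    # example value; hint about the OS cache size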

Here are some details about the server and the view:

Server (virtual machine):

3 GHz Xeon
3 GB RAM
40 GB hard drive

The SELECT statement of the view looks something like this:

SELECT my selects, ....   
FROM tab1
JOIN tab2 ON tab1.index1 = tab2.index1
JOIN tab3 ON tab1.index0 = tab3.index0
JOIN tab4 ON tab1.index1 = tab4.index1;

I created an index on each of the index1 and index0 columns.
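
Roughly like this (the names match the indexes that show up in the plan below; the column list for tab1's index is a guess based on its name):

CREATE INDEX tab2_index1_idx   ON tab2 (index1);
CREATE INDEX tab4_index1_idx   ON tab4 (index1);
CREATE INDEX tab3_index0_idx   ON tab3 (index0);
CREATE INDEX tab1_index1_0_idx ON tab1 (index1, index0);  -- composite guessed from the name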

The EXPLAIN ANALYZE output (with the default postgresql.conf):

EXPLAIN ANALYZE SELECT * FROM view_myview;

Nested Loop  (cost=0.90..29042.71 rows=49840 width=1803) (actual time=0.384..5380.477 rows=33620 loops=1)
  ->  Merge Join  (cost=0.90..11740.81 rows=24403 width=1257) (actual time=0.263..2548.377 rows=22601 loops=1)
        Merge Cond: (tab2.index1 = tab1.index1)
        ->  Merge Join  (cost=0.00..7170.63 rows=15968 width=1251) (actual time=0.157..1225.752 rows=15968 loops=1)
              Merge Cond: (tab2.index1 = tab4.index1)
              ->  Index Scan using tab2_index1_idx on tab2  (cost=0.00..3617.45 rows=15968 width=1025) (actual time=0.053..239.399 rows=15968 loops=1)
              ->  Index Scan using tab4_index1_idx on tab4  (cost=0.00..3310.83 rows=17103 width=226) (actual time=0.045..253.721 rows=17103 loops=1)
        ->  Index Scan using tab1_index1_0_idx on tab1  (cost=0.00..4226.13 rows=24403 width=50) (actual time=0.051..347.333 rows=24403 loops=1)
  ->  Index Scan using tab3_index0_idx on tab3  (cost=0.00..0.64 rows=2 width=568) (actual time=0.030..0.050 rows=1 loops=22601)
        Index Cond: (tab3.index0 = tab1.index0)
Total runtime: 5814.165 ms

Hope somebody can help,

Nico

A: 

This is a star query, but for some reason PostgreSQL decides to use a MERGE JOIN between the dimension tables.

This results in the whole indexes on tab1, tab2 and tab4 being scanned, which clutters the cache.

Try increasing your shared_buffers so that all three indexes fit there.
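
To see how much space that actually is, something like this should give you the on-disk size of each index (pg_relation_size and pg_size_pretty have been standard since 8.1):

SELECT  pg_size_pretty(pg_relation_size('tab2_index1_idx')) AS tab2_idx,
        pg_size_pretty(pg_relation_size('tab4_index1_idx')) AS tab4_idx,
        pg_size_pretty(pg_relation_size('tab1_index1_0_idx')) AS tab1_idx;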

Also, could you please post the results of the following queries?

SELECT  COUNT(*)
FROM    tab2
JOIN    tab4
ON      tab2.index1 = tab4.index1;

SELECT  COUNT(*)
FROM    tab2
JOIN    tab4
ON      tab2.index1 = tab4.index1
JOIN    tab1
ON      tab1.index1 = tab4.index1;

SELECT  COUNT(*)
FROM    tab1
JOIN    tab3
ON      tab3.index0 = tab1.index0;

SELECT  COUNT(*)
FROM    tab1
JOIN    tab4
ON      tab1.index1 = tab4.index1;

SELECT  COUNT(*)
FROM    tab1
JOIN    tab2
ON      tab1.index1 = tab2.index1;
Quassnoi
As I commented above, this one query is not the whole problem. My problem is that it takes very long when many people query the database at once. I don't know exactly how the DB works, but I thought Postgres opens 10 processes and every process queries my view, so that I don't have to wait until each process has finished. I'll post the results tomorrow - I don't have enough time now :/
Nico
+1  A: 

Are you actually reading the whole view every time, without any filtering at all? If you're doing the filtering in the application, you really should push those filters down into the query as WHERE clauses. If you are using WHERE clauses and just didn't include them in the post, then you need to repost with them included :-)
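
Something like this, as a purely hypothetical illustration (the filter column and value are made up):

SELECT  *
FROM    view_myview
WHERE   index1 = 42;  -- made-up condition; use your real search predicate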

And if you are reading the whole thing every time, then yeah, there's not all that much you can do about it. As has been previously commented, increase your shared_buffers so everything fits (it seems to be a small database).
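
A quick way to check just how small (pg_database_size has been in core since 8.1):

SELECT  pg_size_pretty(pg_database_size(current_database()));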

The plan does look a bit weird - exactly which config parameters have you changed, and to what?
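
If you're not sure anymore, this should list everything that doesn't come from the built-in defaults:

SELECT  name, setting, source
FROM    pg_settings
WHERE   source <> 'default';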

Magnus Hagander
I do filter, and when I do it doesn't take that long, but my problem is not that one query. My problem is that it takes very long when many people query the database at once. I don't know exactly how the DB works, but I thought Postgres opens 10 processes and every process queries my view, so that I don't have to wait until each process has finished. You know what I mean?
Nico
A: 

As you said, the single query that you are showing in your question is not a real problem.

You have to identify the real problem before you can solve it. "It takes very long with 10 connections" is not enough information.

The EXPLAIN ANALYZE you posted is useless - show the real query with its WHERE condition. The times in the EXPLAIN ANALYZE output only prove that your server is simply overloaded at the moment. 5 seconds for a query on 40k rows? That's really tragic.

You need to find the queries which consume the most server resources. To do that, gather workload statistics with a tool like pgFouine. It takes some time, but it's worth it.
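
pgFouine works by parsing the PostgreSQL log, so you first have to turn on statement logging, roughly like this in postgresql.conf (example values; the log_line_prefix is the format the pgFouine documentation suggests):

log_min_duration_statement = 0          # log every statement and its duration
log_line_prefix = '%t [%p]: [%l-1] '    # timestamp, pid, per-session line number
log_destination = 'stderr'
logging_collector = on                  # named redirect_stderr before 8.3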

I'd also have a look at your system stats (IO usage, memory, CPU) before guessing any more.

If this is going to be a production server, set up a monitoring tool if you don't have one yet. I'd recommend Munin; it's quite easy to get up and running in 15 minutes (it comes packaged for some Linux distros).

filiprem