I have a table with a billion rows and I would like to determine the average time and standard deviation of time for several queries of the form:

select * from mytable where col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';
select * from mytable where col1 = '4b58c002-bea4-42c9-8f31-06a499cabc51';
select * from mytable where col1 = 'b97242ae-9f6c-4f36-ad12-baee9afae194';

....

I have a thousand random values for col1 stored in another table.

Is there some way to store how long each of these queries took (in milliseconds) in a separate table, so that I can run some statistics on them? Something like: for each col1 in my random table, execute the query, record the time, then store it in another table.

A completely different approach would be fine, as long as I can stay within PostgreSQL (i.e., I don't want to write an external program to do this).

+1  A: 

Are you aware of the EXPLAIN statement?

This command displays the execution plan that the PostgreSQL planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, etc. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table.

The most critical part of the display is the estimated statement execution cost, which is the planner's guess at how long it will take to run the statement (measured in units of disk page fetches). Actually two numbers are shown: the start-up time before the first row can be returned, and the total time to return all the rows. For most queries the total time is what matters, but in contexts such as a subquery in EXISTS, the planner will choose the smallest start-up time instead of the smallest total time (since the executor will stop after getting one row, anyway). Also, if you limit the number of rows to return with a LIMIT clause, the planner makes an appropriate interpolation between the endpoint costs to estimate which plan is really the cheapest.

The ANALYZE option causes the statement to be actually executed, not only planned. The total elapsed time expended within each plan node (in milliseconds) and total number of rows it actually returned are added to the display. This is useful for seeing whether the planner's estimates are close to reality.
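For example, running it against one of the lookups from the question both plans and executes the statement and reports the actual timings (the exact output depends on your plan and PostgreSQL version):

-- Plans *and* executes the query, then prints actual row counts, per-node
-- timings, and the total runtime at the bottom of the output.
EXPLAIN ANALYZE
SELECT * FROM mytable WHERE col1 = '36e2ae77-43fa-4efa-aece-cd7b8b669043';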

You could pretty easily write a script that runs EXPLAIN ANALYZE on your query for each of the random values in your table and saves the output to a file, a table, etc.

matt b
Is there some way to just output the time, so that I don't have to parse a file? That is what I will do if I have to, but it seems like there should be a more straightforward way.
orangeoctopus
`psql -c "EXPLAIN ANALYZE select * from mytable where col1 ..." | grep "Total runtime"`
matt b
I'm really looking for a way to do this completely in SQL, if possible. It seems like I should be able to store the runtime that psql reports in the interactive shell directly as a value. Your answer is quite correct and is what I have been planning to do if nobody can give me a pure-SQL answer. Thanks for your time!
orangeoctopus
You could likely write a PL/pgSQL function that does something like that, but I'm not really sure, to be honest (see the sketch below). Pablo Santa Cruz's answer is a good one as well: you can configure the server to log any statement that takes longer than a configurable amount of time to execute, and the log will contain the statement itself.
matt b
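A minimal sketch of the PL/pgSQL approach mentioned above: it times each lookup with clock_timestamp() and stores the elapsed milliseconds in a separate table. This assumes PostgreSQL 9.0+ for the anonymous DO block, and the names random_values (the table holding the thousand sample values) and query_timings (the results table) are invented for illustration:

-- Hypothetical results table for the per-query timings.
CREATE TABLE query_timings (
    col1_value  text,
    duration_ms double precision
);

DO $$
DECLARE
    v       mytable.col1%TYPE;
    t_start timestamptz;
    t_end   timestamptz;
BEGIN
    -- random_values is an assumed name for the table of sample col1 values.
    FOR v IN SELECT col1 FROM random_values LOOP
        t_start := clock_timestamp();          -- wall-clock time; advances within a transaction
        PERFORM * FROM mytable WHERE col1 = v; -- run the lookup, discard the rows
        t_end   := clock_timestamp();
        INSERT INTO query_timings (col1_value, duration_ms)
        VALUES (v, 1000 * EXTRACT(epoch FROM (t_end - t_start)));
    END LOOP;
END;
$$;

-- The statistics the question asks for can then be computed in plain SQL:
SELECT avg(duration_ms) AS avg_ms, stddev(duration_ms) AS stddev_ms
FROM query_timings;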
+3  A: 

You need to change your PostgreSQL configuration file.

Enable this setting (its default of -1 disables it):

log_min_duration_statement = 0         # -1 is disabled, 0 logs all statements                                    
                                       # and their durations, > 0 logs only                                       
                                       # statements running at least this number                                  
                                       # of milliseconds             

After that, execution times will be logged and you will be able to see exactly how well (or badly) your queries are performing.
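If you change the value in postgresql.conf, the server only needs to re-read its configuration, not restart; one way to trigger that from SQL (typically as a superuser) is:

-- Reload the configuration files; log_min_duration_statement takes effect
-- without a server restart.
SELECT pg_reload_conf();

Matching statements then appear in the server log as lines roughly of the form: LOG:  duration: ... ms  statement: select ...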

You can also use a log-parsing utility such as pgFouine to produce nice HTML reports for further analysis.

Pablo Santa Cruz