tags:
views: 205
answers: 5

Consider the following PostgreSQL statement:

SELECT DISTINCT some_field 
  FROM some_table 
  WHERE some_field LIKE 'text%' 
  LIMIT 10;

Consider also that some_table contains several million records, and that some_field has a b-tree index.

Why does the query take so long to execute (several minutes)? What I mean is: why doesn't it loop through building the result set and, once it has 10 rows, return the result? The execution time appears to be the same whether or not you include the LIMIT 10.

Is this correct, or am I missing something? Is there anything I can do to get it to return the first 10 results and 'screw' the rest?

UPDATE: If you drop the DISTINCT, the results are returned virtually instantaneously. I do know, however, that many of the some_table records are fairly unique already, and certainly when I run the query without the DISTINCT declaration, the first 10 results are in fact unique. I also eliminated the WHERE clause (ruling it out as a factor). So my original question remains: why isn't it terminating as soon as 10 matches are found?

A: 

I'm suspicious it's because you don't have an ORDER BY. Without ordering, you might have to cruise a whole lot of records to get 10 that meet your criterion.

Charlie Martin
I would think not having an ORDER BY would speed things up. With an ORDER BY, the database needs to return the ten "lowest" rows, which involves sorting all rows (or clever use of an index on the sort column). Without one, it just needs to return the first ten (distinct) rows it finds.
Thilo
This isn't necessarily true. I believe that this is a new feature in Postgres 8.2 or 8.3, for example. Other DBMSs will probably differ in their support for this optimization.
Dana the Sane
I think the DISTINCT answer's right anyway. That *guarantees* you need to scan lots of rows, whereas a random order only means there's a certain probability of needing to scan lots of rows.
Charlie Martin
+7  A: 

You have a DISTINCT. This means that to find 10 distinct rows, it's necessary to scan all rows that match the predicate until 10 different values of some_field are found.

Depending on your indices, the query optimizer may decide that scanning all rows is the best way to do this.

10 distinct rows could represent 10, a million, or an infinite number of non-distinct rows.
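To illustrate, here is a sketch of the kind of plan Postgres might produce for this query. The node names (Limit, Unique, Sort, Seq Scan) are what Postgres typically shows, but the shape of the plan depends on your data and version, and the costs are elided:

```sql
EXPLAIN
SELECT DISTINCT some_field
  FROM some_table
  WHERE some_field LIKE 'text%'
  LIMIT 10;

-- Hypothetical output: the LIMIT sits on top, but the Sort/Unique
-- below it must consume the entire scan before producing rows.
--
--  Limit
--    ->  Unique
--          ->  Sort
--                Sort Key: some_field
--                ->  Seq Scan on some_table
--                      Filter: (some_field ~~ 'text%'::text)
```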

tpdi
+1  A: 

Any time an aggregation is involved, and "DISTINCT" certainly qualifies, the optimizer is going to do the aggregation before even thinking about what comes next. And aggregation means scanning the whole table (in your case involving a sort, unless there's an index).

But the most likely deal-breaker is that you are grouping on an operation on a column, rather than on a plain column value. The optimizer generally disregards a number of possible optimizations once you are operating on a column transformation of some kind. It's quite possibly not smart enough to know that the ordering of "LIKE 'text%'" and "= 'text'" is the same for grouping purposes.

And remember, you're doing an aggregation on an operation on a column.

le dorfier
A: 

How big is the table? Do you have any indexes on it? Check your query execution plan to determine whether it's doing a table scan, an index scan, or an index seek. If it's doing a table scan, you most likely don't have any indexes.

Try putting an index on the field you're filtering by and/or the field you're selecting.
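A sketch of what that index might look like (the index name is made up). Note that in a non-C locale a plain b-tree index can't be used for LIKE prefix matches; the text_pattern_ops operator class makes them indexable:

```sql
-- Hypothetical index supporting the LIKE 'text%' prefix filter.
CREATE INDEX some_table_some_field_idx
    ON some_table (some_field text_pattern_ops);
```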

DForck42
How do I check the query execution plan?
Ash
+3  A: 

Can you post the results of running EXPLAIN on the query? This will reveal what Postgres is doing to execute the query, and is generally the first step in diagnosing query performance problems.
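For example, run something like the following from psql. EXPLAIN shows the plan without executing the query; EXPLAIN ANALYZE actually runs it and reports real row counts and timings:

```sql
-- Plan only (does not execute the query):
EXPLAIN
SELECT DISTINCT some_field
  FROM some_table
  WHERE some_field LIKE 'text%'
  LIMIT 10;

-- Plan plus actual execution statistics (runs the query):
EXPLAIN ANALYZE
SELECT DISTINCT some_field
  FROM some_table
  WHERE some_field LIKE 'text%'
  LIMIT 10;
```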

It may be sorting, or constructing a hash table of the entire rowset, to eliminate the non-distinct records before returning the first row to the LIMIT operator. It makes sense that the engine should be able to read a fraction of the records, returning one new distinct value at a time until the LIMIT's quota of 10 is satisfied, but there may not be an operator implemented to make that work.

Is the index on some_field unique? If not, it would be useless in locating distinct records. If it is, then the DISTINCT clause is unnecessary, since that index already guarantees that each row is unique on some_field.

Chris Smith