What are the types and inner workings of a query optimizer?

views:

answers:

What are the types and inner workings of a query optimizer?

As I understand it, most query optimizers are "cost-based". Others are "rule-based", or I believe they call it "Syntax Based". So, what's the best way to optimize the syntax of SQL statements to help an optimizer produce better results?

Some cost-based optimizers can be influenced by "hints" like FIRST_ROWS(). Others are tailored for OLAP. Is it possible to know more detailed logic about how Informix IDS and SE's optimizers decide what's the best route for processing a query, other than SET EXPLAIN? Is there any documentation which illustrates the ranking of SELECT statements as to what's the fastest way to access rows, assuming it's indexed?

I would imagine that "SELECT col FROM table WHERE ROWID = n" is the fastest (rank 1).

If I'm not mistaking, Informix SE's ROWID is a SERIAL(INT) which allows for a max. of 2GB nrows, or maybe it uses INT9 for TB's nrows? SE's optimizer is cost based when it has enough data but it does not use distributions like the IDS optimizer.

IDS'ROWID isn't an INT, it is the logical address of the row's page left shifted 8 bits plus the slot number on the page that contains the row's data.

IDS' optimizer is a cost based optimizer that uses data about the index depth and width, number of rows, number of pages, and the data distributions created by update statistics MEDIUM and HIGH to decide which query path is the least expensive, but there's no ranking of statements?

I think Oracle uses HEX values for ROWID. Too bad ROWID can't be oftenly used, since a rows ROWID can change. So maybe ROWID can be used by the optimizer as a counter to report a query progress?, an idea I mentioned in my "Begin viewing query results before query completes" question? I feel it wouldn't be that difficult to report a query's progress while being processed, perhaps at the expense of some slight overhead, but it would be nice to know ahead of time: A "Google-like" estimate of how many rows meet a query's criteria, display it's progress every 100, 200, 500 or 1,000 rows, give users the ability to cancel it at anytime and start displaying the qualifying rows as they are being put into the current list, while it continues searching?.. This is just one example, perhaps we could think other neat/useful features, the ingridients are more or less there.

Perhaps we could fine-tune each query with more granularity than currently available? OLTP queries tend to be mostly static and pre-defined. The "what-if's" are more OLAP, so let's try to add more control and intelligence to it? So, therefore, being able to more precisely control, not just "hint/influence" the optimizer is what's needed. We can then have more dynamic SELECT statements for specific situations! Maybe even tell IDS to read blocks of index nodes at-a-time instead of one-by-one, etc. etc.

I'm not really sure what your are after but here is some info on SQL Server query optimizer which I've recently read:

13 Things You Should Know About Statistics and the Query Optimizer

SQL Server Query Execution Plan Analysis

and one for Informix that I just found using google:
Part 1: Tuning Informix SQL

KM 2010-04-14 13:32:10

@KM - I clarified and added some more info. Perhaps now you can get a better feel for what I'm looking for.

Frank Computer 2010-04-14 18:16:11

For Oracle, your best resource would be Cost Based oracle Fundamentals. It's about 500 pages (and billed as Volume 1 but there haven't been any followups yet).

For a (very) simple full-table scan, progress can sometimes be monitored through v$session_longops. Oracle knows how many blocks it has to scan, how many blocks it has scanned, how many it has to go, and reports on progress.

Indexes are a different matter. If I search for records for a client 'Frank', and use the index, the database will make a guess at how many 'Frank' entries are in the table, but that guess can be massively off. It may be that you have 1000 'Frankenstein' and just 1 'Frank' or vice versa.

It gets even more complicated as you add in other filter and access predicates (eg where multiple indexes can be chosen), and makes another leap as you include table joins. And thats without getting into the complex stuff about remote databases, domain indexes like Oracle Text and Locator.

In short, it is very complicated. It is stuff that can be useful to know if you are responsible for tuning a large application. Even for basic development you need to have some grounding in how the database can physically retrieve that data you are interested.

But I'd say you are going the wrong way here. The point of an RDBMS is to abstract the details so that, for the most part, they just happen. Oracle employs smart people to write query transformation stuff into the optimizer so us developers can move away from 'syntax fiddling' to get the best plans (not totally, but it is getting better).

Gary 2010-04-15 00:58:42

ansaurus

tags:

views:

answers:

What are the types and inner workings of a query optimizer?

related questions