views:

411

answers:

6

Say I want to make an "Optimized query generator". Basically a SQL query optimizer that is a lot better than what can be put in an SQL server based on time/space limitations. It would take a query and DB stats as input and generate an SQL query tailored for the target system that will quickly optimize to a nearly ideal plan.

How much of SQL would need to be supported? Is there a subset of SQL that is flexible enough to easily describe most useful queries but enough smaller than full SQL to make it worth trimming it down to? Also is there a better way to describe queries if you don't need to stick "close to the machine"?

I'm not thinking of a program that you would process existing SQL through but rather a tool for creating new SQL from. It wouldn't actual need to take SQL as input as long as the input language is able to describe the requirements of the query.

I guess another form of the question would be: are their any parts of SQL that are only there for performance and never improve readability/understandability?


As pointed out by someone doing this would require "tons of product-specific knowledge" and that (e.g. nested sub queries vs whatever, what kind of indexes should be used, that sort of thing) is exactly what the tool would be intended to encapsulate so that the user doesn't need to learn that knowledge.


note: I am not interested in generating actual query plans as that is the DBMS's job and can't be done from SQL anyway. I'm interested in a system that can automate the job of making good SQL for a given DBMS from an input that doesn't need to be tuned for that DBMS.

A: 

You might find the patterns in "SQL Queries for Mere Mortals" useful as they work through a structured canonical format starting with English descriptions.

Online at Safari, if you want to take a quick peek.

Andy Dent
+2  A: 

Bramha, I'm not sure if you know what you are asking. SQL Optimization isn't simply a matter of making sure that query components are in the right order. You seem to recognize that you'll need to have intimate knowledge of the indices, data page layouts, etc. etc. but you'd still be left with just reording query clauses unless you gain the appropriate "hooks" into the SQL Server query processor. Because that is what MS does - it essentially "compiles" queries down into a deeper, more fundamental level to optimize the data access.

Mark Brittingham
The intent is to allow the user to describe the query with fewer constrains on how they express it without sacrificing speed. The output would be a query as an SQL text string that is tuned to be feed in like a normal query.
BCS
+2  A: 

I'm surprised to hear you describe SQL as "close to the machine". SQL itself is declarative rather than procedural, and one of the interesting aspects of relational databases is the freedom implementers have to innovate, since SQL itself dictates very little about how the queries should be executed.

I think for sheer utility, it would be very difficult to improve on SQL. I'm not saying it's the perfect language, but it is the lingua franca of relational (and even some non-relational) databases.

Ned Batchelder
I put that in quotes for a reason. In this case the machine is the query engine not the CPU. things like subqueries and what not are not at the highest level possible.
BCS
I didn't think you meant CPU, but SQL is still very high level. I agree with another answerer: optimizing SQL at the SQL level will require tons of product-specific knowledge.
Ned Batchelder
It's a occupational hazard; no matter what the system, I want to take it to a higher level of abstraction. In this case, by shedding needing to known how your/my particular DBMS works.
BCS
@[BCS]: SQL *is* the abstraction that (mostly) shields you from needing to know how a particular DBMS works. Most DMBSes have smart query planners; to do better, you need to either change the query algorithm (which can't safely be done automatically) or restructure the data (also not automatable).
Jeff Shannon
@Jeff S: If SQL is *fully abstract* from the DBMS than *any* query will *always* be executed ideally on *all* DBMS. There would be no need for query tuning. This is not true. I'm thinking of a system that would get closer.
BCS
+1  A: 

umm...there are (I think, too lazy to google it) nine relational operators (scan, jump, hash-merge, etc.) that are used to construct the execution plan of a SQL query. The choice of operators is based on the usage statistics of the target database tables, available indices, et al.

It sounds like you're trying to recreate what the query planner already does...?

EDIT:

  1. I don't think that most queries have that many options in how they can be executed, and
  2. I don't think there is anything you could do to the SQL to force the DB engine to create an execution plan "your way" even if you did fine a more optimal solution.
  3. unless you are planning on creating your own database engine!

I am very confused by this question; it looks like reinventing the wheel but with no wagon to mount it on!?

Steven A. Lowe
@[BCS]: see edits
Steven A. Lowe
The thought is generate SQL that /will/ be optimized well, not force a given plan.
BCS
@[BCS]: One of us is very confused about how execution plans are constructed ;-) I'm fairly certain that minor deviations in how the sql is specified is not really a factor...
Steven A. Lowe
MySQL is known to do badly with sub queries. Some queries that use them can be expressed without them. I'm thinking of a system that would do the needed transformations to remove them where it would help and add them where it would help.
BCS
A: 

Is your intent to write this for a single specific database engine? If not, I suspect that you'll have a rather difficult time of this. Optimization of database queries relies heavily on the exact specifics of the engine's implementation and internals, as well as the tables, indexes, primary/foreign key relations, type and distribution of data, etc, etc. The actual logic of creating an optimized query would likely have very little overlap between different database engines. (For that matter, at least for MySQL the table type would make a huge difference on optimizations.) Each release of each supported DB engine may have significantly different characteristics, as well -- keep in mind that if you're generating SQL, then you need to be able to predict how the engine's own optimizer/query planner will handle the SQL you've generated.

The thing is, query optimization relies only weakly on relational theory, and very heavily on detailed knowledge of the DB's guts and the data being held. Even if you're able to extract the DB's metadata, I suspect that you'll have a difficult time producing a better query plan than the DB itself would -- and if you're not getting the DB's metadata, then your cause is hopeless.

Jeff Shannon
You argue well for my point. What I'm thinking of would be a tool that someone who known relational theory reasonably well can use without having to known the details of a given engine.
BCS
Re "what engine": Once the abstraction for one engine is done the framework would be in place for others as well. One the concrete parts for a given engine are done, that Just flip a switch and you get good output for any engine.
BCS
And yes, I know I'm being very optimistic.
BCS
Jeff Shannon
[...] diminishing marginal returns -- the DBMS already gets the low hanging fruit, and the mid-height fruit, and much of the higher fruit, leaving fairly little left for your optimizer to catch. But hey, it's your time, after all. :)
Jeff Shannon
A: 

Good luck - you've chosen to compete with such companies as Microsoft and Oracle, who live or die by how well their query optimizers do exactly what you propose. The first and primary way to compare one database product with another is with benchmark testing, where the same query workload is applied to each of them, timing measurements are taken, and the winner in most cases is determined by speed of execution.

The world will be impressed if you can do significantly better than the publisher on any of these benchmarks, using their products. At least you'll have a solid career opportunity with whichever one(s) you use.

le dorfier
1st, if you don't shoot high, you hit even lower. 2nd what I'm thinking is something that could take a query tuned for MS-SQL (or not tuned at all) and turn it into something that works well on MySQL. As I understand it, a big cost when switching DBMS is that all your old SQL is now crap.
BCS