ansaurus

Question

What deficiencies, limitations or flaws do you feel exist with SQL and what would you change or add to it?

Answer 1

+11 A:

SQL in general has some serious deficiencies as a database language. Just a few of the problems are:

Duplicate rows (multi-set rather than set-based model)
Nulls and three value logic add complexity, ambiguity and inconsistent results without adding any expressive power to the language
SELECT statement syntax is much more verbose and complex than the relational algebra
Lack of support for multiple assignment means that referential integrity support and support for constraints in general is severely limited

Two examples of more specific query problems that are hard in SQL:

No simple equivalent for transitive closure in SQL. Selection from adjacency relation structures therefore requires the use of either procedural code, cursors or the non-intuitive and hard to optimise recursive query syntax.
Lack of key inheritence means that SQL query interfaces invariably return simple two dimensional tables, which is a major drawback for decision support (OLAP) type of queries that are intrinsically n-dimensional.

Improvements? I don't believe there are any useful improvements that would make sense because fixing the above would change the language so radically that there wouldn't be much point in even pretending it was SQL any more. The best way forward I believe is to develop entirely new, truly relational languages. Date and Darwen's D model being the most obvious successor to SQL has already lead to a number of new implementations of relational languages.

dportas 2010-06-19 20:27:44

I don't see what SQL has to do with duplicate rows. SQL is a query language, while duplicate rows are a DB engine issue. And NULL is a very powerful tool that prevents everybody from having to come up with sentinel values for each different column. Remember all the problems caused by 9/9/99 being the standard "unknown birthday"?

Gabe 2010-06-19 21:20:35

@Gabe, duplicates absolutely are SQL's fault. Support for duplicates is required as part of the SQL standard. In a true relational language we would never have to deal with duplicates in tables and queries because relations always have keys and relational projections always have distinct tuples.

dportas 2010-06-19 22:24:57

@Gabe: In practice Null doesn't solve any particular problem but it does create plenty of problems. Firstly the semantics of SQL's null do not make sense given any reasonable and consistent interpretation (e.g. x = x is no longer true and therefore null does not mean the same thing as unknown in any real or mathematical sense). Secondly, null as a "placeholder" for a value that isn't present is a redundant feature - if a value is not present then the same information can more accurately be represented by a tuple without that attribute.

dportas 2010-06-19 22:27:40

As far as I can tell, the only part of the SQL syntax that requires duplicates is the `UNION ALL` operator. Remove that and you can always have sets. As for NULLs, I don't understand the difference between a tuple having NULL for a particular attribute and missing that attribute. Maybe you can give an example of how a system would work without NULLs.

Gabe 2010-06-20 01:17:15

SELECT without the DISTINCT keyword; Any INNER or OUTER JOIN on something other than key columns; CROSS JOIN; GROUP BY on columns that are not included in the result; CREATE TABLE without uniqueness constraints. These are just a few examples of SQL constructs that can cause duplicates and are required to do so by the ISO SQL standard. There are certainly many more examples.

dportas 2010-06-20 05:46:57

You can easily implement SQL in a way that duplicates are never returned, and the language would be exactly the same. However, I'm not sure why you would actually want to, because in real life data sometimes actually is duplicated.

Gabe 2010-06-20 12:37:06

A lot of people argue that NULL causes problems, but rarely do they address a compelling use: optional foreign keys. If NULL is disallowed, you have to have a placeholder row in the referent table for values which would otherwise be NULL.

Adam Crume 2010-06-25 03:17:57

@Adam: You do not need a "placeholder row". Just decompose the foreign key attribute(s) to another table and then represent the same information by not having a row in that new table. This is nothing more than proper application of normalization principles. Also this represents the missing data more accurately than using a null, nulls being just a very inaccurate approximation of the situation. Finally this avoids some potentially serious problems with nullable foreign keys as implemented in SQL. Nullable foreign keys are pretty useless really. I would strongly recommend avoiding them always.

dportas 2010-06-25 06:27:08

@David: I strongly disagree that nullable foreign keys are useless. I see that breaking the table into pieces can work, but I would argue that adds as much complexity as NULL does. I've seen many reasonable tables with half a dozen nullable foreign keys, and creating new tables for each of those would make a big mess.

Adam Crume 2010-06-25 21:04:22

@Adam: Those cases with lots of nulls are often blatant examples of "bundling" different types of fact into one table. For example a Customer table that contains both customer names and optional (nullable) company details. This is a bad idea because it means the company details could be partially incomplete. If you put the company attributes into just ONE extra table (not one per nullable column) then that may be enough to make ALL those columns non-nullable. The new model would then be more effective because it ensures that the mandatory columns for a company really are mandatory.

dportas 2010-06-25 21:46:42

@David: Not necessarily. A quick example would be a Customer table with an address_id column referencing an Address table which would be null if the address was unknown. I can't remember what else Customer would reference besides Address, but there were more. This comes from an old employer, so I can't give a more specific example.

Adam Crume 2010-06-25 23:29:38

@Adam: Well that seems to me like an obvious example of bad design. If the business rule is that an address is not required for every customer then logically the address belongs in a different table. It also seems at least possible that there might be a requirement to have more than one address per customer, which would be another reason why address should have its own table.

dportas 2010-06-26 06:55:01

Nulls are necessary in certain circumstances to describe the absence of a particular value like date of death, etc.. nulls for numeric arithmetic values are a different story. I set them to zero because aggregate calc results could result in null if just one value is null.. duplicate rows are also necessary. One to many relationships have them, I dont understand the opinion about duplicate rows being an issue. SQL statements needs more flexibilty in how they can be structured and have robust capability in overriding the way it chooses to process the statement with all sorts of directives.

Frank Computer 2010-06-28 05:04:38

@Frank: "One to many relationships" do not require duplicate rows as long as both tables have candidate keys. I hope you agree that every table should have a key. As already discussed, Nulls are never necessary to indicate the absence of a value. If some software doesn't perform well unless you add nulls into the picture then certainly that's the fault of the software - it doesn't imply any general need for nulls.

dportas 2010-06-28 15:15:22

@David- I tend to agree with what is mentioned in wiki NULL(SQL), "...a special marker used in SQL to indicate that a data value does not exist in the database..." see: http://en.wikipedia.org/wiki/Null_(SQL).

Frank Computer 2010-06-28 20:53:14

Full SQL isn't truly relational and most vendor's SQL implementations are not even Full SQL! ;)

onedaywhen 2010-07-09 09:39:50

When you start to denormalize columns into separate tables because its values could be NULL or over-denormalize then your SQL statements become more complicated, degrading performance. Sometimes denormalization is needed, see my post: http://stackoverflow.com/questions/3057990/transfusion-should-i-denormalize-loan-purchase-inventory-and-sale-tables-into

Frank Computer 2010-08-09 10:52:47

Answer 2

+4 A:

From the viewpoint of a data manipulation language, SQL has a few weaknesses:

There is no standard SQL way to use regular expressions. Data constraints are often expressed in the form of regex's and having a standard regex language in SQL would be very useful. I know that SQLite enables you to put in your own regex engine, and I'm sure that other databases do it too, but a standard one would be nice.
SQL needs an imperative programming language within SQL, with language constructs such as variables and variable assignment, lexical closures, and iterators.
SQL needs a standard way to express user-defined functions that can be used in queries.

This is what I think would help it as a relational data manipulation language. For other types of data, SQL is not good enough.

Jay Godse 2010-06-19 21:12:00

@Jay: SQL already has an imperative programming language that includes support for iteration, variables and user-defined functions. It's called SQL/PSM and it's used by several DBMSs (with varying degrees of completeness) including DB2 and PostgreSQL

dportas 2010-06-19 22:30:57

@David - This language is not implemented, as far as I can tell, in the most popular databases such as Microsoft SQL Server, MySQL, SQLite, Oracle and the like. It would be nice if SQL/PSM were the standard way to write imperative programs in the database.

Jay Godse 2010-06-19 23:02:51

There is a "standard regex language in SQL". It is SQL-92's `SIMILAR TO` predicate, which is based on POSIX.

onedaywhen 2010-07-09 09:20:32

"[SQL/PSM] is not implemented, as far as I can tell, in the most popular databases" -- that's an issue for vendors and buyers.

onedaywhen 2010-07-09 09:22:11

@onedaywhen - The word "standard" is the problem here. Some people think a standard is a common set of behaviours and interactions for vendors to aspire to. Others think that a standard is a codification of common behaviours and interactions that vendors actually implement today. I am from the latter group. Arguing about behaviour that most vendors don't implement is not really helpful for somebody who wishes to use a real SQL database to deliver their software.

Jay Godse 2010-07-09 12:37:09

I'm guessing that someone who ponders, "let's suppose that SQL didn't exist..." isn't going to be too averse to fantasy feature lists ;)

onedaywhen 2010-07-09 12:59:00

Standard support for regex would be very useful. As much as I hate regex, it would be nice to have when needed.

Michael Blackburn 2010-08-06 15:44:35

Answer 3

+1 A:

Personally, I have a soft spot for CODASYL databases, where you walk datasets to access related data elements... perhaps because my first experience with databases was DECs DBMS. The ability to locate a parent record, then simply walk a set of children was simple to use; and the explicit relationships between datasets built into the underlying database structure is something that is sadly lacking in relational databases... and before anybody suggests it, foreign keys are a pale, one-dimensional shadow of that relationship.

The complexities of inner/outer/straight joins in SQL become an irrelevance, because the structure of the dataset maintains the relationships between different record sets (tables), allowing you to access only those dataset records that are related. Lookup an order and Walking the OrderLine dataset would return only the Order Lines that were part of that order, with no need to formulate a second query to retrieve that series of rows from the entire table of order lines for all orders.

The drawback was that almost all the design needed to be done up front... once the database structure was designed it was pretty much fixed, hard to change.... compared with RDBMS where it's much easier to add new tables at any time.... not particularly suited to a world of RAD/Agile/XP/Scrum.

Now if only somebody could come up with a database system where the structure was as flexible and easy to change as an RDBMS, with all the simplicity of data access of a CODASYL database.

OLAP databases are another great alternative to RDBMS, where the data can readily be structured as an n-dimensional cube, where a query allows you to "slice and dice" the cube, quickly extracting segments of the data. My first introduction to OLAP was Oracle Express, which (unfortunately) uses its own proprietary query language rather than the de facto standard MDX. However, not all data can readily by fitted to a "cube" structure, so it's only suitable for certain application types - although almost any data mining applications are well suited.

Mark Baker 2010-06-19 22:53:03

@Mark- So would features found in a multi-valued 'hash-file' db like PICK with P/Basic or English DML be a good choice, or another DML which you may be more familiar with?

Frank Computer 2010-06-20 05:13:26

Never used Pick seriously - perhaps because it was (IMO) too tightly connected to the OS - which meant it was very limited in its market... at least when it was first available. I know it has been successfully moved to other OS since, but by then Relational was taking off big time

Mark Baker 2010-06-20 09:58:06

Can anyone tell me about other SQL and non-SQL based DBMS' they have used and explain why they feel that DBMS is better than others?

Frank Computer 2010-06-28 01:12:37

Answer 4

+3 A:

To add to the excellent answers by @dportas, for me of the great 'missed opportunities' in SQL is that the proposed temporal extensions, known as TSQL2, never made it into the SQL Standard. Consequently, temporal databases are extremely hard to get right even using Full SQL-92 e.g. multiple assignment would be a boon when, say, a sequenced deletion in a valid-state table requires four statements (an INSERTs, two UPDATEs, and a DELETE). When you consider that most vendors lack the support for SQL-92 features (e.g. SQL Server lack DEFERRABLE constraints, Oracle lacks aubqueries in CHECK constraints, etc) it is near impossible to achieve without the 'worst' kind of procedural code.

onedaywhen 2010-07-09 09:34:43

Answer 5

+3 A:

Fantastic answers, dportas :-)

While not as glamorous, something I wish for in the SQL standard, is the "INSERT OR UPDATE" constraint conflict resolution algorithm. MySQL and SQLite implemented this as extensions of the SQL standard.

This IMO should be intrinsic DB engine functionality, otherwise you tangle with things like:

IF (record_exists) THEN
    -- update record
ELSE
    -- insert new record

Which seems trivial, but it

promotes bad habits like copy/pasting code block, which
introduces bugs and
just wastes your time, many times over.
It also adds negative points to maintainability.

Two queries can be easily replaced by one:

"Insert OR Update Table (field = value) Where ID = @ID"

Feels so much more natural :-)

Wez 2010-07-16 07:41:17

@KeyboardMonkey: Insert if new, update if exists is something which has been around in 3GL's for a long time. I was surprised when it wasn't included in ANSI-SQL. It would also be nice if I could dynamically construct table names within an SQL proc and reference it in something like SELECT * from $TABLENAME.

Frank Computer 2010-07-16 21:18:59

ansaurus

tags:

views:

answers:

What deficiencies, limitations or flaws do you feel exist with SQL and what would you change or add to it?

related questions