views:

572

answers:

5

Have you encountered deficiencies, limitations or flaws while using SQL?

I can think of several examples: Complicated queries/updates, SQL injection, etc. € Could you accomplish the same task easily with a different DML or programming language?

Can you provide me with case examples of problems you have encountered or cases where for example an SQL query required complex constructs vs. simple constructs in something else?

Can you suggest improvements to make SQL more powerful and less complicated? e.g. PSM, etc.

Which RDBMS/SQL product implementation do you feel is the most robust?

What capabilities from a non-SQL environment would you like to see implemented in SQL?..

Let's suppose SQL didn't exist, what would you use for manipulating data?..

+11  A: 

SQL in general has some serious deficiencies as a database language. Just a few of the problems are:

  • Duplicate rows (multi-set rather than set-based model)

  • Nulls and three value logic add complexity, ambiguity and inconsistent results without adding any expressive power to the language

  • SELECT statement syntax is much more verbose and complex than the relational algebra

  • Lack of support for multiple assignment means that referential integrity support and support for constraints in general is severely limited

Two examples of more specific query problems that are hard in SQL:

  • No simple equivalent for transitive closure in SQL. Selection from adjacency relation structures therefore requires the use of either procedural code, cursors or the non-intuitive and hard to optimise recursive query syntax.

  • Lack of key inheritence means that SQL query interfaces invariably return simple two dimensional tables, which is a major drawback for decision support (OLAP) type of queries that are intrinsically n-dimensional.

Improvements? I don't believe there are any useful improvements that would make sense because fixing the above would change the language so radically that there wouldn't be much point in even pretending it was SQL any more. The best way forward I believe is to develop entirely new, truly relational languages. Date and Darwen's D model being the most obvious successor to SQL has already lead to a number of new implementations of relational languages.

dportas
I don't see what SQL has to do with duplicate rows. SQL is a query language, while duplicate rows are a DB engine issue. And NULL is a very powerful tool that prevents everybody from having to come up with sentinel values for each different column. Remember all the problems caused by 9/9/99 being the standard "unknown birthday"?
Gabe
@Gabe, duplicates absolutely are SQL's fault. Support for duplicates is required as part of the SQL standard. In a true relational language we would never have to deal with duplicates in tables and queries because relations always have keys and relational projections always have distinct tuples.
dportas
@Gabe: In practice Null doesn't solve any particular problem but it does create plenty of problems. Firstly the semantics of SQL's null do not make sense given any reasonable and consistent interpretation (e.g. x = x is no longer true and therefore null does not mean the same thing as unknown in any real or mathematical sense). Secondly, null as a "placeholder" for a value that isn't present is a redundant feature - if a value is not present then the same information can more accurately be represented by a tuple without that attribute.
dportas
As far as I can tell, the only part of the SQL syntax that requires duplicates is the `UNION ALL` operator. Remove that and you can always have sets. As for NULLs, I don't understand the difference between a tuple having NULL for a particular attribute and missing that attribute. Maybe you can give an example of how a system would work without NULLs.
Gabe
SELECT without the DISTINCT keyword; Any INNER or OUTER JOIN on something other than key columns; CROSS JOIN; GROUP BY on columns that are not included in the result; CREATE TABLE without uniqueness constraints. These are just a few examples of SQL constructs that can cause duplicates and are required to do so by the ISO SQL standard. There are certainly many more examples.
dportas
You can easily implement SQL in a way that duplicates are never returned, and the language would be exactly the same. However, I'm not sure why you would actually want to, because in real life data sometimes actually is duplicated.
Gabe
A lot of people argue that NULL causes problems, but rarely do they address a compelling use: optional foreign keys. If NULL is disallowed, you have to have a placeholder row in the referent table for values which would otherwise be NULL.
Adam Crume
@Adam: You do not need a "placeholder row". Just decompose the foreign key attribute(s) to another table and then represent the same information by not having a row in that new table. This is nothing more than proper application of normalization principles. Also this represents the missing data more accurately than using a null, nulls being just a very inaccurate approximation of the situation. Finally this avoids some potentially serious problems with nullable foreign keys as implemented in SQL. Nullable foreign keys are pretty useless really. I would strongly recommend avoiding them always.
dportas
@David: I strongly disagree that nullable foreign keys are useless. I see that breaking the table into pieces can work, but I would argue that adds as much complexity as NULL does. I've seen many reasonable tables with half a dozen nullable foreign keys, and creating new tables for each of those would make a big mess.
Adam Crume
@Adam: Those cases with lots of nulls are often blatant examples of "bundling" different types of fact into one table. For example a Customer table that contains both customer names and optional (nullable) company details. This is a bad idea because it means the company details could be partially incomplete. If you put the company attributes into just ONE extra table (not one per nullable column) then that may be enough to make ALL those columns non-nullable. The new model would then be more effective because it ensures that the mandatory columns for a company really are mandatory.
dportas
@David: Not necessarily. A quick example would be a Customer table with an address_id column referencing an Address table which would be null if the address was unknown. I can't remember what else Customer would reference besides Address, but there were more. This comes from an old employer, so I can't give a more specific example.
Adam Crume
@Adam: Well that seems to me like an obvious example of bad design. If the business rule is that an address is not required for every customer then logically the address belongs in a different table. It also seems at least possible that there might be a requirement to have more than one address per customer, which would be another reason why address should have its own table.
dportas
Nulls are necessary in certain circumstances to describe the absence of a particular value like date of death, etc.. nulls for numeric arithmetic values are a different story. I set them to zero because aggregate calc results could result in null if just one value is null.. duplicate rows are also necessary. One to many relationships have them, I dont understand the opinion about duplicate rows being an issue. SQL statements needs more flexibilty in how they can be structured and have robust capability in overriding the way it chooses to process the statement with all sorts of directives.
Frank Computer
@Frank: "One to many relationships" do not require duplicate rows as long as both tables have candidate keys. I hope you agree that every table should have a key. As already discussed, Nulls are never necessary to indicate the absence of a value. If some software doesn't perform well unless you add nulls into the picture then certainly that's the fault of the software - it doesn't imply any general need for nulls.
dportas
@David- I tend to agree with what is mentioned in wiki NULL(SQL), "...a special marker used in SQL to indicate that a data value does not exist in the database..." see: http://en.wikipedia.org/wiki/Null_(SQL).
Frank Computer
Full SQL isn't truly relational and most vendor's SQL implementations are not even Full SQL! ;)
onedaywhen
When you start to denormalize columns into separate tables because its values could be NULL or over-denormalize then your SQL statements become more complicated, degrading performance. Sometimes denormalization is needed, see my post: http://stackoverflow.com/questions/3057990/transfusion-should-i-denormalize-loan-purchase-inventory-and-sale-tables-into
Frank Computer
+4  A: 

From the viewpoint of a data manipulation language, SQL has a few weaknesses:

  1. There is no standard SQL way to use regular expressions. Data constraints are often expressed in the form of regex's and having a standard regex language in SQL would be very useful. I know that SQLite enables you to put in your own regex engine, and I'm sure that other databases do it too, but a standard one would be nice.
  2. SQL needs an imperative programming language within SQL, with language constructs such as variables and variable assignment, lexical closures, and iterators.
  3. SQL needs a standard way to express user-defined functions that can be used in queries.

This is what I think would help it as a relational data manipulation language. For other types of data, SQL is not good enough.

Jay Godse
@Jay: SQL already has an imperative programming language that includes support for iteration, variables and user-defined functions. It's called SQL/PSM and it's used by several DBMSs (with varying degrees of completeness) including DB2 and PostgreSQL
dportas
@David - This language is not implemented, as far as I can tell, in the most popular databases such as Microsoft SQL Server, MySQL, SQLite, Oracle and the like. It would be nice if SQL/PSM were the standard way to write imperative programs in the database.
Jay Godse
There is a "standard regex language in SQL". It is SQL-92's `SIMILAR TO` predicate, which is based on POSIX.
onedaywhen
"[SQL/PSM] is not implemented, as far as I can tell, in the most popular databases" -- that's an issue for vendors and buyers.
onedaywhen
@onedaywhen - The word "standard" is the problem here. Some people think a standard is a common set of behaviours and interactions for vendors to aspire to. Others think that a standard is a codification of common behaviours and interactions that vendors actually implement today. I am from the latter group. Arguing about behaviour that most vendors don't implement is not really helpful for somebody who wishes to use a real SQL database to deliver their software.
Jay Godse
I'm guessing that someone who ponders, "let's suppose that SQL didn't exist..." isn't going to be too averse to fantasy feature lists ;)
onedaywhen
Standard support for regex would be very useful. As much as I hate regex, it would be nice to have when needed.
Michael Blackburn
+1  A: 

Personally, I have a soft spot for CODASYL databases, where you walk datasets to access related data elements... perhaps because my first experience with databases was DECs DBMS. The ability to locate a parent record, then simply walk a set of children was simple to use; and the explicit relationships between datasets built into the underlying database structure is something that is sadly lacking in relational databases... and before anybody suggests it, foreign keys are a pale, one-dimensional shadow of that relationship.

The complexities of inner/outer/straight joins in SQL become an irrelevance, because the structure of the dataset maintains the relationships between different record sets (tables), allowing you to access only those dataset records that are related. Lookup an order and Walking the OrderLine dataset would return only the Order Lines that were part of that order, with no need to formulate a second query to retrieve that series of rows from the entire table of order lines for all orders.

The drawback was that almost all the design needed to be done up front... once the database structure was designed it was pretty much fixed, hard to change.... compared with RDBMS where it's much easier to add new tables at any time.... not particularly suited to a world of RAD/Agile/XP/Scrum.

Now if only somebody could come up with a database system where the structure was as flexible and easy to change as an RDBMS, with all the simplicity of data access of a CODASYL database.

OLAP databases are another great alternative to RDBMS, where the data can readily be structured as an n-dimensional cube, where a query allows you to "slice and dice" the cube, quickly extracting segments of the data. My first introduction to OLAP was Oracle Express, which (unfortunately) uses its own proprietary query language rather than the de facto standard MDX. However, not all data can readily by fitted to a "cube" structure, so it's only suitable for certain application types - although almost any data mining applications are well suited.

Mark Baker
@Mark- So would features found in a multi-valued 'hash-file' db like PICK with P/Basic or English DML be a good choice, or another DML which you may be more familiar with?
Frank Computer
Never used Pick seriously - perhaps because it was (IMO) too tightly connected to the OS - which meant it was very limited in its market... at least when it was first available. I know it has been successfully moved to other OS since, but by then Relational was taking off big time
Mark Baker
Can anyone tell me about other SQL and non-SQL based DBMS' they have used and explain why they feel that DBMS is better than others?
Frank Computer
+3  A: 

To add to the excellent answers by @dportas, for me of the great 'missed opportunities' in SQL is that the proposed temporal extensions, known as TSQL2, never made it into the SQL Standard. Consequently, temporal databases are extremely hard to get right even using Full SQL-92 e.g. multiple assignment would be a boon when, say, a sequenced deletion in a valid-state table requires four statements (an INSERTs, two UPDATEs, and a DELETE). When you consider that most vendors lack the support for SQL-92 features (e.g. SQL Server lack DEFERRABLE constraints, Oracle lacks aubqueries in CHECK constraints, etc) it is near impossible to achieve without the 'worst' kind of procedural code.

onedaywhen
+3  A: 

Fantastic answers, dportas :-)

While not as glamorous, something I wish for in the SQL standard, is the "INSERT OR UPDATE" constraint conflict resolution algorithm. MySQL and SQLite implemented this as extensions of the SQL standard.

This IMO should be intrinsic DB engine functionality, otherwise you tangle with things like:

IF (record_exists) THEN
    -- update record
ELSE
    -- insert new record

Which seems trivial, but it

  • promotes bad habits like copy/pasting code block, which
  • introduces bugs and
  • just wastes your time, many times over.
  • It also adds negative points to maintainability.

Two queries can be easily replaced by one:

"Insert OR Update Table (field = value) Where ID = @ID"

Feels so much more natural :-)

Wez
@KeyboardMonkey: Insert if new, update if exists is something which has been around in 3GL's for a long time. I was surprised when it wasn't included in ANSI-SQL. It would also be nice if I could dynamically construct table names within an SQL proc and reference it in something like SELECT * from $TABLENAME.
Frank Computer