views:

667

answers:

22

In the spirit of

What are the common mistakes we should avoid when programming in SQL?

+14  A: 

Failing to properly deal with untrusted user input, leading to SQL injection attacks.

aem
Amen to that. One should always parameterized queries, rather than manually concatenating SQL code and values.
Søren Løvborg
I haven't been using parametized queries until relatively recently (hibernate/jdbc practically force you too in how easy it was). ASP.net wasn't so - it was too forgiving in my use of Connection.Execute().
Chris Kaminski
+7  A: 

Avoid cursors, try to do everything in plain SQL.

Otávio Décio
+6  A: 
  1. Not using stored procedures.

  2. Not using parameterized queries.

  3. Not putting statements in transactions when required.

  4. Normalizing to an extreme, creating joins that significantly affect performance.

thedz
5. Not normalizing enough, creating data maintenance problems because different tables or rows contradict each other.
Jonathan Leffler
+8  A: 

Not having indexes on columns frequently referenced in WHERE or ORDER BY clauses.

Søren Løvborg
even worse not on join conditions.
David Raznick
I've discovered this recently. I'm no SQL dev... It gave me a great deal of performance boost.
Brian Kim
+2  A: 

Using stored procedures that goes beyond basic CRUD unless they really need them.

Jonas Elfström
Are you saying you should avoid stored proc's or that a common problem is that people do avoid them?
Chris Lively
I'm saying you should avoid them because if you have business logic in both your business layer (code) and in the stored procedures you are bound for a maintenance nightmare. I've seen it happen in many projects.
Jonas Elfström
Using stored procedures for basic CRUD operations is a stale anti-pattern with no modern justification. Just sayin'.
WCWedin
s'procs in combination with denying direct table access: completely immune from sql injections. Just sayin'. Sorry, but that isn't an anti-pattern.
Chris Lively
@ChrisLively: Less exposed, but certainly not immune. Badly written sprocs can still be subject to injection.
Chris Kaminski
+10  A: 

Commmon problems I see include:

Cursors There is almost ALWAYS a better way.

Treating SQL like any other language. It's not. It is set based. Try to think that way.

Using just enough SQL to pull all the data back into a code layer for processing. Do the processing in the SQL server; that's what it is built for. Data paging is a prime example of something relatively easy to do in any SQL engine.

Not taking advantage of the extensions the particular dbms you are using has. I still see dev's ignoring CTE's like the plague.

Complete lack of optimization. Run your app and really watch how many statements get sent to the server. Are you pulling the right data? Are you pulling it just once or pulling it 20 times per page load..

Chris Lively
I'm not familiar with the CTE TLA?
Chris Kaminski
Paging example: http://www.wwwcoder.com/parentid/191/tabid/68/type/art/site/2124/default.aspx
Chris Kaminski
@darthcoder: CTE = Common Table Expression. SQL 2005/2008 specific.
Chris Lively
+1  A: 
  • use parameterised queries/stored procedures - don't use dynamic, adhoc SQL generated by the front-end! Protect yourself against SQL injection and ensure you get the best from execution plan caching and reuse by parameterising
  • don't think in cursors and loops - think set-based operations first, only consider cursors as a last resort
  • use SQL for what it is good for - querying. Complex business logic should be done outside in a language which is more efficient (e.g. string manipulation/formatting in .NET instead of in the db)
  • performance test your queries. Test with data volumes ABOVE what you currently have, see how it is likely to perform as your db grows.
  • ensure you FAIRLY AND CONSISTENTLY compare performance of queries when tuning by clearing down the data cache and execution plan cache, otherwise you may get unfairly skewed results, leading to you thinking you have optimised a query when in fact you may not have done.
AdaTheDev
+7  A: 

Assuming an order when limiting a query to n rows without an ORDER BY clause.

Joel Coehoorn
+4  A: 

Comma-delimited columns. 'nuff said.

Joel Coehoorn
+5  A: 

There should be many common mistakes but I strongly believe that these 2 are the biggest:

  • failure to realize that SQL is a declarative language rather than procedural;
  • failure to manipulate data sets rather than data variables.

And here is a runner-up mistake:

  • failure to treat null values properly.
grigory
+1 for first reference to null values. Often a necessary evil, but an evil they are.
RJFalconer
Nulls will kill any beginner. People need to check for them anytime a column allows it or there is not default value.
Middletone
+1  A: 

Question on the question. Take the prototypical banking example:

Tables: 
  account 
     id
     balance
  transaction
     id
     src_account 
     dest_account
     value

If you wanted to maintain referential integrity, are you better served in your application layer doing

begin trans
insert into transaction .... 
update account ... src ....
update account ... dest ....
commit

or relying on a trigger on the transaction table that updates src and dest themselves? I've heard it argued both ways, and I've never really conducted a performance test that measures the impact of one or the other with a significant number of inserts to "transaction", but I like the conceptual simplicity that whenever I need to make a transaction, it handles ALL aspects of credit, debit and perhaps even balance checking?

Are triggers expressive enough? Does SQL allow me to send strings or other messages back in a rollback() event?

Chris Kaminski
A: 

Biggest mistake : not using a library that make parameterized queries easy. If they are easy, you will use them, and you will not get SQL injections. If they are annoying (like if you need to prepare then execute), you will go to the dark side.

Example of use in PHP :

$q = db_fetch_one( "SELECT * FROM questions WHERE question_id = %s", $_GET['qid'] );

peufeu
A: 

I'd like to say "Not solving problems in a database-appropriate way". Too many people see the solution in terms of iterative languages such as VB or C#, when they need to be thinking differently.

They need to be thinking differently to:

  • Consider how the system can simplify the execution of the query
  • Avoid cursors
  • Give thought to the best indexes that the query could use
  • Use searchable arguments (ie, don't apply functions to a column - apply it to the constant instead)
  • Avoid SQL Injection (letting SQL handle data as data, and commands as commands)

...and more of course. If you can adjust your thinking when you write a query, it'll almost certainly be better.

Then you get to the point where you need to adjust your thinking to write good C#, and you wish you could program everything in LISP and Prolog. ;)

Rob

Rob Farley
A: 

The most common mistake I see with SQL rookies is "SELECT * FROM TABLE." It may seem innocent enough, but very quickly, that asterisk wildcard will become a thorn in your side. In addition to being difficult to optimize, making an implicit assumption about the column order of your sql result usually leads to a long debugging session.

The other issue I see a lot is a lack of attention to primary keys. On one extreme- no primary key is created at all leading to lots of duplicate, orphaned records and a broken data model. On the other extreme, inattention to natural primary keys can prove wasteful and ultimately harm performance or needlessly complicate a schema.

Anyway, these two are definitely in the "SQL for Rookies" category, but happen all too often.

stran
+1  A: 

Avoid treating SQL databases as simple repositories, ignoring the purpose and benefits of SQL - both as a language and a data management system.

If you are using a SQL database in your project, then leverage it's advantages. Use the data integrity, manipulation, and summarization benefits it provides.

Think of your database as a multi-application resource, not a single-app data dump. Abstract data rules into procedures, views, constraints and domains, ready to be leveraged by any application needing access to the data being managed.

Think set-based; that is one of the primary tenants on which SQL is based.

Understand that quite often, the data is the business rule. Finding the top sales people, the overdue invoices, and the scheduled events are all functions the SQL engine is in the best position to implement.

JeremyDWill
+2  A: 

Forgetting to specify a join value when selecting from multiple tables:

// bad
SELECT person.name, city.name FROM person, city
// good
SELECT person.name, city.name FROM person, city WHERE person.city=city.id

A lot of people prefer the JOIN TABLE ON syntax to avoid this (but personally I find it more difficult to read.

// explicit inner join
SELECT person.name, city.name FROM person INNER JOIN city ON person.city=city.id
DisgruntledGoat
A: 

This might only be a MS SQL Server 2000 thing, but NULL values do not break foreign key constraints.

Jacob Winn
A: 

Thinking they know more than the DBA.

smackfu
A: 

The Query Execution Plan.

Greatest SQL tool ever, but way too often ignored.

Eric
A: 

Failing to use parameterized queries.

Failing to address deadlocks before they happen and sometimes after.

Assuming that they know what a variable in the DB means with out checking with the DBA.

Again, deadlocks!!!!!!!

oh and failing to provide lock hints. Please for the life of murphy tell it not to lock your rows if it's not necessary!

Failure to test query performance in query analyzer.

Middletone
A: 

Using inappropriate field types.

This is a noob mistake, but for ages I used TEXT fields for everything except the primary key. Look up the list of field types for your SQL implemtation and use the right ones.

  • If it's a fixed-size character field (e.g. postcode, customer ID), use CHAR.
  • Use TINYINT or SMALLINT if the numbers will be small (e.g. age).
  • Use SET/ENUM if you have a small number of possible values (e.g. user's title like Mr, Mrs etc).
DisgruntledGoat
A: 

Assuming there is column order in a table when there isn't.

A good developer should know that there is no meaning in the order of columns in a table. When inserting records, they need to explicitly specify column names like:

INSERT INTO Customers (FirstName, LastName, Age) VALUES ('Andy', 'John', 23)

Let's say if Age column was dropped from Customers table and Telephone column was added. This INSERT statement in the code would have inserted the wrong value if no column name were specified in the statement.

I personally find that it is better to maintain business logic in a stored procedure than in a code. I had to apply a fix for a bug last week and I was able to just alter the stored procedure in the live database to deploy the the patch. If the business logic were implemented in the code, I would have to deploy precompiled code which means some system downtime.

tnafoo