In the spirit of
What are the common mistakes we should avoid when programming in SQL?
Failing to properly deal with untrusted user input, leading to SQL injection attacks.
Not using stored procedures.
Not using parameterized queries.
Not putting statements in transactions when required.
Normalizing to an extreme, creating joins that significantly affect performance.
Not having indexes on columns frequently referenced in WHERE or ORDER BY clauses.
Using stored procedures that go beyond basic CRUD unless you really need them.
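To make the first and third points concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever driver you use. The users table and both function names are made up for illustration. It shows the same lookup built by string concatenation and with a placeholder.

```python
import sqlite3

# Hypothetical demo schema; not from the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # String concatenation: the input becomes part of the SQL text itself.
    sql = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(name):
    # Placeholder: the value is passed separately and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(malicious)  # the OR clause matches every row
safe_rows = find_user_safe(malicious)      # no user has that literal name
```

The unsafe version returns every row for the crafted input, while the parameterized one treats it as an ordinary (non-matching) name.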
Common problems I see include:
Cursors. There is almost ALWAYS a better way.
Treating SQL like any other language. It's not. It is set based. Try to think that way.
Using just enough SQL to pull all the data back into a code layer for processing. Do the processing in the SQL server; that's what it is built for. Data paging is a prime example of something relatively easy to do in any SQL engine.
Not taking advantage of the extensions your particular DBMS offers. I still see devs avoiding CTEs like the plague.
Complete lack of optimization. Run your app and really watch how many statements get sent to the server. Are you pulling the right data? Are you pulling it just once, or pulling it 20 times per page load?
Assuming an order when limiting a query to n rows without an ORDER BY clause.
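As a rough illustration of the paging and ORDER BY points, the sketch below pages through a made-up scores table in SQLite. The table, the data and the top_scores helper are assumptions for the example, and LIMIT/OFFSET is SQLite's spelling; other engines use TOP, FETCH FIRST or similar.

```python
import sqlite3

# Hypothetical demo table; not from the original post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("dave", 10), ("alice", 40), ("carol", 20), ("bob", 30)],
)


def top_scores(page, page_size):
    # ORDER BY makes the page contents deterministic; player breaks ties.
    # Without it, the engine may return any page_size rows it likes.
    return conn.execute(
        "SELECT player, points FROM scores "
        "ORDER BY points DESC, player ASC LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()


first_page = top_scores(0, 2)
second_page = top_scores(1, 2)
```

Only the rows for the requested page cross the wire, and the explicit ordering keeps pages from overlapping or skipping rows.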
There should be many common mistakes but I strongly believe that these 2 are the biggest:
And here is a runner-up mistake:
Question on the question. Take the prototypical banking example:
Tables:
account (id, balance)
transaction (id, src_account, dest_account, value)
If you wanted to maintain referential integrity, are you better served in your application layer doing
begin trans
insert into transaction ....
update account ... src ....
update account ... dest ....
commit
or relying on a trigger on the transaction table that updates src and dest itself? I've heard it argued both ways, and I've never run a performance test that measures the impact of either approach with a significant number of inserts into "transaction". Still, I like the conceptual simplicity of the trigger: whenever I need to make a transaction, it handles ALL aspects of credit, debit and perhaps even balance checking.
Are triggers expressive enough? Does SQL allow me to send strings or other messages back in a rollback() event?
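For comparison, here is one possible sketch of the application-layer option only, again using SQLite through Python's sqlite3 module. It does not settle the trigger question. The CHECK constraint on balance, the integer amounts and the transfer and balances helpers are my own assumptions, added so that a failed debit has something to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- assumed rule
    );
    CREATE TABLE "transaction" (
        id INTEGER PRIMARY KEY,
        src_account INTEGER NOT NULL REFERENCES account(id),
        dest_account INTEGER NOT NULL REFERENCES account(id),
        value INTEGER NOT NULL
    );
    INSERT INTO account (id, balance) VALUES (1, 100), (2, 50);
""")


def transfer(src, dest, value):
    """Record the transfer and move the money as a single unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                'INSERT INTO "transaction" (src_account, dest_account, value) '
                "VALUES (?, ?, ?)",
                (src, dest, value),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (value, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (value, dest),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the INSERT is undone too.
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


ok = transfer(1, 2, 30)
rejected = transfer(1, 2, 500)
```

After the rejected transfer, both balances and the transaction table are exactly as the first transfer left them, which is the point of wrapping all three statements in one transaction.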
Biggest mistake: not using a library that makes parameterized queries easy. If they are easy, you will use them, and you will not get SQL injections. If they are annoying (like if you need to prepare then execute), you will go to the dark side.
Example of use in PHP:
$q = db_fetch_one( "SELECT * FROM questions WHERE question_id = %s", $_GET['qid'] );
I'd like to say "Not solving problems in a database-appropriate way". Too many people see the solution in terms of iterative languages such as VB or C#, when they need to be thinking differently.
They need to be thinking differently to:
...and more of course. If you can adjust your thinking when you write a query, it'll almost certainly be better.
Then you get to the point where you need to adjust your thinking to write good C#, and you wish you could program everything in LISP and Prolog. ;)
Rob
The most common mistake I see with SQL rookies is "SELECT * FROM TABLE." It may seem innocent enough, but very quickly, that asterisk wildcard will become a thorn in your side. In addition to being difficult to optimize, making an implicit assumption about the column order of your SQL result usually leads to a long debugging session.
The other issue I see a lot is a lack of attention to primary keys. At one extreme, no primary key is created at all, leading to lots of duplicate, orphaned records and a broken data model. At the other extreme, inattention to natural primary keys can prove wasteful and ultimately harm performance or needlessly complicate a schema.
Anyway, these two are definitely in the "SQL for Rookies" category, but happen all too often.
Avoid treating SQL databases as simple repositories, ignoring the purpose and benefits of SQL - both as a language and a data management system.
If you are using a SQL database in your project, then leverage its advantages. Use the data integrity, manipulation, and summarization benefits it provides.
Think of your database as a multi-application resource, not a single-app data dump. Abstract data rules into procedures, views, constraints and domains, ready to be leveraged by any application needing access to the data being managed.
Think set-based; that is one of the primary tenets on which SQL is based.
Understand that quite often, the data is the business rule. Finding the top sales people, the overdue invoices, and the scheduled events are all functions the SQL engine is in the best position to implement.
Forgetting to specify a join condition when selecting from multiple tables:
-- bad: returns every person paired with every city
SELECT person.name, city.name FROM person, city
-- good
SELECT person.name, city.name FROM person, city WHERE person.city=city.id
A lot of people prefer the JOIN ... ON syntax to avoid this (but personally I find it more difficult to read).
-- explicit inner join
SELECT person.name, city.name FROM person INNER JOIN city ON person.city=city.id
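To see what the missing condition costs, here is a small sketch that runs both forms against SQLite from Python. The sample rows are invented; the table and column names follow the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (name TEXT, city INTEGER REFERENCES city(id));
    INSERT INTO city (id, name) VALUES (1, 'Oslo'), (2, 'Lima'), (3, 'Pune');
    INSERT INTO person (name, city) VALUES ('Ann', 1), ('Ben', 2);
""")

# No join condition: every person is paired with every city (2 x 3 rows).
cross = conn.execute(
    "SELECT person.name, city.name FROM person, city"
).fetchall()

# With the condition, each person appears once, next to their own city.
joined = conn.execute(
    "SELECT person.name, city.name FROM person "
    "INNER JOIN city ON person.city = city.id ORDER BY person.name"
).fetchall()
```

With two people and three cities the accidental cross join is only six rows; on real tables the same mistake multiplies row counts just as readily.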
This is not just a MS SQL Server 2000 thing: a NULL value in a foreign key column does not break the foreign key constraint. If every row must reference a parent, declare the column NOT NULL as well.
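The behavior is easy to check in SQLite, used here purely as a convenient stand-in; the tables and rows are invented for the sketch. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE person (name TEXT, city INTEGER REFERENCES city(id))")
conn.execute("INSERT INTO city (id, name) VALUES (1, 'Oslo')")

# A NULL reference is accepted: the constraint simply does not apply to it.
conn.execute("INSERT INTO person (name, city) VALUES ('Ann', NULL)")

# A non-NULL reference to a missing city is rejected.
try:
    conn.execute("INSERT INTO person (name, city) VALUES ('Ben', 99)")
    bad_reference_rejected = False
except sqlite3.IntegrityError:
    bad_reference_rejected = True

null_rows = conn.execute(
    "SELECT COUNT(*) FROM person WHERE city IS NULL"
).fetchone()[0]
```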
The Query Execution Plan.
Greatest SQL tool ever, but way too often ignored.
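Every engine exposes the plan differently. As one small example, SQLite has EXPLAIN QUERY PLAN, and the sketch below compares the plan for the same query before and after adding an index. The orders table, the index name and the plan helper are made up for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)"
)


def plan(sql):
    # The last column of each plan row is a human-readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT total FROM orders WHERE customer = 'acme'"

before = plan(query)  # a full scan of orders
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = plan(query)   # a search that names idx_orders_customer
```

Printing before and after side by side is usually enough to tell whether a WHERE clause is actually using the index you created for it.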
Failing to use parameterized queries.
Failing to address deadlocks before they happen and sometimes after.
Assuming that they know what a variable in the DB means without checking with the DBA.
Again, deadlocks!!!!!!!
Oh, and failing to provide lock hints. Please, for the life of Murphy, tell it not to lock your rows if it's not necessary!
Failure to test query performance in query analyzer.
Using inappropriate field types.
This is a noob mistake, but for ages I used TEXT fields for everything except the primary key. Look up the list of field types for your SQL implementation and use the right ones.
Assuming there is column order in a table when there isn't.
A good developer should know that there is no meaning in the order of columns in a table. When inserting records, they need to explicitly specify column names like:
INSERT INTO Customers (FirstName, LastName, Age) VALUES ('Andy', 'John', 23)
Say the Age column were dropped from the Customers table and a Telephone column added. Without column names in the statement, this INSERT would have put the age into the wrong column; with them, it fails with an error instead.
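Here is a quick sketch of that scenario in SQLite, assuming the table ends up with FirstName, LastName and Telephone columns after the change. The TEXT column types are my own choice for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema after the change: Age dropped, Telephone added.
conn.execute(
    "CREATE TABLE Customers (FirstName TEXT, LastName TEXT, Telephone TEXT)"
)

# The positional INSERT still runs, and the age lands in Telephone.
conn.execute("INSERT INTO Customers VALUES ('Andy', 'John', 23)")
telephone = conn.execute("SELECT Telephone FROM Customers").fetchone()[0]

# The INSERT with column names fails at once because Age no longer exists.
try:
    conn.execute(
        "INSERT INTO Customers (FirstName, LastName, Age) "
        "VALUES ('Andy', 'John', 23)"
    )
    named_insert_failed = False
except sqlite3.OperationalError:
    named_insert_failed = True
```

The failure in the second case is the useful outcome: the schema change is noticed immediately rather than after bad data has accumulated.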
I personally find that it is better to maintain business logic in a stored procedure than in code. I had to apply a fix for a bug last week and I was able to just alter the stored procedure in the live database to deploy the patch. If the business logic were implemented in the code, I would have to deploy precompiled code, which means some system downtime.