What is the worst SQL query you've ever seen? What made it bad?
The classic xkcd of course:
WHERE name = ROBERT'); DROP TABLE students;--
A customer was storing a comma delimited list of 3 values in a varchar field (classic ASP application) so they had a stored procedure that looked something like this:
SELECT *
FROM
SomeTable
WHERE
Field LIKE @Param + ',%'
OR
Field LIKE '%,' + @Param + ',%'
OR
Field LIKE '%,' + @Param
It should be obvious why it's horrible :)
Recently, I've seen a (greater than) 4000 line TSQL stored procedure that was a chain of IF statments for matching parts of addresses. It could be reduced to less than 50 lines!
I'm saving the code for a future DailyWTF!
SELECT * FROM some_table;
What made it so bad was the code was relying on getting the results in order based on a timestamp. This had apparently worked for a while before I got called in to fix it.
DELETE FROM table
Seen right after I typed and executed it, I had forgotten the WHERE clause. Now I always run a SELECT statement first and change the SELECT to DELETE after I am satisfied that the proper rows will be affected.
It's probably not the worst but I see this far too often (Misuse of the group by clause):
SELECT
C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, C.City,
C.State, SUM(S.Sales) as TotalSales
FROM
Customers C
INNER JOIN Sales S
ON C.CustomerID = S.CustomerID
GROUP BY
C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, C.City, C.State
Instead of:
SELECT
C.CustomerID, C.CustomerName,
C.CustomerType, C.Address1, C.City,
C.State, S.TotalSales
FROM
Customers C
INNER JOIN
(SELECT CustomerID, SUM(Sales) as TotalSales FROM Sales GROUP BY CustomerID) S
ON
C.CustomerID = S.CustomerID
My own, which is far to long to post here -- closing now on 3500 lines
I have to really share the blame with an absolutely horrible schema. What started off as a simple exercise in pivoting denormalized data using some unions turned into an unwieldy nightmare. It's badly in need of repair.
Runner up is this:
select
case datepart(mm,getdate())
when 1 then 'Jan'
when 2 then 'Feb'
when 3 then 'March'
when 4 then 'Apr'
when 5 then 'May'
when 6 then 'Jun'
when 7 then 'July'
when 8 then 'Aug'
when 9 then 'Sept'
when 10 then 'Otc'
when 11 then 'Nov'
when 11 then 'Dec'
end
There are no typos in that post -- that's how it was written. Thank you, consulting dollars!
I of course refactored with Select left(datename(mm, getdate()), 3)
read it out loud.
select [select],*,star as [as] from [from],[order] order by [by]
When I first got my current job my first project was to create an application that summarized our license usage data in our computer labs. He insisted that he didn't want the backend database to be normalized because joins were "too expensive." It being my first week, I wasn't in a position to argue.
Now, in order to extract any useful data from the database, one has to "undo" the denormalization in every query that needs to extract summaries to remove the duplicated data in each row. Of course, these are the only queries that are actually used. You see a lot of nested selects that would be completely unnecessary if the data were normalized, such as:
select location, sum(login_time) as total_login_time
from
(select location, session_id, max(login_time) as login_time
from sessions
where location in ('lab1','lab2')
and session_start >= @start_date
and session_end <= @end_date
group by location, session_id) tbl
group by location
Although, the query itself isn't particularly ugly -- though some are -- the process of having to jump through hoops every time to undo the unnecessary denormalization hurts.
Now the boss is gone, but I don't have time to rewrite it...
DELETE FROM some_table WHERE some_thing IN (SELECT some_column_from_wrong_table FROM correct_table WHERE some_id=something).
The some_column_from_wrong_table has a column that wasn't even in the table, but it was in another table. Problem was the correct_table was named 'Event' and somehow it returned ALL rows instead of NO rows (or more importantly, an error!).
Two lessons learned: NEVER EVER, under any circumstances, name a table after any form of system name. Second thing was run select statements first, then change to delete.
This was SqlServer 2005 by the way. I'm still pissed it didn't throw an error.
In an Access Database, there was a query like the following:
SELECT *
FROM Bad_2 INNER JOIN Bad_1 ON Bad_2.Bad_1_id = Bad_1.ID;
and both tables had a field with the same name. When Access comes across a field name for a second time, it makes up a new name for it. The previous guy used the generated field name in code.
Seen many woeful pieces of SQL in my time. One that comes to mind is of the form
load data from a file, loop over that file, accessing db for each line in file.
Seems ok on test systems with 10 or so lines, 100K-1million = nasty even for primary key lookups.
BTW, the solution is to load the data into the db and think in sets.
-- Choose your favourite lang eg. perl, python ...
load file into data structure (eg array)
for (1 .. n) loop
myid := array[n];
select * from table where id = myid;
if the row exists update table set ... where id = myid;
end loop;
Worst USE of an SQL query every:
A SELECT query that that counts the number of lines corresponding to a certain condition, called in the stopping condition of a for loop.
Something like this:
for(int i = 0; i < query("SELECT COUNT .... WHERE ..."); i++)
{
}
And no, the result of the query doesn't change every iteration. Yes I realize the server is going to cache the result.
In a posting to the comp.databases.informix news group - a genuine working Informix table (which I do not recommend using):
CREATE TABLE VIEW
(
DECIMAL CHAR(30),
NOT INTEGER NOT NULL,
SERIAL DATE NOT NULL,
NULL CHAR(1) NOT NULL,
INTEGER DECIMAL(13,6) NOT NULL
);
It helps (marginally) if you know that SERIAL is a type in Informix databases - basically, one of the types for generating automatically allocated numbers serially.
I liked the one reposted recently on the dailywtf, the story that comes with it is wonderful as well.
A PL/SQL (Oracle) stored proc that sorted a result set using a Bubble Sort. It was discovered when I and the DBA were asked to figure out a severe performance problem. The developer, an Oracle "expert," had worked on it for over a week. He explained with a straight face that he learned about Bubble Sort in his computer science class. The algorithm is commonly used to illustrate poor performance.
Replaced the whole mess with an ORDER BY clause. Performance improved by several orders of magnitude.
I think this is the worst (especially followed by a painful and null rollback):
DROP DATABASE;
ROLLBACK;
SUBSTRING( (SUBSTRING(LastName, 0, CHARINDEX(' ', LastName)) + ', ' + FirstName), 0, CHARINDEX(' ', (SUBSTRING(LastName, 0, CHARINDEX(' ', LastName)) + ', ' + FirstName), LEN(LastName) + 3) )
they apparently weren't familiar with RTRIM;
RTRIM(LastName) + ', ' + RTRIM(FirstName)
SELECT name FROM categories WHERE id IN (".implode(",",corrected_cats($ad->id)).") ORDER BY name ASC
Yes, you're reading that right... comma separated fields in a field.