views:

3268

answers:

6

Is it particularly bad to have a very, very large SQL query with lots of (potentially redundant) WHERE clauses?

For example, here's a query I've generated from my web application with everything turned off, which should be the largest possible query for this program to generate:

SELECT * FROM 4e_magic_items INNER JOIN 4e_magic_item_levels ON 4e_magic_items.id = 4e_magic_item_levels.itemid INNER JOIN 4e_monster_sources ON 4e_magic_items.source = 4e_monster_sources.id WHERE (itemlevel BETWEEN 1 AND 30)  AND source!=16 AND source!=2 AND source!=5 AND source!=13 AND source!=15 AND source!=3 AND source!=4 AND source!=12 AND source!=7 AND source!=14 AND source!=11 AND source!=10 AND source!=8 AND source!=1 AND source!=6 AND source!=9  AND type!='Arms' AND type!='Feet' AND type!='Hands' AND type!='Head' AND type!='Neck' AND type!='Orb' AND type!='Potion' AND type!='Ring' AND type!='Rod' AND type!='Staff' AND type!='Symbol' AND type!='Waist' AND type!='Wand' AND type!='Wondrous Item' AND type!='Alchemical Item' AND type!='Elixir' AND type!='Reagent' AND type!='Whetstone' AND type!='Other Consumable' AND type!='Companion' AND type!='Mount' AND (type!='Armor' OR (false )) AND (type!='Weapon' OR (false )) ORDER BY type ASC, itemlevel ASC, name ASC

It seems to work well enough, but it's also not particularly high traffic (a few hundred hits a day or so), and I wonder if it would be worth the effort to try and optimize the queries to remove redundancies and such.

A: 

Most databases support stored procedures to avoid this issue. If your code is fast enough to execute and easy to read, you don't want to have to change it in order to get the compile time down.

An alternative is to use prepared statements so you get the hit only once per client connection and then pass in only the parameters for each call

Oskar
+2  A: 

http://dev.mysql.com/doc/refman/5.0/en/packet-too-large.html

Configurable with a limit of 1GB in 5.0

Chris
+5  A: 

Reading your query makes me want to play an RPG.

This is definitely not too long. As long as they are well formatted, I'd say a practical limit is about 100 lines. After that, you're better off breaking subqueries into views just to keep your eyes from crossing.

I've worked with some queries that are 1000+ lines, and that's hard to debug.

By the way, may I suggest a reformatted version? This is mostly to demonstrate the importance of formatting; I trust this will be easier to understand.

select *  
from
  4e_magic_items mi
 ,4e_magic_item_levels mil
 ,4e_monster_sources ms
where mi.id = mil.itemid
  and mi.source = ms.id
  and itemlevel between 1 and 30
  and source not in(16,2,5,13,15,3,4,12,7,14,11,10,8,1,6,9)  
  and type not in(
                  'Arms' ,'Feet' ,'Hands' ,'Head' ,'Neck' ,'Orb' ,
                  'Potion' ,'Ring' ,'Rod' ,'Staff' ,'Symbol' ,'Waist' ,
                  'Wand' ,'Wondrous Item' ,'Alchemical Item' ,'Elixir' ,
                  'Reagent' ,'Whetstone' ,'Other Consumable' ,'Companion' ,
                  'Mount'
                 )
  and ((type != 'Armor') or (false))
  and ((type != 'Weapon') or (false))
order by
  type asc
 ,itemlevel asc
 ,name asc

/*
Some thoughts:
==============
0 - Formatting really matters, in SQL even more than most languages.
1 - consider selecting only the columns you need, not "*"
2 - use of table aliases makes it short & clear ("MI", "MIL" in my example)
3 - joins in the WHERE clause will un-clutter your FROM clause
4 - use NOT IN for long lists
5 - logically, the last two lines can be added to the "type not in" section.
    I'm not sure why you have the "or false", but I'll assume some good reason
    and leave them here.
*/
JosephStyons
Actually joins will speed things up, especially with proper indices. The reason is that if all your clauses are in WHERE, mysql will fetch all data, and then filter it; whereas with a proper join it will select only the needed data, which can be some orders of magnitude smaller - faster to filter.
Aeon
Oh, and...Unless I'm missing something, (type!='Armor' OR (false)) is going to evaluate to either true, or false, but in either case it's not going to affect the result set, so it's really not even needed.
Aeon
That's because armor (and weapons) are further filtered by type. So for example, if Cloth and Hide are selected, that would read: (type!='Armor' OR (FALSE OR restrictions like 'C' OR restrictions like 'H')) The page just adds "or Restrictions like 'whatever'" inside the parens, so false is needed.
Asmor
joins in the FROM clause will unclutter your WHERE clause. @Aeon - there should be no performance difference with a proper query optimizer. Is mysql that bad?
David B
JosephStyons
downvote for cluttering the WHERE clause with JOIN conditions.
longneck
A: 

I'm assuming you mean by 'turned off' that a field doesn't have a value?

Instead of checking if something is not this, and it's also not that etc. can't you just check if the field is null? Or set the field to 'off', and check if type or whatever equals 'off'.

mrinject
+1  A: 

From a practical perspective, I generally consider any SELECT that ends up taking more than 10 lines to write (putting each clause/condition on a separate line) to be too long to easily maintain. At this point, it should probably be done as a stored procedure of some sort, or I should try to find a better way to express the same concept--possibly by creating an intermediate table to capture some relationship I seem to be frequently querying.

Your mileage may vary, and there are some exceptionally long queries that have a good reason to be. But my rule of thumb is 10 lines.

Example (mildly improper SQL):

SELECT x, y, z
FROM a, b
WHERE fiz = 1
AND foo = 2
AND a.x = b.y
AND b.z IN (SELECT q, r, s, t
            FROM c, d, e
            WHERE c.q = d.r
              AND d.s = e.t
              AND c.gar IS NOT NULL)
ORDER BY b.gonk

This is probably too large; optimizing, however, would depend largely on context.

Just remember, the longer and more complex the query, the harder it's going to be to maintain.

Keith B
A: 

When you and try to use the website, does it seem slow? If there's only a few hundred hits a day, I guess you could not worry about it.

Do you expect traffic to increase? By how much? If you aren't hard pressed for time, you could just do it, to future-proof the site.

But then, is the time it takes to programatically find and remove redundancies greater than the time it take to just run the query?

Matt Blaine