views:

215

answers:

4

In an MMORPG server I am refactoring, I have two tables. One for items, and one for spells. Each item has up to 5 spells, so I went with a sparse-matrix format, having 5 columns for spell IDs.

The original devisers of this structure chose to use MyISAM which does not support referencing, causing the items table to contain items with non-existant spell IDs. I wish to find out which items have incorrect spell IDs in order to fix them and perhaps in the long run convert to InnoDB.

So far I've been able to come up with only this:

SELECT COUNT(*)
  FROM items
 WHERE spellid_1 NOT IN (SELECT entry FROM research.spell)
    OR spellid_2 NOT IN (SELECT entry FROM research.spell)
    OR spellid_3 NOT IN (SELECT entry FROM research.spell)
    OR spellid_4 NOT IN (SELECT entry FROM research.spell)
    OR spellid_5 NOT IN (SELECT entry FROM research.spell);

Is there a more elegant way to do so?

EDIT: NULL spellid_n counts as valid since it just means the item doesn't have a spell in that slot.

+2  A: 

It would have been more elegant to design the tables so that you didn't have 5 spellid columns in the same table - i.e by having an item_spell table that would allow any number of spells per item. Apart from being more future-proof (when you find you now need 6 spells), your query would become:

SELECT COUNT(DISTINCT item_id)
  FROM item_spells
 WHERE spell_id NOT IN (SELECT entry FROM research.spell);

As it is, you are forced to perform the check 5 times.

Tony Andrews
True, but there will never be more than 5 spells per item, and the common use-case is to fetch an item in it's entirety so I still believe a sparse matrix better fits this.
Spidey
@Spidey Unlike water, specs seldom freeze. I would suggest following Tony's advice to normalize and future-proof.
Justin Johnson
Voted down. Doesn't answer the question.
hobodave
A: 

The normalization step suggested would be useful (to have a connection table for the many-to-many item-spell relation). A disadvantage of the denormalized version is that spells of items have an implicit ordering, we always have to deal with all of them, when for example checking if an item have a specific spell or not.

However, the storage engine optimizes the long sql with the 5 identical subquery, it shouldn't cause performance problems. An alternative phrasing would be, using the SQL '99 standard 'with' clause:

WITH   spellids(entry) AS SELECT entry FROM research.spell
SELECT COUNT(*)  
FROM   items
WHERE     spellid_1 NOT IN spellids OR spellid_2 NOT IN spellids
       OR spellid_3 NOT IN spellids OR spellid_4 NOT IN spellids
       OR spellid_5 NOT IN spellids ;

Not much shorter, and unfortunately MySQL does not support the 'with' clause (see this question) yet.

csaba
+1  A: 

Spidey, good question. Try the following:

SELECT  COUNT(*)
FROM items i
LEFT JOIN research.spell spell1 ON i.spellid_1 = spell1.entry
LEFT JOIN research.spell spell2 ON i.spellid_2 = spell2.entry
LEFT JOIN research.spell spell3 ON i.spellid_3 = spell3.entry
LEFT JOIN research.spell spell4 ON i.spellid_4 = spell4.entry
LEFT JOIN research.spell spell5 ON i.spellid_5 = spell5.entry
WHERE spell1.entry IS NULL 
OR spell2.entry IS NULL 
OR spell3.entry IS NULL
OR spell4.entry IS NULL
OR spell5.entry IS NULL

The key here is that you want to LEFT JOIN your research.spell table, so that it includes items that don't have a corresponding row for the given JOIN condition. Then you filter that tableset where the right side of the join IS NULL. This gives you rows from the left side table (items) with no corresponding row in the right side table (research.spell).

EDIT:

Also note, that I left your initial SELECT COUNT(*) unchanged. This will give you the total number of items that have 1 or more invalid spells. You'll need to change this to SELECT i.id or something similar to fetch the IDs of items that have invalid spells.

hobodave
This gets close to what I'm after. I updated the items table because the previous DB maintainers thought using a mix of 0 and -1 instead of NULL was better. However, this query also returns items with NULL spellid_n columns. Seems like I've neglected to mention I don't consider those as invalid, for which I apologize.
Spidey
A: 
Mladen Prajdic
Error 1242: Subquery returns more than 1 row. I can't see how this will work anyways, since it's checking if the bigger group is in the small group..
Spidey
It took MySQL 5:53 minutes to return using the new query. Opposed to my original query's 3 seconds, I think something's wrong
Spidey
lol. ok then it's definitely not useful :)
Mladen Prajdic