What's a good way to work with many rows in MySQL, given that I have a long list of keys in a client application that is connecting with ODBC?

Note: my experience is largely SQL Server, so I know a bit, just not MySQL specifically.

The task is to delete some rows from 9 tables, but I might have upwards of 5,000 key pairs.

I started out with the easy way of looping through all my keys and submitting a statement for each one against each table, such as:

DELETE FROM Table WHERE Key1 = 123 AND Key2 = 567 -- and 8 more tables
DELETE FROM Table WHERE Key1 = 124 AND Key2 = 568 -- and 8 more tables
DELETE FROM Table WHERE Key1 = 125 AND Key2 = 569 -- and 8 more tables
...

Except, that comes out to 45,000 separate statements, which as you can imagine is a bit slow.

So, without worrying about the programming language I'm using on the front end, what's a good way to submit the list so that I can JOIN and do the operation all at once or at least in large batches? Here are my ideas so far:

  • Create a temp table and insert to it, then join. I'll happily look up the syntax for MySQL to create a temp table, but is that a good route to go?

  • Assuming I do use a temp table, what's the best method for populating a temp table? 5000 INSERT Table VALUES () statements? SELECT 123, 456 UNION ALL SELECT 124, 457? I just tested that MySql allows this kind of SELECT that is not issued against a table. But SQL Server eventually blows up if the list gets too long, so is this a good way in MySQL? Should I just keep the list to a few hundred at once?

    --CREATE Temp Table ( I do not know the syntax in MySql yet)
    
    
    INSERT INTO TempTable
    SELECT 123, 456
    UNION ALL SELECT 124, 457
    UNION ALL SELECT 125, 458
    
    
    DELETE T
    FROM
       Table T
       INNER JOIN TempTable X ON T.Key1 = X.Key1 AND T.Key2 = X.Key2
    
  • XML. I see MySQL 5.1 has some XML functions, but from a cursory search it doesn't appear to support turning a chunk of XML text into a rowset to join against. Is that true? It is extremely easy for me to get the values into XML.

  • A virtual split operation. I presume in MySql that there's some kind of procedural language possible. In SQL Server I could write some custom code that parses a string and turns it into a rowset:

    CREATE PROCEDURE DoStuff @KeyString varchar(max)
    AS
    DECLARE @Keys TABLE (
       Key1 int,
       Key2 int,
       PRIMARY KEY CLUSTERED (Key1, Key2)
    )
    DECLARE @Pos int
    WHILE @Pos < Len(@KeyString) BEGIN
       -- loop to search for delimiting commas in @KeyString
       -- and insert pairs of parsed tokens to table variable @Keys
    END
    
    
    DELETE T
    FROM
       Table T
       INNER JOIN @Keys K ON T.Key1 = K.Key1 AND T.Key2 = K.Key2
    

Since I'm unfamiliar with MySQL, I really don't know which possibility to investigate first, and I would appreciate some help to save me from making a poor decision and/or learning the hard way.

A: 

Any way to refactor (or append to) the table so it has a single key? Something like thekey = key1 * 1000 + key2?

That way one could use

delete from table
  where thekey in (123567, 124568, 125569);
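One caveat worth checking before packing keys this way: the mapping is only unique while key2 stays below the multiplier. The sketch below is plain arithmetic in Python, not MySQL code; the `pack` helper is a hypothetical name, and the 32-bit width in the last step is an assumption about the column type.

```python
def pack(key1: int, key2: int, multiplier: int = 1000) -> int:
    """Combine two keys as key1 * multiplier + key2 (hypothetical helper)."""
    return key1 * multiplier + key2


# The pairs from the question pack to the expected values.
assert pack(123, 567) == 123567

# Two different pairs collide as soon as key2 reaches the multiplier.
collision_a = pack(1, 1567)
collision_b = pack(2, 567)
assert collision_a == collision_b

# Assumption: both keys are 32-bit unsigned integers. A multiplier of 2**32
# then avoids collisions, and the largest packed value needs a full 64 bits.
max_key = 2**32 - 1
largest_packed = pack(max_key, max_key, multiplier=2**32)
assert largest_packed == 2**64 - 1
```

With a multiplier of 1000, deleting by packed key could therefore remove rows for a pair that was never in the list.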

(follow up)

Okay, since the structure can't be changed, how about this?:

create view snort_maint as
  select id, key1 * 1000 + key2 as thekey from snort;

Then, for maintenance

delete from snort
  where id in (select id
               from snort_maint
               where thekey in (123567, 124568, 125569)
              );

According to the MySQL 5.5 view restrictions, deleting directly from a view should work:

delete
from snort_maint
where thekey in (123567, 124568, 125569)
wallyk
Nope, sorry. It's actually the snort sensor database, and it won't be changing any time soon.
Emtucifor
(after update) Wallyk, the combined value method you're suggesting is too much of a hack. The keys are unsigned integers, and there is no safe multiplier to use to "multiplex" them into one value. I've worked with SQL Server for long enough that I am forced to outright reject this poor practice method. There IS a better way, I just have to find it. Seriously, I'm always going on about how people use IN() and don't understand that it's semantically equivalent to an OR list, and that they should use JOINs instead. I simply couldn't do something like this and have pride left in my profession.
Emtucifor
The tag says MySQL, not SQL Server. It's hardly hackish given the constraints that the snort table structure can't be changed, and you want a one-statement way of deleting rows.
wallyk
I'm asking for an *elegant* one-statement way. Are you telling me that the BEST possible way in MySql to accomplish this is your value-packing solution? Sql Server totally aside, I have a hard time believing that. Being unable to handle a composite key would instantly demote MySql from "respected DB engine" to "child's crappy toy." And value packing that will guarantee an overflow with large values surely cannot be good practice. Regardless of the db engine. It's simple programming sense. I really do appreciate you taking the time to try to help me out, though!
Emtucifor
I'm not sure why you think that's not elegant. It has the advantages of being concise, convenient to implement, and executing quite efficiently, except for potential range problems in the combined key. A 64-bit integer would probably work well depending on the actual data. But if not, using a string concatenation should address that: `concat(convert(key1, char), ' ', convert(key2, char)) as thekey`. Certainly it's cleaner than the various temp tables and all that you contemplated.
wallyk
Let's wait and see if others comment. Would love to know if I'm the only one who thinks this is clunky.
Emtucifor
+1  A: 

I would use the temp table solution, and join it to each main table in the DELETE statements. So you only have to do nine deletes, one for each table.

  • CREATE TEMPORARY TABLE

    -- KEYS is a reserved word in MySQL, so the table name is quoted with backticks.
    CREATE TEMPORARY TABLE `Keys` (
        Key1 INT UNSIGNED NOT NULL, 
        Key2 INT UNSIGNED NOT NULL, 
        PRIMARY KEY(Key1, Key2)
    );
    
  • Load a file of tab-separated data into the temp table using LOAD DATA LOCAL INFILE

    LOAD DATA LOCAL INFILE 'C:/path/to/datafile' INTO TABLE `Keys`;
    
  • Delete using MySQL's multi-table DELETE syntax.

    DELETE t FROM Table1 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table2 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table3 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table4 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table5 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table6 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table7 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table8 t JOIN `Keys` USING (Key1, Key2);
    DELETE t FROM Table9 t JOIN `Keys` USING (Key1, Key2);
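If the client builds these statements itself, the nine deletes can be generated from a list of table names rather than written by hand. This is a minimal sketch in Python, chosen only for illustration since the question leaves the front-end language open. `quote_ident` and `delete_statements` are hypothetical helpers, and `Table1` through `Table9` are the placeholder names used above.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def delete_statements(tables: Iterable[str], keys_table: str = "Keys") -> List[str]:
    """Build one multi-table DELETE per target table, joined to the key list."""
    return [
        f"DELETE t FROM {quote_ident(table)} t "
        f"JOIN {quote_ident(keys_table)} USING (Key1, Key2)"
        for table in tables
    ]


# Table1 .. Table9 are the placeholder names from the statements above.
statements = delete_statements([f"Table{n}" for n in range(1, 10)])
```

Each string in `statements` can then be sent over the existing connection, one per table.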
    

Re your comment:

The MySQL docs on CREATE TABLE say the following:

A TEMPORARY table is visible only to the current connection, and is dropped automatically when the connection is closed. This means that two different connections can use the same temporary table name without conflicting with each other or with an existing non-TEMPORARY table of the same name. (The existing table is hidden until the temporary table is dropped.)

That's pretty clear!

Regarding loading the data, you could just do it with INSERT. 5000 rows is no big deal. I use a PHP script to load millions of rows (e.g. the StackOverflow XML data dump) into MySQL and that only takes about 20 minutes. I use a prepared statement and then execute it with parameters.
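As a rough illustration of the INSERT route, the key pairs can be grouped into a handful of multi-row INSERT statements with parameter placeholders. This is a sketch in Python rather than the PHP script mentioned above. `batched_inserts` is a hypothetical helper, the batch size of 500 is an arbitrary assumption, and the `?` placeholder style assumes an ODBC-style driver. The table name is backtick-quoted because KEYS is a reserved word in MySQL.

```python
from typing import Iterator, List, Sequence, Tuple

KeyPair = Tuple[int, int]


def batched_inserts(
    pairs: Sequence[KeyPair], batch_size: int = 500
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (sql, params) tuples, one multi-row INSERT per batch of key pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        # One "(?, ?)" group per key pair in this batch.
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        sql = f"INSERT INTO `Keys` (Key1, Key2) VALUES {placeholders}"
        # Flatten [(k1, k2), ...] into [k1, k2, ...] to match the placeholders.
        params = [value for pair in chunk for value in pair]
        yield sql, params


# Example with the three pairs from the question and a tiny batch size.
example = list(batched_inserts([(123, 567), (124, 568), (125, 569)], batch_size=2))
```

Each `(sql, params)` pair would then be passed to whatever parameterized execute call the client library provides. With 5,000 pairs and the assumed batch size, that is ten round trips to fill the temp table.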

Bill Karwin
Thanks Bill, this looks like dynamite. I will look into the LOAD DATA method, though I'm not sure that I'll be able to write to such a file easily. Is there another script way to load the temp table? Also, could you elaborate just a little on the scope of temp tables (what happens if two sessions try to create a temp table with the same name, or the name already exists as a real table, and is the temp table destroyed when the connection closes)?
Emtucifor
P.S. USING is cool, I wish SQL Server had that.
Emtucifor
I wish I could give you more than one upvote. Thank you for the depth and detail of your answer. I admit I could have gone and researched all of these things (and I plan to do it still) but a nice round answer is so great.
Emtucifor
I wish you could too, because your upvote brought me to 44,999! :-) Cheers!
Bill Karwin