views:

54

answers:

2

I've written a stored procedure in MySQL to take values currently in a table and to "Normalize" them. This means that for each value passed to the stored procedure, it checks whether the value is already in the table. If it is, then it stores the id of that row in a variable. If the value is not in the table, it stores the newly inserted value's id. The stored procedure then takes the id's and inserts them into a table which is equivalent to the original de-normailized table, but this table is fully normalized and consists of mainly foreign keys.

My problem with this design is that the stored procedure takes approximately 10ms or so to return, which is too long when you're trying to work through some 10million records. My suspicion is that the performance is to do with the way in which I'm doing the inserts. i.e.

INSERT INTO TableA 
 (first_value) 
VALUES 
 (argument_from_sp) ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);

SET @TableAId = LAST_INSERT_ID();

The "ON DUPLICATE KEY UPDATE" is a bit of a hack, due to the fact that on a duplicate key I don't want to update anything but rather just return the id value of the row. If you miss this step though, the LAST_INSERT_ID() function returns the wrong value when you're trying to run the "SET ..." statement.

Does anyone know of a better way to do this in MySQL?

A: 

I've gone back and created a function to handle this case instead:

CREATE DEFINER=`root`@`%` FUNCTION `value_update`(inValue VARCHAR(255)) RETURNS int(11)
BEGIN
        DECLARE outId INT;
        SELECT valueId INTO outId FROM ValuesTable WHERE value = inValue;

        IF outId IS NULL THEN
                INSERT INTO ValuesTable (value) VALUES (inValue);
                SELECT LAST_INSERT_ID() INTO outId;
        END IF;

        RETURN outId;
END

The stored procedure mentioned earlier calls these functions instead of doing the INSERT statements itself. Performance-wise, the above function is faster in my setup (using ndb table type). Also, after benchmarking all of the parts of my application I've found that the performance issues this was causing were only a minor part of the overall performance bottleneck.

srkiNZ84
A: 

If you already have a unique identifier, is there any need to have an auto-incrementing primary key?

Brian Hooper
The unique identifier would otherwise be a VARCHAR field. I would prefer an integer field due to performance reasons.
srkiNZ84
The unique identifier still is a varchar field; all you have done is add another column and another unique index to the table. The integer unique identifier serves no purpose and the best that can be said of it is that it won't slow things down very much.Benchmarking all parts of your application was a good idea, and will allow you to concentrate on the things that matter.
Brian Hooper