I am attempting to create a MySQL snippet that will analyse a table and remove duplicate entries (duplicates are determined by two fields, not the entire record).

The following code works when I hard-code the names in the queries, but when I pull them out into variables I get MySQL errors. The script is below:

SET @tblname = 'mytable';
SET @fieldname = 'myfield';
SET @concat1 = 'checkfield1';
SET @concat2 = 'checkfield2';

ALTER TABLE @tblname ADD `tmpcheck` VARCHAR( 255 ) NOT NULL;

UPDATE @tblname SET `tmpcheck` = CONCAT(@concat1,'-',@concat2);

CREATE TEMPORARY TABLE `tmp_table` (
`tmpfield` VARCHAR( 100 ) NOT NULL
) ENGINE = MYISAM ;

INSERT INTO `tmp_table` (`tmpfield`) SELECT @fieldname FROM @tblname GROUP BY `tmpcheck` HAVING ( COUNT(`tmpcheck`) > 1 );

DELETE FROM @tblname WHERE @fieldname IN (SELECT `tmpfield` FROM `tmp_table`);

ALTER TABLE @tblname DROP `tmpcheck`;

I am getting the following error:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '@tblname ADD `tmpcheck` VARCHAR( 255 ) NOT NULL' at line 1 

Is this because I can't use a variable for a table name? What else could be wrong, and how would I get around this issue?

Thanks in advance

+1  A: 

Using a variable for the table name is indeed illegal. You'll have to generate the SQL as a string and use the prepared statement feature to execute it.

Matti Virkkunen
I have now tried: PREPARE stmt_name FROM "ALTER TABLE ? ADD `tmpcheck` VARCHAR( 255 ) NOT NULL"; SET @tblname = 'mytable'; EXECUTE stmt_name USING @tblname; DEALLOCATE PREPARE stmt_name; but I get an error at the '?'.
Lizard
The same rule applies to statement parameters: a ? placeholder can stand in only for a data value, not for an identifier such as a table name. Concatenate the table name into the statement string with CONCAT() instead of making it a parameter.
Matti Virkkunen
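For illustration, the suggestion in this comment thread could look roughly like the sketch below. It reuses the names from the question (`mytable`, `tmpcheck`, `stmt_name`), and `@q` is just a hypothetical variable to hold the statement text. It assumes the table name comes from a trusted source, since it is pasted straight into the SQL string.

```sql
-- Build the statement text first; the table name is concatenated in,
-- because a ? placeholder cannot stand in for an identifier.
SET @tblname = 'mytable';
SET @q = CONCAT('ALTER TABLE `', @tblname, '` ADD `tmpcheck` VARCHAR(255) NOT NULL');

-- Prepare and run the generated statement, then release it.
PREPARE stmt_name FROM @q;
EXECUTE stmt_name;
DEALLOCATE PREPARE stmt_name;
```

The same pattern would apply to each of the other statements in the original script that refer to @tblname or @fieldname.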
+1  A: 

Is this because I can't use a variable for a table name?

Yes, and the same goes for other schema identifiers such as column names. User variables can only be used where MySQL expects a value, for example a '-quoted string.

If you really need to do this you can with ‘dynamic SQL’: creating your whole query as a string, concatenating the @tblname into the string at that time, and running the lot using PREPARE and EXECUTE. This is pretty ugly and can lead to SQL injection if you're not careful, so avoid it if there is any other option.
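As a rough sketch of what "being careful" might involve: backticks inside the identifier are doubled before it is wrapped in backticks, and actual data values still go through ? placeholders rather than being concatenated. The names `@safe_tbl`, `@q`, `@val` and `stmt` are made up for this example, and `checkfield1` is borrowed from the question; this is not a complete defence, only the basic identifier-quoting step.

```sql
SET @tblname = 'mytable';

-- Escape any backtick in the name by doubling it, then quote the identifier.
SET @safe_tbl = CONCAT('`', REPLACE(@tblname, '`', '``'), '`');

-- Identifiers are concatenated; the compared value stays a ? parameter.
SET @q = CONCAT('SELECT COUNT(*) FROM ', @safe_tbl, ' WHERE checkfield1 = ?');
SET @val = 'some value';

PREPARE stmt FROM @q;
EXECUTE stmt USING @val;
DEALLOCATE PREPARE stmt;
```

Checking the name against a fixed list of allowed tables before building the string would be a stricter alternative.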

SELECT myfield FROM mytable GROUP BY tmpcheck HAVING ( COUNT(tmpcheck) > 1 )

This seems problematic to me. Unless myfield has a functional dependency on tmpcheck (which AFAICS it can't, as tmpcheck is not a primary key), that's not valid ANSI SQL. MySQL would let you get away with it when the ONLY_FULL_GROUP_BY SQL mode is off, but what you would be saying is “for each group of rows sharing a value of tmpcheck, pick myfield from an arbitrary row in that group for later deletion”. Is that really what you want? I would expect you to want to delete all but one of the duplicates.
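If the goal at this step is only to see which combinations are duplicated, a query that groups by the two check columns directly avoids both the tmpcheck helper column and the non-grouped column in the select list. This is a sketch using the column names from the question; `copies` is just an alias chosen here.

```sql
-- One row per duplicated (checkfield1, checkfield2) pair, with its row count.
SELECT checkfield1, checkfield2, COUNT(*) AS copies
FROM mytable
GROUP BY checkfield1, checkfield2
HAVING COUNT(*) > 1;
```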

Normally you shouldn't need this kind of complicated procedure to remove duplicates. Just use a DELETE-join:

DELETE my0
FROM mytable AS my0
JOIN mytable AS my1
    ON my1.checkfield1=my0.checkfield1 AND my1.checkfield2=my0.checkfield2
    AND my1.id>my0.id;

This assumes an id column that is orderable and UNIQUE, so that you can decide which row gets to stay (here, the one with the highest id in each group of duplicates). myfield might be that column, but I can't tell from context.
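To make the effect concrete, here is a small worked example on a throwaway table. The table name `dedupe_demo` and its data are invented for illustration; only the column names follow the question.

```sql
CREATE TABLE dedupe_demo (
    id INT NOT NULL PRIMARY KEY,
    checkfield1 VARCHAR(10) NOT NULL,
    checkfield2 VARCHAR(10) NOT NULL
);

INSERT INTO dedupe_demo (id, checkfield1, checkfield2) VALUES
    (1, 'a', 'x'),
    (2, 'a', 'x'),
    (3, 'b', 'y'),
    (4, 'a', 'x'),
    (5, 'b', 'z');

-- Delete every row for which a row with the same pair and a higher id exists.
DELETE my0
FROM dedupe_demo AS my0
JOIN dedupe_demo AS my1
    ON my1.checkfield1 = my0.checkfield1 AND my1.checkfield2 = my0.checkfield2
    AND my1.id > my0.id;

-- Rows 1 and 2 are removed (row 4 shares their pair and has a higher id);
-- rows 3, 4 and 5 remain.
SELECT id, checkfield1, checkfield2 FROM dedupe_demo ORDER BY id;
```

Flipping the comparison to my1.id < my0.id would keep the lowest id in each group instead.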

bobince
A: 

I have used a combination of both answers:

SET @tblname = 'myTable';
SET @idfield = 'myPrimaryKey';
SET @check1 = 'field1';
SET @check2 = 'field2';

SET @q1 = CONCAT('DELETE my0 FROM `',@tblname, '` AS my0 JOIN `',@tblname, '` AS my1 ON my1.',@check1,' = my0.',@check1,' AND my1.',@check2,' = my0.',@check2,' AND my1.',@idfield,' > my0.',@idfield,'');
PREPARE stmt1 FROM @q1;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
Lizard