views:

22

answers:

1

I am trying to work out the best approach for a data migration.

I am migrating some data (~8000 rows) from a table like this:

CREATE TABLE location (
    location_id INT NOT NULL AUTO_INCREMENT UNIQUE PRIMARY KEY,
    addr VARCHAR(1000) NOT NULL,
    longitude FLOAT(11),
    latitude FLOAT(11)
) Engine = InnoDB, DEFAULT CHARSET=UTF8;

to a table like this:

CREATE TABLE location2 (
    location_id INT NOT NULL AUTO_INCREMENT UNIQUE PRIMARY KEY,
    addr VARCHAR(255) NOT NULL UNIQUE,
    longitude FLOAT(11),
    latitude FLOAT(11)
) Engine = InnoDB, DEFAULT CHARSET=UTF8;

It is not important to preserve the primary key.

The addresses in "location" are duplicated many times. In most cases with the same latitude and longitude. But in SOME cases there are rows with the same value for addr but DIFFERENT values for latitude and longitude.

The final location2 table should have one entry for each unique addr entry in location. Where there is more than one possible value for latitude/longitude the latest (highest location_id) should be used.

I created a procedure to do this but it doesn't like the rows where addr is the same but latitude/longitude are different.

DROP PROCEDURE IF EXISTS migratelocation;
DELIMITER $$
CREATE PROCEDURE migratelocation()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE a VARCHAR(255);
    DECLARE b, c FLOAT(11);
    DECLARE cur CURSOR FOR SELECT DISTINCT addr, latitude, longitude FROM location;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN cur;
    REPEAT
        FETCH cur INTO a, b, c;
        IF NOT done THEN
            INSERT INTO location2 (addr, latitude, longitude) VALUES (a, b, c);
        END IF;
    UNTIL done END REPEAT;
    CLOSE cur;
END $$
DELIMITER ;
CALL migratelocation();

Is there a good way to do this? I keep wanting to give up and write a little PHP prog to do it but I'd rather learn the right SQL way if I can.

Possibly I just need to find the right SELECT from the first table and I can use:

INSERT INTO location2 SELECT ... ;

to migrate the data.

Thanks!

+4  A: 

You could use INSERT IGNORE directly, or REPLACE - I'm assuming that this is a one-off process, or at least one where performance is not a major consideration.

In this case the record with the highest location_id wins:

INSERT IGNORE
INTO   location2
SELECT *
FROM   location
ORDER BY
       location_id DESC

Subsequent records wit the same primary key value are just discarded by the insert.

You'd need strict SQL mode to be disabled, otherwise the truncation of the addr field will give errors.

martin clayton
This does exactly what the man says. Many thanks.
Thomas David Baker