views:

534

answers:

3

The problem if have is this.

We have taken over a website which has an active member community. We've been given the application and database dump and have the site running on a new server successfully and the DNS has been switched.

The problem is that the database has come out of sync in the time it took to get the files to us and the DNS switched over. Now that the DNS has switched and there is no chance of the database going out of sync, we've been handed members2 which is the table from the original server with the extra data.

Both tables look like this

`idmembers` int(10) unsigned NOT NULL auto_increment,
`firstName` varchar(20) default NOT NULL,
`lastName` varchar(20) default NOT NULL,
`email` varchar(255) default NOT NULL,
`date` varchar(10) default '0',
`source` varchar(50) default 'signup'
 PRIMARY KEY  (`idmembers`),
 UNIQUE KEY `email` (`email`)

So the first table is called members1 and is the live database, which is missing a load of members from members2. I need to merge them both together keeping members1 as it is and allowing unique emails from members2 to be inserted into members1.

I am presuming that there is some SQL to do this but I have no idea what it could be.

My second and less preferable approach would be to use a tool like PhpMyAdmin to export all the records from members2 after a certain date and reimport them into members1 but the problem is they all export from members2 with an idmembers that conflict with members1 (as an autoincrement is used in both)

A: 

Just write a quick porting script which SELECTS the fields that are missing from "members1" and then does an INSERT for each one in the "members2" table.

You might have to do some checking if you require a unique email address, and you think there might be duplicates.

jjclarkson
+1  A: 

The most important suggestion is to do this on a copy of your database, not the live database, until you are certain the process results in the correcting merging!

First you should check to see if there are any rows in members2 with duplicate email addresses that already exist in members1:

SELECT members2.*
FROM members1 JOIN members2 USING (email);

If there are any (hopefully it'll be few), fix them up manually, or delete each row that is really a duplicate account of a person who already has an account in members1 (keep backup data of course).

If there's any other cases of redundant member accounts that should be considered duplicates and not inserted as new members, you may have to handle that manually. This is an example of a broader problem of database cleanup or de-duping that usually can't be automated fully.

You can copy rows from members2 into members1 while generating new id values like this:

INSERT INTO members1 (`firstName`, `lastName`, `email`, `date`, `source`)
  SELECT `firstName`, `lastName`, `email`, `date`, `source`
  FROM members2;

Yes, you have to name all the columns. By omitting idmembers from that query, that column will use its default behavior which is to generate a new id value.

You didn't say you need to update other tables that reference these new members by their id. If so, you should create a new table to map the members2 id to the new number generated as you import them into members1. You'll have to follow @ijclarkson's advice of inserting the members one at a time, so you can note the new id generated.

SELECT * FROM members2;

-- loop over results in a script:

  INSERT INTO members1 (`firstName`, `lastName`, `email`, `date`, `source`)
    VALUES (?, ?, ?, ?, ?);

  INSERT INTO members_id_map (idmembers1, idmembers2)
    VALUES (LAST_INSERT_ID(), ?); -- use idmembers from the query on members2

-- end loop
Bill Karwin
+1  A: 

If I understand your question correctly, there are two separate issues here:

  1. Adding completely new member records from members2 into members1
  2. Updating the email field in members1, if that got changed in members2

As for the first case, you should be able do something like:

INSERT INTO members1 ('idmembers', 'firstname', etc.)
 SELECT 'idmembers', 'firstname', etc.
  FROM members2
  WHERE idmembers NOT IN (SELECT idmembers FROM members1)

As for the second case, something like:

UPDATE members1 m1 LEFT JOIN members2 m2
 ON m1.idmembers = m2.idmembers
 SET m1.idmembers = m2.idmembers
 WHERE m2.idmembers IS NOT NULL AND m2.idmembers != m1.idmembers

(Note1: Both statements constructed 'ad hoc' and untested!)
(Note2: Both statements assume that the primary key 'idmember' did not change during migration of members1! If that happened, these queries will not work.)
(Note3: If you encounter the 'different idmember keys' problem from Note2, you can still use the queries, but change the comparison and join operations to use the email field. But then you'd have to execute the second query first to prevent duplicates)

Henrik Opel