tags:

views:

38

answers:

2

I have a mySQL database with a tad under 2 million rows. The database is non-interactive, so efficiency isn't key.

The (simplified) structure I have is:

`id` int(11) NOT NULL auto_increment
  `category` varchar(64) NOT NULL
  `productListing` varchar(256) NOT NULL

Now the problem I would like to solve is, I want to find duplicates on productListing field, merge the data on the category field into a single result - deleting the duplicates.

So given the following data:

+----+-----------+---------------------------+
| id | category  | productListing            |
+----+-----------+---------------------------+
|  1 | Category1 | productGroup1             | 
|  2 | Category2 | productGroup1             | 
|  3 | Category3 | anotherGroup9             | 
+----+-----------+---------------------------+

What I want to end up is with:

+----+----------------------+---------------------------+
| id | category             | productListing            |
+----+----------------------+---------------------------+
|  1 | Category1,Category2  | productGroup1             | 
|  3 | Category3            | anotherGroup9             | 
+----+----------------------+---------------------------+

What's the most efficient way to do this either in pure mySQL query or php?

+2  A: 

I think you're looking for GROUP_CONCAT:

SELECT GROUP_CONCAT(category), productListing
FROM YourTable
GROUP BY productListing

I would create a new table, inserting the updated values, delete the old one and rename the new table to the old one's name:

CREATE TABLE new_YourTable SELECT GROUP_CONCAT(...;
DROP TABLE YourTable;
RENAME TABLE new_YourTable TO YourTable;
-- don't forget to add triggers, indexes, foreign keys, etc. to new table
soulmerge
This seems to be doing what I want, I guess the next question is - is storing this type of data in a comma separated list the right way to go about things? I'm thinking it may make more sense to create a new table, but unless there is a benefit I won't really bother.
Michael Pasqualone
Storing the data in a comma seperated list is not the right way to go. A column should have one value. Otherwise updates would be very inefficient.
ovais.tariq
A: 
SELECT  MIN(id), GROUP_CONCAT(category SEPARATOR ',' ORDER BY id), productListing
FROM    mytable
GROUP BY
        productListing
Quassnoi