views:

272

answers:

4

I'm using MySQL 4.1. Some tables have duplicates entries that go against the constraints.

When I try to group rows, MySQL doesn't recognise the rows as being similar.

Example:

Table A has a column "Name" with the Unique proprety.
The table contains one row with the name 'Hach?' and one row with the same name but a square at the end instead of the '?' (which I can't reproduce in this textfield)
A "Group by" on these 2 rows return 2 separate rows

This cause several problems including the fact that I can't export and reimport the database. On reimporting an error mentions that a Insert has failed because it violates a constraint.

In theory I could try to import, wait for the first error, fix the import script and the original DB, and repeat. In pratice, that would take forever.

Is there a way to list all the anomalies or force the database to recheck constraints (and list all the values/rows that go against them) ?

I can supply the .MYD file if it can be helpful.

+2  A: 

To list all the anomalies:

SELECT name, count(*) FROM TableA GROUP BY name HAVING count(*) > 1;

There are a few ways to tackle deleting the dups and your path will depend heavily on the number of dups you have.

See this SO question for ways of removing those from your table.

Here is the solution I provided there:

-- Setup for example
create table people (fname varchar(10), lname varchar(10));

insert into people values ('Bob', 'Newhart');
insert into people values ('Bob', 'Newhart');
insert into people values ('Bill', 'Cosby');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Jim', 'Gaffigan');
insert into people values ('Adam', 'Sandler');

-- Show table with duplicates
select * from people;

-- Create table with one version of each duplicate record
create table dups as 
    select distinct fname, lname, count(*) 
    from people group by fname, lname 
    having count(*) > 1;

-- Delete all matching duplicate records
delete people from people inner join dups 
on people.fname = dups.fname AND 
   people.lname = dups.lname;

-- Insert single record of each dup back into table
insert into people select fname, lname from dups;

-- Show Fixed table
select * from people;
RC
The first query doesn't return anything. 2 tables have this problem. Doesn't work on either or them. In the one the duplicate I can identify there a caracter that is different. Yet when it tries to insert, MYSQL finds that it's the same caracter. I'll update my description.
Silence
Have you checked to make sure you don't have white-space within your column values? I'm guessing you have a value like 'Bob' and one like 'Bob '.
RC
It's not whitespaces.
Silence
A: 

Create a new table, select all rows and group by the unique key (in the example column name) and insert in the new table.

crunchdog
Hmm, how do one get the ungrouped values...
crunchdog
"Group by" doesn't work. I edited my question.
Silence
A: 

To find out what is that character, do the following query:

SELECT HEX(Name) FROM TableName WHERE Name LIKE 'Hach%'

You will se the ascii code of that 'square'.

If that character is 'x', you could update like this:(but if that column is Unique you will have some errors)

UPDATE TableName SET Name=TRIM(TRAILING 'x' FROM Name);
True Soft
Yes thank you. Unfortunalty I have hundreds of case like that and only one of them begins with "Hach". How do I find them all ?
Silence
But is the last character (the 'square') the same?
True Soft
The square is not always the last character and there can be more than 1. They seems to be replacing accents in words : the language is french and a lot of words have accents. On the other hand, some accents are not replaced by "squares".
Silence
A: 

I'll assume this is a MySQL 4.1 random bug. Somes values are just changing on their own for no particular reason even if they violates some MySQL constraints. MySQL is simply ignoring those violations.

To solve my problem, I will write a prog that tries to resinsert every line of data in the same table (to be precise : another table with the same caracteristics) and log every instance of failures.

I will leave the incident open for a while in case someone gets the same problem and someone else finds a more practical solution.

Silence