views: 48
answers: 4

I have a requirement to loop through records in a database table and group items that have similar content. I want to match on a single column in the database, and if there are similar records I want to extract the ID of each row and save it to another table, e.g. if I had 10 similar rows they would all be linked to one "header" record in another table.

Below is some simple Pseudocode to illustrate what I need to do:

For Each record In table
    If there is a similar record in the header table Then
        Link this record to the matching header table record
    Else
        Create a new header record and link this record
    End If
End For

I'm using MSSQL 2008 with Full Text Search, which will provide me with the mechanism I need to pick out similar records. At the moment I'm planning to write the for loop in C# code and do the matching and saving in SQL by calling a stored procedure that checks for a matching record.
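Roughly what I have in mind for the stored procedure side is something like this (table, column and procedure names are just placeholders, and it assumes a full-text index already exists on the header table's content column):

CREATE PROCEDURE dbo.FindMatchingHeader
    @Content  NVARCHAR(4000),
    @HeaderId INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET @HeaderId = NULL;

    -- rank existing header rows by how well they match the incoming text
    SELECT TOP (1) @HeaderId = h.HeaderId
    FROM dbo.Headers AS h
    INNER JOIN FREETEXTTABLE(dbo.Headers, Content, @Content) AS ft
        ON h.HeaderId = ft.[KEY]
    ORDER BY ft.[RANK] DESC;
END

The C# loop would then call this for each record and either link to the returned @HeaderId or create a new header row when it comes back NULL.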

Something is telling me this should all be done in a single stored procedure (and something else tells me to keep the logic in the code!).

Is there a neater way of doing this in SQL?

A: 

Here is an example; try adapting it to your needs.

SELECT email,
       COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING COUNT(email) > 1
Misnomer
@Misnomer: thanks for the example; however, it would only match exact duplicates. I need to check for similar records that may not be exactly the same.
BradB
You could add another condition to the HAVING clause, such as `OR email LIKE '%similar%'`, to check for similar items.
Misnomer
@Misnomer: I plan to use FTS as the LIKE operator isn't sophisticated enough for my requirements. Have you ever used an FTS JOIN in the style of your example? Do-able?
BradB
A: 

You may want to look into the MERGE statement that is new in SQL Server 2008. See, for example: Inserting, Updating, and Deleting Data by Using MERGE.
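As a rough sketch of how that could look here (table and column names are placeholders), a single MERGE can create any missing header rows:

-- exact-match key shown for simplicity; Headers, Records and MatchKey are placeholders
MERGE dbo.Headers AS target
USING (SELECT DISTINCT MatchKey FROM dbo.Records) AS source
    ON target.MatchKey = source.MatchKey
WHEN NOT MATCHED BY TARGET THEN
    INSERT (MatchKey) VALUES (source.MatchKey);  -- MERGE must be terminated with a semicolon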

Joe Stefanelli
A: 

You can write a stored procedure and schedule it via a maintenance plan, or you can use embedded C# code on SQL Server (SQL CLR), which lets you build better matching algorithms more easily on the database side. Alternatively, you could write a Windows service for a batch-processing job that runs regularly.
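For instance, hooking an embedded C# similarity function into the database might look roughly like this (the assembly path, class and method names are hypothetical; CLR integration has to be enabled first):

EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
CREATE ASSEMBLY SimilarityLib FROM 'C:\libs\SimilarityLib.dll';   -- hypothetical path
GO
CREATE FUNCTION dbo.IsSimilar (@a NVARCHAR(4000), @b NVARCHAR(4000))
RETURNS BIT
AS EXTERNAL NAME SimilarityLib.[SimilarityLib.Functions].IsSimilar;
GO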

sirmak
A: 

Databases are really good at dealing with distinct pieces of information. They are not so good at dealing with quasi-distinct information.

With that said, see if the SOUNDEX function works (well enough) for grouping similar inputs.
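A minimal sketch of that idea (table and column names are placeholders):

SELECT SOUNDEX(Content) AS SoundexCode,
       COUNT(*)         AS NumOccurrences
FROM dbo.Records
GROUP BY SOUNDEX(Content)
HAVING COUNT(*) > 1

Bear in mind SOUNDEX only really compares the leading sounds of a string, so it tends to suit short values like names rather than longer text.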

And, for the love of god, don't use anything like this in a production environment.

JoshRoss