views:

17

answers:

1

I got a table with columns: author firstname, author lastname, and booktitle Multiple users are inserting in the database, through an import, and I'd like to avoid duplicates. So I'm trying to do something like this: I have a record in the db: First Name: "Isaac" Last Name: "Assimov" Title: "I, Robot"

If the user tries to add it again, it would be basically a non-split-text (would not be split up into author firstname, author lastname, and booktitle) So it would basically look like this: "Isaac Asimov - I Robot" or "Asimov, Isaac - I Robot" or "I Robot by Isaac Asimov"

You see where I am getting at?

(I cannot force the user to split up all the books into into author firstname, author lastname, and booktitle, and I don't even like the idea to force the user, because it's not too user-friendly)

What is the best way (in SQL) to compare all this possible bookdata scenarios to what I have in the database, not to add the same book twice. I was thinking about a possibility of suggesting the user: "is THIS the book you are trying to add?" (imagine a list instead of the THIS word, just like on stackoverflow - ask question - Related Questions.

I was thinking about

 soundex

and maybe even the

 like

operators, but so far i didn't get the results i was hoping.

A: 

You can implement significantly better algorithms for fuzzy matching than soundex/difference, take a look at Beyond SoundEx - Functions for Fuzzy Searching in MS SQL Server.

You could also look at implementing a Full Text catalog and using the "search engine" style FREETEXT() which:

Is a predicate used in a WHERE clause to search columns containing character-based data types for values that match the meaning and not just the exact wording of the words in the search condition

Depending on what your doing you could also perhaps use an ISBN web service to get normalized data.

Alex K.