I feel like a bit of a newbie posting this, but anyway:

I have a large number of stock items (3000-5000) with complex names that depend on whoever entered the items over a period of 16 years. An example of a name is:

"Food, Dog, Pal Meaty Bites chunks 8kg bag"

Another, related item is named:

"DOG FOOD: Meaty Bites (Pal) 22kg bag"

The problem is that I have lists of items from a number of suppliers, with updated prices, which I need to match to our existing stock list. The first time I get a list, I want to do a "closest match" search and present the user with a list of our current stock item names that might match the supplier's stock item name. The user will then choose the correct SKU, and the app will import the supplier item and link it to our Stock table PK.

The name from the supplier will also vary. An example is:

"Pal Meaty Bites Chunks 8kg"

I can do the match in SQL or .NET, whichever you recommend. I want to present the user with as few items as possible, based on the greatest number of keyword matches. My ideas so far are:

In .NET: break the name into an array and search each keyword against each item (slow).
In SQL: use a full-text index, split the name into keywords joined with "OR", and return the list ordered by rank with a cutoff.

This must be a common scenario, I'm just not sure of the best way to do it. Thanks for your input!

Edit: Added some context: We have a SKU table which has about 20 fields, including StockKeepingUnitID, which is the unique PK (int identity). The suppliers' products are pulled into a table called StockOrderUnit, which has FKs of SupplierID and StockKeepingUnitID, and has a field called SupplierCode (varchar) which contains the supplier's unique code for that stock item. The problem is that numerous suppliers send us price lists, and it is up to a user to match the supplier items (which are unknown at this point) to the existing SKUs already in the DB. Once they select one, the records are joined.

+2  A: 

Definitely take this back to client code, rather than doing it in the DB. This will allow you to, as you say, create a score of matches, and allow the user to choose/confirm your automated matches.

I'd tackle it by splitting into an array, converting to lower case, and then sorting alphabetically. Perhaps try moving the terms with numbers in them to the front of the array. Pull it all back into a string to help the user recognize matches with a bit of consistency.
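The normalization described above can be sketched roughly as follows. This is only an illustration in Python (the same steps port directly to .NET); the function name `normalize_name` and the choice to strip punctuation are assumptions, not something specified in the thread.

```python
import re

def normalize_name(name: str) -> str:
    """Lower-case, split into tokens, sort, and put number-bearing tokens first."""
    # Keep only runs of letters and digits, so "Food," and "(Pal)" lose their punctuation.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Sort key: tokens containing a digit sort first (False < True), then alphabetically.
    tokens.sort(key=lambda t: (not any(c.isdigit() for c in t), t))
    # Pull it all back into one string for display.
    return " ".join(tokens)

print(normalize_name("Food, Dog, Pal Meaty Bites chunks 8kg bag"))
# 8kg bag bites chunks dog food meaty pal
print(normalize_name("DOG FOOD: Meaty Bites (Pal) 22kg bag"))
# 22kg bag bites dog food meaty pal
```

With both names reduced to the same shape, the shared words line up and the size (8kg vs 22kg) is the first thing the user sees.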

I'd hesitate to do this automated, and without user supervision, in a SQL script. Perhaps users could be given a score, and only have to adjudicate those under some threshold.

p.campbell
Thanks. This is what I had concluded. It would not be completely automated, just list hopefully the 5-10 best matches for a user to pick from. If they have to search, they won't do it! The reason I was thinking of going back to SQL is to take advantage of the Full Text Index, but I would still have to clean up the results in .NET anyway, I guess. Thanks for your answer.
Molloch
+1  A: 

One can use the SQL keyword LIKE to do searches like this.

select fld1, fld2 from ProductTable where fld1 LIKE '%Meaty Bites%';

Pardon me if you already have one, but if you do not have a SKU (Stock Keeping Unit) system for the love of mankind create one. At a minimum create a unique primary key that auto-increments (identity) and apply it to all your records. Then use that to do lookups etc.

Using 'like', far fewer records will come across the wire and you don't have to write a bunch of code to do the work.
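Since the supplier's wording rarely matches a whole phrase, one variation is a LIKE per keyword joined with OR. The sketch below uses Python with an in-memory SQLite database purely as a stand-in so it can be run anywhere; the `Stock` table, its `Name` column, and the `find_candidates` helper are hypothetical, and on SQL Server the same query shape would go through a parameterized command instead.

```python
import sqlite3

def find_candidates(conn, supplier_name):
    """Return (id, name) rows whose name contains at least one supplier keyword."""
    keywords = supplier_name.lower().split()
    if not keywords:
        return []
    # One "Name LIKE ?" per keyword, joined with OR; values are bound as parameters.
    where = " OR ".join("Name LIKE ?" for _ in keywords)
    sql = ("SELECT StockKeepingUnitID, Name FROM Stock "
           f"WHERE {where} ORDER BY StockKeepingUnitID")
    return conn.execute(sql, [f"%{w}%" for w in keywords]).fetchall()

# Hypothetical three-row stock table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stock (StockKeepingUnitID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Stock (Name) VALUES (?)", [
    ("Food, Dog, Pal Meaty Bites chunks 8kg bag",),
    ("DOG FOOD: Meaty Bites (Pal) 22kg bag",),
    ("Cat litter, clumping, 10L",),
])

for row in find_candidates(conn, "Pal Meaty Bites Chunks 8kg"):
    print(row)
```

Only the two dog food rows come back, so the unrelated item never leaves the database. Note that a `%` or `_` inside a keyword would itself act as a wildcard unless it is escaped.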

JustBoo
Thanks. I was just being brief for the sake of the post. We have a SKU table which has about 20 fields, including StockKeepingUnitID, which is the unique PK (int identity). The suppliers' products are pulled into a table called StockOrderUnit, which has FKs of SupplierID and StockKeepingUnitID, and has a field called SupplierCode (varchar) which contains the supplier's unique code for that stock item. The problem is that numerous suppliers send us price lists, and it is up to a user to match the supplier items (which are unknown at this point) to the existing SKUs already in the DB.
Molloch
@Molloch Gotcha'. Keep fighting the good fight. :-) Remember "like" returns a result-set not just a single record.
JustBoo
+1  A: 

You could take both of your approaches.
Split and do some basic matching in SQL.
Then score the results in .Net

Your basic matching in SQL could be as simple as a large list of all things that match a number of words.

Then your scoring in .Net is where the real "magic" would happen.
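As a rough idea of what that scoring step might look like, here is a minimal sketch in Python (easily ported to .NET). The scoring rule (the fraction of supplier keywords found in the stock name), the `cutoff` and `limit` defaults, and all function names are assumptions chosen for illustration; real data would probably call for something smarter.

```python
import re

def tokens(name):
    """Set of lower-cased alphanumeric words in a name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))

def score(supplier_name, stock_name):
    """Fraction of supplier keywords that also appear in the stock name (0.0 to 1.0)."""
    wanted = tokens(supplier_name)
    if not wanted:
        return 0.0
    return len(wanted & tokens(stock_name)) / len(wanted)

def best_matches(supplier_name, candidates, limit=10, cutoff=0.5):
    """Rank (sku_id, name) candidates, drop weak ones, and keep the top few."""
    scored = [(score(supplier_name, name), sku_id, name) for sku_id, name in candidates]
    kept = [s for s in scored if s[0] >= cutoff]
    # Highest score first; ties broken by SKU id so the order is stable.
    kept.sort(key=lambda s: (-s[0], s[1]))
    return kept[:limit]

candidates = [
    (1, "Food, Dog, Pal Meaty Bites chunks 8kg bag"),
    (2, "DOG FOOD: Meaty Bites (Pal) 22kg bag"),
    (3, "Cat litter, clumping, 10L"),
]
for s, sku_id, name in best_matches("Pal Meaty Bites Chunks 8kg", candidates):
    print(f"{s:.1f}  {sku_id}  {name}")
# 1.0  1  Food, Dog, Pal Meaty Bites chunks 8kg bag
# 0.6  2  DOG FOOD: Meaty Bites (Pal) 22kg bag
```

The 8kg item matches all five supplier keywords, the 22kg item matches three of five, and the unrelated item falls below the cutoff, so the user only has two rows to choose from.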

Jean-Bernard Pellerin