views: 1794

answers: 4

What is the best fuzzy matching algorithm (fuzzy logic, n-gram, Levenshtein, Soundex, ...) for processing more than 100,000 records in the least time?

+4  A: 

I suggest you read the articles by Navarro listed here: http://en.wikipedia.org/wiki/Fuzzy_string_searching

Making your decision based on actual research is always better than relying on suggestions from random strangers, especially if performance on a known set of records is important to you.

Tim
A: 

It depends massively on your data. Some records can be matched better than others. For example, a postcode has a defined format, so it can be compared differently from ordinary strings. People can be matched on initials and date of birth (DOB), or on other combinations of fields.
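One common way to exploit structured fields like these is "blocking", a record-linkage technique (the answer above does not name it). Rather than running a fuzzy comparison on all ~5 billion pairs of 100,000 records, you group records by a cheap exact key and only compare records inside the same group. The sketch below assumes records are dicts with hypothetical fields `first`, `last`, `dob` and `postcode`; the function names are also illustrative, not from any library.

```python
import re
from collections import defaultdict
from itertools import combinations


def normalize_postcode(postcode: str) -> str:
    """Uppercase and keep only letters and digits,
    so 'sw1a 1aa' and 'SW1A1AA' produce the same key."""
    return re.sub(r"[^A-Z0-9]", "", postcode.upper())


def person_key(record: dict) -> tuple:
    """Hypothetical blocking key: first initial, surname initial, date of birth."""
    return (record["first"][:1].upper(), record["last"][:1].upper(), record["dob"])


def candidate_pairs(records, key_func):
    """Group records by key_func and yield only pairs that share a group,
    instead of every n*(n-1)/2 pair."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_func(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


# Illustrative data, not real records.
records = [
    {"id": 1, "first": "Jon", "last": "Smith", "dob": "1980-02-01", "postcode": "sw1a 1aa"},
    {"id": 2, "first": "John", "last": "Smyth", "dob": "1980-02-01", "postcode": "SW1A1AA"},
    {"id": 3, "first": "Jane", "last": "Doe", "dob": "1975-07-12", "postcode": "EC1A 1BB"},
]

print([(a["id"], b["id"]) for a, b in candidate_pairs(records, person_key)])  # [(1, 2)]
```

Only the surviving pairs then go through the expensive fuzzy comparison (Levenshtein, n-gram, etc.). The trade-off is that a typo inside a key field puts a true match in the wrong group, so in practice several passes with different keys (for example, one on `person_key` and one on the normalized postcode) are often combined.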

ck
A: 

Go to www.matchlogics.com

A: 

Do yourself a favor and buy an NVIDIA video card that supports CUDA (such as a GTX 260-based card) for around $150.

About CUDA and GPU processing: http://www.nvidia.com/object/cuda_home.html

If you use .NET, take your algorithm and use the CUDA.NET wrapper: http://www.gass-ltd.co.il/en/products/cuda.net/Releases.aspx

Otherwise, download the CUDA SDK for C++ (or another supported language).

Convert any code you already have to CUDA and watch the nice speed increase, which has been anywhere from 50% to 1000% faster, because it works in parallel: instead of a traditional CPU working through the calculations one after another, the GPU runs many threads at once, so many independent calculations are done simultaneously. As long as your algorithm is halfway decent, believe me, this will blow your mind.
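Taking Levenshtein from the question as an example, the dynamic-programming matrix is not computed in a single step: each cell depends on its left, upper and upper-left neighbours. What can run in parallel is every pair of records (each pair has its own independent matrix) and, within one matrix, all cells on the same anti-diagonal (i + j = d), since they depend only on diagonals d-1 and d-2. The plain-Python sketch below (no GPU, function names are illustrative) fills the matrix in that "wavefront" order and checks it against the usual row-by-row version; the inner loop is the part a CUDA kernel could spread across threads.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook row-by-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution or match
        prev = cur
    return prev[-1]


def levenshtein_wavefront(a: str, b: str) -> int:
    """Same distance, filled one anti-diagonal (i + j == d) at a time.

    Cells on diagonal d read only diagonals d-1 and d-2, so iterations of
    the inner loop are independent of each other: on a GPU each cell of a
    diagonal could be handled by a separate thread."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(m + n + 1):
        # All (i, j) inside the matrix with i + j == d.
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                D[i][j] = min(D[i - 1][j] + 1,
                              D[i][j - 1] + 1,
                              D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return D[m][n]


print(levenshtein("kitten", "sitting"), levenshtein_wavefront("kitten", "sitting"))  # 3 3
```

For 100,000 records the larger win is usually the first kind of parallelism, many independent record pairs at once, since a single short string gives only a few cells per diagonal to share out.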