views:

382

answers:

3

I am trying to compare a series of strings like the following

rodeo     rodas
carrot    crate
GLX       GLX 1.1
GLX       glxs

the comparision need not be case sensitive i am trying to write a sql where i am updating the first string with the second string if they match approximately. Here except the second string all the other examples match. I would like to write a query which updates the strings except the second one.

is this possible directly in a query in ACCESS

thanks

+2  A: 

Other than using LIKE, this is not possible directly in a query in MS Access. You will need to implement some sort of Levenshtein Distance function for approximate matches.

Mitch Wheat
How does this fare in performance? I've implemented the algorithm in C#, and it appears to be pretty hairy.
+1  A: 

You cannot measure the distance of two strings directly in Access. You have to implement a suitable metric which calculates the distance--there are many different ways to calculate it.

Edit: You may want to watch this Wikipedia entry which describes the possibilities very good. The Levenshtein Distance Mitch recommends is a very good and simple solution for solving this.

arno
+1  A: 

What you are trying to do is called fuzzy matching and it is quite complicated to code -- refer http://search.cpan.org/dist/String-Approx/Approx.pm

I tried doing this with a combination of SQL and VBA functions and gave up because there were so many exceptions. The best I could come up with in the end was the following:

(1) Used strawberry perl to connect to the ms-access db using the win32 perl module;

(2) Ran sql queries from perl to obtain the two connection strings;

(3) Used perl's String::Approx module to do the fuzzy string comparison -- this returns 1 for a perfect match and a decreasing number less than 1 for an imperfect match.

(4) Appended the fuzzy match number back into ms-access via a sql query.

heferav