Yesterday I asked a question and people suggested I use the Levenshtein method. Is it slow when used in a query? Is there something else I could use instead?

A: 

You can use the BENCHMARK function to test the performance:

SELECT BENCHMARK(10000, LEVENSHTEIN('abc', 'abd'));

Test it with strings similar to the ones in your use case.
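To get a feel for what each call costs, here is a minimal Python sketch of the classic dynamic-programming edit distance. It is only an illustration, not the UDF itself: the function name and the two-row layout are my own choices. The point is that every comparison does work proportional to len(a) * len(b), and that work is repeated for every row you compare against.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


print(levenshtein("abc", "abd"))  # prints 1
```

Timing this over strings of realistic length (for example with the standard timeit module) gives a rough idea of how the cost grows, but the numbers will not match a compiled C function.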

nikic
LEVENSHTEIN is not a built-in MySQL function. It is a user-defined function (UDF), so you need to write it yourself in C. Read the provided link to another related Stack Overflow question.
FractalizeR
A: 

As per my answer to your original question: if you want it to perform well, normalize your schema.

The problem is that in order to determine how similar other data is, the DBMS has to load that data and compare it with the datum. So it has to read through every single row in the table (except the current one) to find 'similar' values. It cannot use indexes to find data which is close to the datum.

If, on the other hand, you used a schema like this:

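CREATE TABLE member (
   member_id      INT(11),
   member_data    TEXT,          -- MySQL has no CLOB type; TEXT is the equivalent
   PRIMARY KEY (member_id));

CREATE TABLE about_member (
   member_id      INT(11),
   metric         VARCHAR(10),   -- e.g. 'won', 'lost'
   value          MEDIUMINT(9),
   PRIMARY KEY (member_id, metric),
   -- lets the DBMS find members by metric and value without a table scan
   KEY by_value (metric, value, member_id));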

Note that your about_member string (1-1-2-2-1) should be stored as separate rows, one per metric, e.g.

 member_id     metric      value
 1234          lost        2
 1234          won         3
 1234          drawn       1
 1234          handicap    9

Then you can use the indexes effectively, e.g.

 SELECT compare.member_id, SUM(ABS(compare.value-datum.value)) AS difference
 FROM about_member compare, about_member datum
 WHERE datum.member_id=$MEMBER_TO_COMPARE
 AND compare.member_id<>datum.member_id
 AND compare.metric=datum.metric
 AND compare.value BETWEEN (datum.value-1) AND (datum.value+1) /* tweak here */
 GROUP BY compare.member_id;
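
If you want to try the idea without a MySQL server, here is a rough, self-contained sketch using Python's built-in sqlite3 module as a stand-in. Everything beyond the schema and the query shape above is an assumption for illustration: the sample members 2000 and 3000, the similar_members helper, the tolerance parameter, and the extra matched_metrics column. The range filter is applied to value. I added matched_metrics because metrics outside the tolerance are dropped from the sum, so a small difference on its own can be misleading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE about_member (
        member_id INTEGER NOT NULL,
        metric    TEXT    NOT NULL,
        value     INTEGER NOT NULL,
        PRIMARY KEY (member_id, metric)
    );
    CREATE INDEX by_value ON about_member (metric, value, member_id);
""")

# Member 1234 comes from the example rows above; 2000 and 3000 are made up.
rows = [
    (1234, "lost", 2), (1234, "won", 3), (1234, "drawn", 1), (1234, "handicap", 9),
    (2000, "lost", 2), (2000, "won", 4), (2000, "drawn", 1), (2000, "handicap", 9),
    (3000, "lost", 9), (3000, "won", 3), (3000, "drawn", 2), (3000, "handicap", 1),
]
conn.executemany("INSERT INTO about_member VALUES (?, ?, ?)", rows)


def similar_members(conn, member_id, tolerance=1):
    """Return (member_id, difference, matched_metrics) for members near member_id."""
    sql = """
        SELECT compare.member_id,
               SUM(ABS(compare.value - datum.value)) AS difference,
               COUNT(*) AS matched_metrics
        FROM about_member AS compare
        JOIN about_member AS datum ON compare.metric = datum.metric
        WHERE datum.member_id = ?
          AND compare.member_id <> datum.member_id
          AND compare.value BETWEEN datum.value - ? AND datum.value + ?
        GROUP BY compare.member_id
        ORDER BY matched_metrics DESC, difference ASC
    """
    return conn.execute(sql, (member_id, tolerance, tolerance)).fetchall()


print(similar_members(conn, 1234))  # prints [(2000, 1, 4), (3000, 1, 2)]
```

Both made-up members end up with a difference of 1, but member 3000 only matched on two of the four metrics, so it is much less similar than the sum alone suggests.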

C.

symcbean