Yesterday I had a question where people suggested I use Levenshtein method. Is it a slow query? Maybe I can use something else?
+1
A:
You can use the BENCHMARK
function to test the performance:
SELECT BENCHMARK(10000, LEVENSHTEIN('abc', 'abd'));
Maybe test it with different strings similar to your use case.
nikic
2010-08-09 09:50:34
LEVENSHTEIN is not an integrated MySQL function. It is User Defined Function. You need to write it in C. Read the provided link to another related StackOverflow question.
FractalizeR
2010-08-09 13:40:32
A:
As per my answer on your original question - if you want it perform well then normalize your schema
The problem is that in order to determine how similar other data is, the DBMS has to load that data and compare it with the datum. So it has to read through every single row in the table (except the current one) to find 'similar' values. It cannot use indexes to find data which is close to the datum.
If, on the other hand you used a schema like this:
CREATE TABLE member (
member_id INT(11),
member_data CLOB,
PRIMARY KEY (member_id));
CREATE TABLE about_member (
member_id INT(11),
metric VARCHAR(10),
value MEDIUMINT(9),
PRIMARY KEY (member_id, metric),
KEY by_value (metric, value, member_id));
Note that you about_member (1-1-2-2-1) string should be implemented as seperate rows, e.g.
member_id metric value
1234 lost 2
1234 won 3
1234 drawn 1
1234 handicap 9
Then you can use the indexes affectively, e.g.
SELECT compare.member_id, SUM(ABS(compare.value-datum.value)) AS difference
FROM about_member compare, about_member datum
WHERE datum.member_id=$MEMBER_TO_COMPARE
AND compare.member_id<>datum.member_id
AND compare.metric=datum.metric
AND compare.metric BETWEEN (datum.metric-1) AND (datum.metric+1) /* tweak here */
GROUP BY compare.member_id;
C.
symcbean
2010-08-09 10:32:41