views:

72

answers:

2

The naive way of doing this that comes to mind would be:

SELECT name, lev FROM
(SELECT name, levenshtein(name, *parameter*) as lev FROM my_table)
WHERE 
lev = (SELECT MIN(lev) FROM 
(SELECT name, levenshtein(name, *parameter*) as lev FROM my_table ));

However the "(SELECT name, levenshtein(name, parameter) as lev FROM my_table)" subquery, which is very expensive (huge table) is repeated twice which seems horribly inefficient.

I somehow though you could write :

SELECT name, lev FROM
(SELECT name, levenshtein(name, *parameter*) as lev FROM my_table) as my_temp_table
WHERE 
lev = (SELECT MIN(lev) FROM my_temp_table);

But it doesn't seem to work.

Is there a clean way to optimize that query for speed? Did I miss something obvious?

Do I have to rely on temporary tables? (trying to avoid it due to the overhead/complexity as they don't seem appropriate for very frequent/concurrent queries)

Any input from SQL ninjas would be greatly appreciated ;)

+1  A: 
SELECT name, levenshtein(name, *parameter*) as lev
FROM my_table
ORDER BY lev ASC
LIMIT 1
chaos
Thanks for your answer, but I see couple of issues with this solution. First, my understanding is that even with a LIMIT, a full ORDER BY will be significantly slower than just computing a MIN (at least, it should in terms of algorithms). Second, it doesn't really give us all the MIN matching rows, only the first N (assuming LIMIT N) which could be a problem... :/
Vermillon
The first point probably depends on what you call 'significant', but shrug. As to the second, yeah, I was thinking the minimum would be unique, for some reason. I don't really have anything better than temporary tables for you, then.
chaos
(Well, you want to use heap tables, obviously: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html)
chaos
Though if I were you, I don't think I'd mess with the temp tables; I'd do something similar to my query, with a higher limit that seemed likely to be "as many as I'd ever care about", and cull the irrelevant rows at the program level.
chaos
Temp tables seem to be pretty cumbersome/messy indeed. I'll probably try your approach with some splitting of the dataset prior to querying in order to alleviate the load, unless some super sql-ninja gives us a better idea. Thanks.
Vermillon
A: 

select name, min(levenshtein(name, *parameter)) as lev from my_table group by name;

Jeremy
That is not going to work.
chaos
And it indeed doesn't produce the desired result... :/
Vermillon