views:

300

answers:

5

I have a sqlite table containing records of variable length number prefixes. I want to be able to find the most complete prefix against another variable length number in the most efficient way:

eg. The table contains a column called prefix with the following numbers:

1. 1234
2. 12345
3. 123456

What would be an efficient sqlite query to find the second record as being the most complete match against 12345999.

Thanks.

+1  A: 
select foo, 1 quality from bar where foo like "123*"
union
select foo, 2 quality from bar where foo like "1234*"
order by quality desc limit 1

I haven't tested it, but the idea would work in other dialects of SQL

MatthewMartin
+1  A: 
thomas
+3  A: 

A neat trick here is to reverse a LIKE clause -- rather than saying

WHERE prefix LIKE '...something...'

as you would often do, turn the prefix into the pattern by appending a % to the end and comparing it to your input as the fixed string. Order by length of prefix descending, and pick the top 1 result.

I've never used Sqlite before, but just downloaded it and this works fine:

sqlite> CREATE TABLE whatever(prefix VARCHAR(100));
sqlite> INSERT INTO WHATEVER(prefix) VALUES ('1234');
sqlite> INSERT INTO WHATEVER(prefix) VALUES ('12345');
sqlite> INSERT INTO WHATEVER(prefix) VALUES ('123456');
sqlite> SELECT * FROM whatever WHERE '12345999' LIKE (prefix || '%') 
        ORDER BY length(prefix) DESC LIMIT 1;

output:

12345
Cowan
A very neat trick and works perfectly. Many thanks
mysomic
Follow up: I have been using this method in an iPhone app and have found it to be quite slow. I am on the hunt now for some optimizations and will post here when found."The GLOB and LIKE operators are expensive in SQLite because they can't make use of an index. One reason is that these are implemented by user functions which can be overridden, so the parser has no way of knowing how they might behave in that case. This forces a full scan of the table for the column being matched against, even if that column has an index."
mysomic
+1  A: 

Without resorting to a specialized index, the best performing strategy may be to hunt for the answer.

Issue a LIKE query for each possible prefix, starting with the longest. Stop once you get rows returned.

It's certainly not the prettiest way to achieve what you wan't but as opposed to the other suggestions, indexes will be considered by the query planner. As always, it depends on your actual data. In particular, on how many rows in your table, and how long the average hunt will be.

Anders Johannsen
A: 

Personally I use next method, it will use indexes:

statement '('1','12','123','1234','12345','123459','1234599','12345999','123459999')' should be generated by client

SELECT * FROM whatever WHERE prefix in
('1','12','123','1234','12345','123459','1234599','12345999','123459999')
ORDER BY length(prefix) DESC LIMIT 1;
HardQuestions
Also order by just prefix (without use of length) should give same result.
HardQuestions