tags:

views:

147

answers:

3

I have a text column containing URLs that are uniqe.

I need to do a SQL pattern match query, say using the 'like' SQL operator.

In such a scenario, will SQLite do a FTS ( Full Table Scan ) regardless of whether I have an index on that column or not ( column is a primary key )?

It seems to be that its doing a FTS as the speed of operation is telling - or is the speed impact due to the 'like' query although the column is indexed?

+2  A: 

The LIKE operator will perform a full table scan if operand string begins with a wildcard character ("LIKE '%foo'"). It will use an index (if available) otherwise.

BipedalShark
+1  A: 

If your LIKE uses a wildcard at the beginning of the match expression, such as '%hris McCall', indexes will not be used.

Chris McCall
Unfortunately, the wildcard '%' is used as a suffix - AFTER a constant text - something like 'FTP%'What is amazing is that I just finished a little bit of data analysis and it looks like most of the "pattern" strings are constants - like 'ftp://abs.com' - so there was no need to use like at all!Just for the sake of it, I wrote a one liner that calls a version of the query that uses '=' instead of 'like' when there are no wildcards, and it's wayyy faster - '=' takes 3 seconds on what takes 'like' 42 minutes!Is there no way to make 'like' call '=' if no wildcards are used?
PoorLuzer
Why do you use SQLite for a project that has enough data to take 42 minutes? That must be hundreds of GB of data.
erikkallen
... or maybe not hundreds, but probably tens.
erikkallen
No! This is for db that is ~12MB.Note the difference in timings in using 'like' vs '='
PoorLuzer
42 minutes on a 12MB database? This is a little hard to believe. Would you mind sharing the queries and table schema?
BipedalShark
Things do not seem to be working out - I need to share the database - but the problem is that the values are extremely confidential - would sharing the schema and PERL code with embedded queries be enough?Don't you need the complete 12MB db to do timings properly?
PoorLuzer
+1  A: 

The SQLite Query Planner

4.0 The LIKE optimization

Terms that are composed of the LIKE or GLOB operator can sometimes be used to constrain indices. There are many conditions on this use:

  1. The left-hand side of the LIKE or GLOB operator must be the name of an indexed column.
  2. The right-hand side of the LIKE or GLOB must be a string literal that does not begin with a wildcard character. [...]

Update: by default case_sensitive_like mode is OFF. Turning it ON might make it behave like =.

PRAGMA case_sensitive_like = ON; -- OFF
SELECT * FROM your_table WHERE field LIKE '...'
Nick D
the left hadn side of like is the primary key and the right is a constant string followed sometimes by a wildcard.As I said, most of the "pattern" strings are constants - like 'ftp:abs.com' - so there was no need to use like at all! Just for the sake of it, I wrote a one liner that calls a version of the query that uses '=' instead of 'like' when there are no wildcards, and it's wayyy faster - '=' takes 3 seconds on what takes 'like' 42 minutes! Is there no way to make 'like' call '=' if no wildcards are used?
PoorLuzer
@PoorLuzer, check out my update.
Nick D
.. I will have a look today and let you know.
PoorLuzer
no difference in timing at all - I need to share the databse - but the problem is that the values are extremely confidential - would sharing the schema and PERL code be enough?
PoorLuzer