views:

61

answers:

2

On DBD::SQLite of SQLite3

If I am going to query a SELECT only once.

Should I CREATE a INDEX first and then query the SELECT
 or
just query the SELECT without an INDEX,
 which is faster ?

If need to be specified, the col. to be index on is a INTEGER of undef or 1, just these 2 possibilities.

+3  A: 

Building an index takes longer than just doing a table scan. So, if your single query — which you're only running once — is just a table scan, adding an index will be slower.

However, if your single query is not just a table scan, adding the index may be faster. For example, without an index, the database may perform a join as many table scans, once for each joined row. Then the index would probably be faster.

I'd say to benchmark it, but that sounds silly for a one-off query that you're only ever going to run once.

derobert
Since without an INDEX, SELECT will do a table scan, using a new INDEX will be slower at a time for saving that index to database.
Galaxy
A table scan is usually O(n); indexes are generally trees, so building the index can be no faster than O(n) + O(n*log(n)) [n = number of rows].
derobert
A: 

If you consider setting and index on a column that only has two possible values it's not worth the effort as index will give very little improvement. Indexes are useful on a columns that has a high degree of uniqueness and are frequently queried for a certain value or range. On the other hard indexes make inserting and updating slower so in this case you should skip it.

RaYell
An index may be worth it for a column with only two values. For example, maybe you have a table with a large number of rows, and a boolean flag "does this need processing?". Assuming most things do not need processing, then an index will likely speed up the "find stuff that needs processing" query substantially. Generally, indices work well finding rare values, regardless of the overall cardinality of the set. (A very high cardinality is a special case where *all* values are rare)
derobert
So if I want to SELECT a col. of 2 possibilities for many times (about 100 k), should I build a INDEX ?
Galaxy
... and rare is defined as less than 1–10%, depending on a lot of factors, such as how expensive random access to the table is vs. sequential access.
derobert
@Galaxy: Only in the special case that the value you're looking for is rare.
derobert
@Galaxy: If your values are split 50/50, or anywhere close to that, they are not rare, so an index probably won't help. If we're talking 99/1 or somesuch, then an index will probably help the 1% side, and should just be ignored for the 99% side. But it'll cost you on insert/update/delete instead. Since you're selecting the entire table, I wonder if the fastest approach might be to use GROUP BY (and only one query), but I think that it'd be best if you open another question describing what exactly you're doing, especially since I'm going to bed and won't read any more comments 'til morning :_D
derobert
@derobert: What I ask here is a rare situation and the half/half situation is another topic that I just find out after reading your comment above. Since the half/half one need to query many times (about 100 k on either side as mentioned in my deleted comment), I wonder whether it need an INDEX. You said about 10/90 will make a INDEX useful, so I think INDEX should be useless. (but in the 50/50 one, each value comes every second line, which makes a worst situations in the factor you metioned)
Galaxy