views:

99

answers:

4

Using SQLite, Got a table with ~10 columns. Theres ~25million rows.

That table has an INDEX on 'sid, uid, area, type'.

I run a select like so:

SELECT sid from actions where uid=1234 and area=1 and type=2

That returns me 1571 results, and takes 4 minutes to complete.

Is that sane?

I'm far from an SQL expert, so hopefully someone can fill me in on what I'm missing. Why could this possibly take 4+ minutes with everything indexed?

Any recommended resources to learn about achieving high SQL performance? I feel like a lot of the Google results just give me opinions or anecdotes, I wouldn't mind a solid book.

+4  A: 

Create uid+area+type index instead, or uid+area+type+sid

zerkms
+2  A: 
  • The index is not really usefull as it does start with the wrong field... which means a table scan.

  • Looks like you have a normal computer there, not something made for databases. I run table scans over 650 million rows in about a minute on my lower end db server, but that means reading about a gigabyte per second from the discs, which are a RAID of 10k RM discs - RAID 10. Just to say that basically... that databases love IO, and that in a degree that you have never seen before. Basically larger db servers have many discs to satisfy the IOPS (IO per second) requirement. I have seen a server with 190 discs.

So, you ahve two choices: beed up your IOPS capability (means spending money), or set up indices that get used because they are "proper".

Proper means: an index only is usefull if the fields it contains are used from left to right. Not necessarily in the same order... but if a field is missed there is a chance the SQL System decides it is not worth pursuing the index and instead goes table scan (as in your case).

TomTom
Lots of great replies here thanks all. So I feel like my eyes just opened up real wide -- is it then fair to say that sometimes tables have multiple indexes? It seems that a single index represents a common search method. If I sometimes search by sid, and sometimes by uid+area+type, is it then very possible that its a good idea for me to have them as 2 seperate indexes? Thanks
TheImpact
Yes, sometimes a table will have a number of indices - totally normal. Last fallback is to have single field indices and have a server smart eenough to handle that efficiently (shiqh SqlLite is likely not).
TomTom
+2  A: 

Since the index starts with the sid column, it must do a scan (start at the beginning, read to the end) of either the index or the table to find your data matching the other 3 columns. This means it has to read all 25 million rows to find the answer. Even if it's reading just the rows of the index rather than the table, that's a lot of work.

Imagine a phone book of the greater New York metropolitan area, organized by (with an 'index' on) Last Name, First Name.

You submit SELECT [Last Name] FROM NewYorkPhoneBook WHERE [First Name] = 'Thelma'

It has to read all 25 million entries to find all those Thelmas. Unless you either specify the last name and can then turn directly to the page where that last name first appers (a seek), or have an index organized by First Name (a seek on the index followed by a seek on the table, aka a "bookmark lookup"), there's no way around it.

The index you would create to make your query faster is on uid, area, type. You could include sid, though leave it out if sid is part of the primary key.

Note: Tables often do have multiple indexes. Just note that the more indexes, the slower the write performance. Unnecessary indexes can slow overall performance, sometimes radically so. Testing and eventually experience will help guide you in this. Also, reasoning it out as a real-world problem (like my phone book examples) can really help. If it wouldn't make sense with phone books (and separate phone book indexes) then it probably won't make sense in the database.

One more thing: even if you put an index on those columns, if your query is going to end up pulling a great percentage of the rows in the main table, it will still be cheaper to scan the table rather than do the bookmark lookup (seek the index then seek the table for each row found). The exact "tipping point" of whether to do a bookmark lookup with a seek, or to do a table scan isn't something I can tell you off the top of my head, but it is based on solid math.

Emtucifor
A: 

When you create your new index on uid, area and type, you should also do a select distinct on each one to determine which has the fewest distinct entries, then create your index such that the fewer the differences the earlier they show up in the index definition.

Nathan Feger
Why is that? If uid, area, and type are always included together in where or join clauses then it makes no difference. If some subset is commonly used, then most important is to put the columns of that subset first, not to arrange things based on the selectivity. And last, if things were otherwise equal (say all three columns are used as single criteria about equally) wouldn't you want to put the most selective column first, not the least selective column first as you suggest?
Emtucifor