We have about 10K rows in a table. We want a form with a select drop-down that contains the distinct values of a given column in this table. We have an index on the column in question.

To increase performance I created a little cache table that contains the distinct values, so we wouldn't need to run select distinct field from table against 10K rows. Surprisingly, doing select * from cachetable (10 rows) seems no faster than doing the select distinct against 10K rows. Why is this? Is the index doing all the work? At what number of rows in our main table will querying the cache table give a performance improvement?
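For reference, the setup being compared looks roughly like this (assuming Oracle, as the answers below do; the table and column names here are made up):

```sql
-- Hypothetical names: 'maintable' holds ~10K rows, 'field' is the column.
create index maintable_field_idx on maintable (field);

-- The original query; with the index in place, the database can
-- often answer this from the index alone, without touching the table.
select distinct field from maintable;

-- The cache table: holds only the ~10 distinct values.
create table cachetable as select distinct field from maintable;
select * from cachetable;
```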

+3  A: 

10k rows is not much... start caring when you reach 500k ~ 1 million rows.

Indexes do a great job, especially if you have just 10 different values for that index.

Seb
+4  A: 

For a DB, 10K rows is nothing. You're not seeing much difference because the actual calculation time is minimal, with most of it consumed by other, constant, overhead.

It's difficult to predict when you'd start noticing a difference, but it would probably be at around a million rows.

If you've already set up caching and it's not detrimental, you may as well leave it in.

sysrqb
+2  A: 

This depends on numerous factors - the amount of memory your DB has, the size of the rows in the table, use of a parameterised query and so forth - but generally 10K is not a lot of rows, and particularly if the table is well indexed it's not going to cause any modern RDBMS to break a sweat.

As a rule of thumb I would generally only start paying close attention to performance on a table when it passes the 100K-row mark, and 500K doesn't usually cause much of a problem if the table is indexed correctly and accessed via those indexes. Performance tends to fall off catastrophically on large tables - you may be fine at 500K rows but crawling at 600K - but you have a long way to go before you are likely to hit such problems.

Cruachan
+1  A: 

Your query against 10K rows most probably uses a HASH UNIQUE (or SORT UNIQUE) operation to find the distinct values.

As 10K rows most probably fit into db_buffers and hash_area_size, all operations are performed in memory, and you won't notice any difference.

But if the query is used as part of a more complex query, or if its data gets pushed out of the cache by other data, you may need disk I/O to access the data, which will slow the query down.

Run your query in a loop in several sessions (as many sessions as there will be connected users) and see how it performs in that case.
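A minimal sketch of such a loop, assuming Oracle and made-up names (the distinct query is reduced to a count so the PL/SQL stays short):

```sql
declare
  v_cnt pls_integer;
begin
  for i in 1 .. 10000 loop
    -- Re-run the query under test; start this block in several
    -- sessions at once to simulate concurrently connected users.
    select count(distinct field) into v_cnt from maintable;
  end loop;
end;
/
```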

Quassnoi
+2  A: 

Is the index doing all the work?

You can tell how the query is being executed by viewing the execution plan.

For example, try this:

explain plan for select distinct field from table;

select * from table(dbms_xplan.display);

I notice that you didn't include an ORDER BY. Without one, the order of the result set may be arbitrary, particularly if Oracle uses the HASH algorithm to build the distinct list. You ought to check that.
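For example (again with hypothetical names), adding ORDER BY makes the ordering deterministic whichever plan Oracle picks:

```sql
select distinct field from maintable order by field;
```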

So I'd look at the execution plans for the original query that you think is using an index, and at the one based on the cache table. Maybe post them and we can comment on what's really going on.

Incidentally, the cache table would usually be implemented as a materialised view, particularly if the master table is generally pretty static.
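A sketch of that approach, with made-up object names; a fast-refresh-on-commit aggregate view needs a materialised view log on the master table and a count(*) column:

```sql
-- Log the changes Oracle needs for fast refresh.
create materialized view log on maintable
  with rowid (field) including new values;

-- The distinct list, kept current as maintable changes.
create materialized view distinct_field_mv
  refresh fast on commit
  as select field, count(*) cnt
     from maintable
     group by field;
```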

David Aldridge
+1 to examine the execution plan
eduncan911
+2  A: 

This is serious premature optimization. Just let the database do its job, perhaps with some tweaking of the configuration (especially if it's MySQL, which has several cache types and settings).

le dorfier
A: 

If you have an index on the column, then all the values are in the index and the DBMS never has to look at the table. It just looks in the index, which has only 10 entries. If this is mostly read-only data, then cache it in memory. Caching helps scalability a lot by relieving the database of work. A query that is quick on a database with no users might perform poorly when 30 queries are running at the same time.

BeWarned
-1: A b-tree index doesn't have ten entries -- it has one entry for every non-null row, and each entry includes the rowid.
David Aldridge
A: 

For future plans and scalability, you may want to look into an indexing service that uses pure memory, or something faster than the TCP round-trip to the DB. A lot of people (including myself) use Lucene for this, normalizing the data into flat files.

Lucene has a built-in RAM-based directory (RAMDirectory), which can build the index entirely in memory - removing the dependency on the file system and greatly increasing speed.

Lately, I've architected systems that have a single RAM-drive index wrapped by a web service. My Ajax-style drop-downs then query that web service for high availability and high speed - no DB layer, no file system, just pure memory and the cost of a remote TCP round-trip.

eduncan911