I use both Firebird embedded and Firebird Server, and from time to time I need to reindex the tables using a procedure like the following:

CREATE PROCEDURE MAINTENANCE_SELECTIVITY
AS
DECLARE VARIABLE S VARCHAR(200);
BEGIN
  FOR SELECT RDB$INDEX_NAME FROM RDB$INDICES INTO :S DO
  BEGIN
    S = 'SET STATISTICS INDEX ' || S || ';';
    EXECUTE STATEMENT :S;
  END
  SUSPEND;
END

I guess this is normal when using embedded, but is it really needed when using a server? Is there a way to configure the server to do it automatically when required, or periodically?

+6  A: 

First, let me point out that I'm no Firebird expert, so I'm answering on the basis of how SQL Server works.

In that case, the answer is both yes and no.

The indexes are of course updated on SQL Server, in the sense that if you insert a new row, all indexes for that table will contain that row, so it will be found. So basically, you don't need to keep reindexing the tables for that part to work. That's the "no" part.

The problem, however, is not with the index, but with the statistics. You're saying that you need to reindex the tables, but then you show code that manipulates statistics, and that's why I'm answering.

The short answer is that statistics slowly go out of whack as time goes by. They might not deteriorate to the point where they're unusable, but they will degrade from the accurate state they're in right after you recreate or recalculate them. That's the "yes" part.

The main problem with stale statistics is that if the distribution of the keys in the indexes changes drastically, the statistics might not pick that up right away, and the query optimizer will then pick the wrong indexes, based on the old, stale statistics it has on hand.

For instance, let's say one of your indexes has statistics saying that the keys are clumped together at one end of the value space (for instance, an int column with lots of 0s and 1s). Then you insert lots and lots of rows with values that spread this index out over the entire spectrum.

If you now run a query that joins another table, on a column with low selectivity (also lots of 0s and 1s), against the table with this index, the query optimizer might conclude that this index is a good choice, since it would fetch many rows that will be used at the same time (they're on the same data page).

However, since the data has changed, the query will have to jump all over the index to find the relevant rows, so the index turns out not to be such a good choice after all.

After recalculating the statistics, the query optimizer might see that this index is sub-optimal for this query, and pick another index instead, which is more suited.

Basically, you need to recalculate the statistics periodically if your data is in flux. If your data rarely changes, you probably don't need to do it very often, but I would still add a maintenance job that does this with some regularity.
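
In Firebird (the question's context), recalculating the statistics for a single index is a one-line statement, and the stored selectivity can be inspected afterwards in the system tables. A minimal sketch, using a made-up index name:

/* CUSTOMER_NAME_IDX is a hypothetical index name */
SET STATISTICS INDEX CUSTOMER_NAME_IDX;
COMMIT;

/* Inspect the stored selectivity the optimizer will use */
SELECT RDB$INDEX_NAME, RDB$STATISTICS
FROM RDB$INDICES
WHERE RDB$INDEX_NAME = 'CUSTOMER_NAME_IDX';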

As for whether it is possible to ask Firebird to do this on its own, again I'm on thin ice, but I suspect there is a way. In SQL Server you can set up maintenance jobs that do this on a schedule, and at the very least you should be able to kick off a batch file from the Windows Task Scheduler to do something similar.
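
For Firebird, a minimal sketch of such a scheduled job could be a small script that calls the procedure from the question and is fed to isql from a scheduled task (the database path and credentials below are placeholders):

/* maintenance.sql -- placeholder database path and credentials */
CONNECT 'C:\data\mydb.fdb' USER 'SYSDBA' PASSWORD 'masterkey';

/* Runs the MAINTENANCE_SELECTIVITY procedure from the question */
EXECUTE PROCEDURE MAINTENANCE_SELECTIVITY;
COMMIT;

The script could then be run on a schedule with something like isql -i maintenance.sql.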

Lasse V. Karlsen
+1 perfect explanation
idursun
Exactly. What I wanted to say is that recalculating the stats can be key when a large amount of data is introduced; my concern is that Firebird doesn't recalculate them automatically, and that there's no (obvious) way to do it with a job like in SQL Server.
pablo
Then I guess a batch file that calls the right command-line tools and executes the SQL statements or stored procedures you need is the only choice you have, unless you feel adventurous and want to build a tool yourself :)
Lasse V. Karlsen
+3  A: 

That does not reindex; it recomputes the selectivity (the weights) of the indexes, which the optimizer uses to pick the most suitable index. You don't need to do that unless the index size changes a lot. If you create the index before you add the data, you do need to do the recalculation afterwards.
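
For example, a rough sketch with made-up table and index names: an index created before the data is loaded stores a selectivity computed from an empty table, so it is worth refreshing it once the load is done.

/* IDX_ORDERS_CUSTOMER and ORDERS are hypothetical names */
CREATE INDEX IDX_ORDERS_CUSTOMER ON ORDERS (CUSTOMER_ID);
COMMIT;

/* ... bulk-load the ORDERS table here ... */

/* Refresh the stored selectivity now that the index actually covers data */
SET STATISTICS INDEX IDX_ORDERS_CUSTOMER;
COMMIT;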

Embedded and Server should have exactly the same functionality, apart from the process model.

Harriv
You're right. But anyway, it does have a huge impact on performance. So, you mean neither of them will do it automatically?
pablo
The index is updated when data is modified, but the weight is not, so you need to recalculate it manually if the data changes a lot.
Harriv