views:

1438

answers:

6

We have a full text index on a fairly large table of 633,569 records. The index is rebuilt from scratch as part of a maintenance plan every evening, after a bunch of DTS packages run that delete / insert records. Large chunks of data are deleted, then inserted (to take care of updates and inserts), so incremental indexing is not a possibility. Changing the packages to only delete when necessary is not a possibility either as it is a legacy application that will eventually be replaced.

The FTI includes two columns - one a varchar(50) not null and a varchar(255) null.

There is a clustered index on the primary key column, which is just an identity column. There is also an combined index on an integer column and the varchar(50) column mentioned above. This latter index was added for performance reasons.

The problem is that the re-indexing is painfully slow - about 8 hours.

The server is fairly robust (dual processor, 4gb of ram), and everything runs quickly beyond this re-indexing.

Any tips on how to speed this up?

UPDATE

Our client has access to the sql box. Turns out they turned on change tracking on the table that is part of the full text index. We turned this off, and the full population took less than 3 hours. Still not great, but better than 8.

UPDATE 2

The FTI is again taking ~8 hours to populate.

+1  A: 
  • Do you have enough RAM?

  • What are your file drive placements in terms of RAID configuration?

  • Are you seeing high tempDB activity?

(BTW, half a million records is not large; it's not even medium... ;) )

Mitch Wheat
4gb of ramI'll check on the raid - what are the repurcussions here?temp db - I'll also check on that.
ScottE
RAID 5, with the data/log on one partition, and the FTI on another partition.
ScottE
so you are saying that the data and log file are on the same RAID 5 partition?
Mitch Wheat
Yes. And to clarify, our system admin gave me the incorrect information. The FTI is actually on a different drive, with RAID 1. The data/log files are on a different drive with RAID 5.
ScottE
A: 

Try putting the index on a separate physical disk than the database.

EDIT: Scott reports this is already the case.

Joshua
@Joshua: I wouldn't do that until I had actually identified the problem...
Mitch Wheat
A: 

Is the system offline whilst you are doing the reindex or live ?

Are these the only items in your full text catalog; if not you might want to consider separating them out from the remainder of your FTS data. (Might help with monitoring too) In the index is the identity column configured as the unique key ?

Can you quantify the large amounts of changes? There are 3 basic options for repopulation; You might want to try switching to full or incremental as one may suit you better than the one you are using now. In my experience incremental works well if changes to the total DB are less than 40% (had a similar issue during large data take ons into the database.) If >40% change then full is likely better (from my experience - i index documents so it might work differently for you) The third option you might want to consider try the Change Tracking with scheduled update reindex option.

If you can take the server off-line to users then what performance settings do you have FTS running under whilst reindexing? You can check this Full-Text Search Service Properties / Performance tab - System Resource Usage as a slider (think there are 4 or 5 positions). There is probably a system proc to change this dont know it and dont have a 2000 machine to check anymore.

FTS / Reindexing loves ram and lots of it; the general rule of thumb is have virtual memory 3x the physical memory; if you have several physical disks then create several Pagefile.sys files, so that each Pagefile.sys file will be placed on its own physical disk. Are you on NT or Windows 2000 ? check that extended memory over 2gb is actually configured properly.

u07ch
- It's a web app, so the system has to remain online. However, if being offline sped up the process, to say, 15 minutes, then we could justify taking it offline in the middle of the night
ScottE
Changes could approach 100% - the table is populated from a back-office system that can't mark items as updated, so delete [re]insert is required
ScottE
We're running windows 2000. And yes, it's the only table in the catalog. I do have to check the drives again - I have a feeling the catalog may be sitting on a partition, not a physical drive.
ScottE
If you have a test box; it might be worth trying adjusting the performance settings in Full Text search; but my gut says its not going to be 15 minutes. How fragmented are your tables / indices generally? And as a second question how fragmented are the windows files disks? Deleting 650k Records each night must put some stress on things. Where are you seeing stress during the reindex; memory/ disk/cpu ? It might be worth considering drip feeding the table and incremental reindexing through that process
u07ch
We do have a test box, but unfortunately it is MS-SQL 2005. I just ran a full rebuild, and it took about 2 minutes. I checked the index fragmentation on the live box, and it is definitely getting fragmented, so I'll clean that up tonight and see if it helps. Any SQL 2000 specific issues that you can otherwise think of?
ScottE
Have you checked the memory setup on the box and the virtual memory? FTS reindex is performant dependent on memory. Check physical fragmentation of the files if they are bad i would start there and work up. Assuming the test and live boxes are similar (allowing for sql) then they their performance should be magnitudes closer than at present so watch the server as it runs on both boxes - disks, cpu and memory in perfmon. Hilary Cotter wrote an article which may http://msdn.microsoft.com/en-us/library/aa175787(SQL.80).aspx on SQL 2000 FTS performance which might help you move this forwrd.
u07ch
Ok, I rebuilt the indices, move the FTI to a faster drive, and ran the full population on a test database. It ran in about 2 hours, which is much better than 9. This ran on similar hardware on our development machine (MS-SQL 2005) in about 2 minutes. Would physical fragmentation account for that kind of discrepency, or was 2000 that much slower??
ScottE
How fragmented does windows say teh disk is; moving it to the "faster" one i would assume it would be less fragmented. Check memory usage between the two boxes.
u07ch
Ok, I rebuild the indices on the production server - they were quite fragmented. So, like in any experiment I'll only change one variable at a time. I'll check on the execution time of the FTI tomorrow.
ScottE
Well, the indices were rebuilt, and the population still took about 8 hours.
ScottE
A: 

Disallowing nulls in the column that currently does might not speed up the index, but in my experience is a better practice, especially for indexing purposes. The only columns I can justify allowing nulls in are date columns.

marr75
A: 

Here is a checklist of parameters for FT-indexing performance on SQL Server. Most of them are already quoted and checked here. I don't find one of them on your comments though:

The SQL Server MAX SERVER MEMORY setting should be set manually (dynamic memory allocation is turned off) so that enough virtual memory is left for the Full-Text Search service to run. To achieve this, select a MAX SERVER MEMORY setting that once set, leaves enough virtual memory so that the Full-Text Search service is able to access an amount of virtual memory equal to 1.5 times the amount of physical RAM in the server. This will take some trial and error to achieve this setting.

streetpc
+3  A: 

SQL Server's indexing is slow primarily because of its asynchronous data extraction scheme.

  • Use change tracking with the "update index in background" option.

The easiest way to improve the performance of full-text indexing is to use change tracking with the "update index in background" option.When you index a table (FTI, like "standard" SQL indexes, works on a per-table basis), you specify full population, incremental population, or change tracking. When you opt for full population, every row in the table you're full-text indexing is extracted and indexed. This is a two-step process.

First, you (or Enterprise Manager) run this system stored procedure:

sp_fulltext_getdata CatalogID, object_id

After all the results sets of all of the timestamps and PK values are returned to MSSearch, MSSearch will issue another sp_fulltext_getdata, but this time, once for every row in your table.So if you have 50 million rows in your database, this procedure will be issued 50 million times.

On the other hand, if you use an incremental population, MSSearch will issue an initial:

sp_fulltext_getdata CatalogID, object_id

for each row in the table that you're full-text indexing. So if you have 50 million rows in your database, this statement will also be issued 50 million times. Why? Because even with an incremental population, MSSearch must figure out exactly which rows have been changed, updated, and deleted. Another problem with incremental populations is that they'll index or re-index a row even if the change was made to a column that you aren't full-text indexing.

Although an incremental population is generally faster than a full population, you can see that for large tables, either will be time-consuming.

I recommend you enable change tracking with background or scheduled updating. If you do, you'll see that MSSearch will first issue another:

sp_fulltext_getdata CatalogID, object_id

for every row in the table with change tracking enabled.Then, for every row that has a column that you're full-text indexing and that's modified after your initial full population, the row information will be written (in the database you're indexing) to the sysfulltextnotify table. MSSearch will then issue the following only for the rows that apear in this table and will then remove them from the sysfulltextnotify table.

  • Consider using a separate build server

Tables that are heavily updated while you're indexing can create locking problems, so if you can live with a catalog that's periodically out of date and an MSSearch engine that's sometimes unavailable consider using a separate build server. You do this by making sure the indexing server has a copy of the table to be full-text indexed and exporting the catalog .Clearly, if you need real-time or near real-time updates to your catalog, this is not a good solution

  • Limit activity when population is running

When population is running, don't run Profiler, and limit other database activity as much as possible. Profiler consumes significant resources.

  • Increase the number of threads for the indexing process

Increase the number of threads you're running for the indexing process. The default is only five, and on quads or 8-ways, you can bump this up to much higher values. MSSearch will, however, throttle itself if it's slurping too much data from SQL Server, so avoid doing this on single- or dual-processor systems.

  • Stop any anti-virus or open file-agent backup software.

If this is not possible, try to prevent them from scanning the temporary directories being used by SQL FTI and the catalog directories

  • Place the catalog,temp directory and pagefiles on their own controllers

If you can make that investment.Place the catalog on its own controller, preferably on a RAID-1 array.Place the temp directory on a RAID-1 array. Similarly, consider putting pagefile on its own RAID-1 array with its own controller.

  • Consider creating secondary data files for the Temp DB - 1 per CPU / Core.

Hope this helps.

Raj

Raj
Thank you for the thorough response. I'll work through some of this soon and report back. Still not sure how this could take less than 5 minutes on sql 2005 with less hardware.
ScottE
In SQL 2005 there is a dedicated indexing service that works directly with SQL Serrver. This speeds up full-text operations and isolates SQL Server from changes to the search service made by other applications.Re-architecture of the full-text gathering mechanism and improvements in index merge strategy improved indexing performance in 2005. For example, on the same hardware, with the same data set, building a full-text index on 20 million rows of character-based text data took roughly 14 days in SQL Server 2000, while in SQL Server 2005, the same index required less than 10 hours.
Raj