ansaurus

Question

Speed up Oracle Text indexing or let the indexer work only on low load times

Answer 1

A:

We finally figured out how to do a split sync of the index. Here are the basic steps that show what we did:

CREATE INDEX concat_DM_RV_idx ON DOCMETA (FULLTEXTIDX_DUMMY)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore concat_DM_RV_DS section group CTXSYS.AUTO_SECTION_GROUP
NOPOPULATE
');

See the NOPOPULATE parameter? It tells the indexer not to start the populating / indexing process. If you're on 11g, you have a very nice CTX_DDL feature at hand that populates the index at will: the procedure POPULATE_PENDING. Calling it with your index name fills the CTXSYS table that holds the rows that have been modified and are therefore out of sync. Note that after calling this procedure the indexer still hasn't started anything. Since 10g (?), the corresponding CTX_DDL.SYNC_INDEX procedure has several additional parameters, e.g. "maxtime". It takes a suggested time limit in minutes, so pass 240 and your indexer will sync pending rows for about 4 hours. Repeat that on a schedule and you are done.
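For 10g/11g, the approach just described could look roughly like the sketch below. The job name and the 01:00 start time are assumptions chosen for illustration, not something we actually ran; the session needs EXECUTE on CTX_DDL and the CREATE JOB privilege.

```sql
-- One-off: queue every row of the base table for indexing (11g and later).
BEGIN
  CTX_DDL.POPULATE_PENDING('CONCAT_DM_RV_IDX');
END;
/

-- Nightly job: sync pending rows for at most about four hours.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'SYNC_CONCAT_DM_RV_IDX',  -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN CTX_DDL.SYNC_INDEX(idx_name => ''CONCAT_DM_RV_IDX'', memory => ''100M'', maxtime => 240); END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=1;BYMINUTE=0;BYSECOND=0',  -- assumed low-load window
    enabled         => TRUE
  );
END;
/
```

Rows that are still pending when the time limit is reached simply stay in the queue and are picked up by the next run.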

Unfortunately none of that is available in 9i, so we "simulated" the Oracle POPULATE_PENDING process ourselves, and it worked. The only restriction of this method: you need some kind of unique row identifier to be able to select distinct chunks of rows from your table. Here's what we did:

1.) Create the index with NOPOPULATE (see above).

2.) Become SYS / DBA / CTXSYS (yes, you might call your admin for that). Find out the ID that your freshly created index has by querying the index meta table:

-- The index was created without quotes, so its name is stored in upper case.
SELECT IDX_ID FROM CTXSYS.CTX_INDEXES WHERE IDX_NAME = 'CONCAT_DM_RV_IDX';

3.) Note the index ID this yields on a yellow slip of paper, then execute the following insert statement as CTXSYS. Replace <<your index id>> with your index ID and <<basetable name>> with the name of the table that the index is built on. The unique row identifier can be a document ID or any other countable expression that selects a distinct chunk of your table's rows:

INSERT INTO CTXSYS.DR$PENDING (PND_CID, PND_PID, PND_ROWID, PND_TIMESTAMP)
SELECT <<your index id>>, 0, <<basetable name>>.ROWID, CURRENT_DATE
FROM <<basetable name>> -- gsms.DOCMETA in our case
WHERE <<basetable unique row identifier>> < 50000;
COMMIT; -- Don't forget the COMMIT! DON'T FORGET IT!!! WE MEAN IT!

The "50000" is the upper bound of the ID range. How many rows that actually puts into the pending table as payload for the indexer depends on how sparse the IDs in your base table are. Adjust it to your own needs.
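Before letting the indexer loose, it can help to check how many rows really ended up in the queue. A minimal sketch, assuming you are still connected as CTXSYS (or a DBA) and can read the CTX_PENDING view:

```sql
-- Count the rows currently queued for the index.
SELECT pnd_index_name, COUNT(*) AS pending_rows
FROM ctxsys.ctx_pending
WHERE pnd_index_name = 'CONCAT_DM_RV_IDX'
GROUP BY pnd_index_name;
```

If the query returns no row at all, nothing is queued, which usually means the COMMIT is missing or the ID range matched no rows.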

4.) Now we are set up to let the indexer loose:

CALL CTX_DDL.SYNC_INDEX(
  'CONCAT_DM_RV_IDX', -- your index name here
  '100M', -- memory count
  NULL, -- param for partitioned idxes
  2 -- parallel count
);

This starts the indexing process on however many rows you inserted in step 3. To process the next chunk, repeat step 3 with the next 50000 or so rows ("WHERE id >= 50000 AND id < 100000") and then run the sync again.
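Steps 3 and 4 can also be wrapped in a loop instead of being repeated by hand. The following is only a sketch under assumptions: the index ID 1234 and the numeric column doc_id are hypothetical placeholders, and the block is meant to run as CTXSYS with direct SELECT access to the base table.

```sql
DECLARE
  c_index_id   CONSTANT NUMBER := 1234;   -- hypothetical: the IDX_ID from step 2
  c_chunk_size CONSTANT NUMBER := 50000;  -- width of each ID range
  v_low        NUMBER := 0;
  v_max        NUMBER;
BEGIN
  -- doc_id is an assumed unique numeric identifier of the base table.
  SELECT MAX(doc_id) INTO v_max FROM gsms.DOCMETA;

  WHILE v_low <= v_max LOOP
    -- Queue one chunk; the half-open range keeps chunks from overlapping.
    INSERT INTO CTXSYS.DR$PENDING (PND_CID, PND_PID, PND_ROWID, PND_TIMESTAMP)
    SELECT c_index_id, 0, d.ROWID, SYSDATE
    FROM gsms.DOCMETA d
    WHERE d.doc_id >= v_low
      AND d.doc_id <  v_low + c_chunk_size;
    COMMIT;

    -- Index the chunk that was just queued.
    CTX_DDL.SYNC_INDEX('CONCAT_DM_RV_IDX', '100M');

    v_low := v_low + c_chunk_size;
  END LOOP;
END;
/
```

Because each range is visited exactly once, no row is queued twice. To restrict the work to low-load hours, you could process only a few chunks per scheduled run and remember the last value of v_low somewhere.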

If you accidentally run the indexer on the same set of rows twice, the index becomes heavily fragmented. The only way to clean it up is to optimize the index with the REBUILD parameter. On our local machine that was extremely fast, since the indexer doesn't have to run; it only rearranges the contents of the index tables:

CALL CTX_DDL.OPTIMIZE_INDEX('CONCAT_DM_RV_IDX', 'REBUILD');

If you need some meta information about the indexing status and size you can ask the CTX_REPORT package:

SELECT CTX_REPORT.INDEX_SIZE('CONCAT_DM_RV_IDX') FROM DUAL;

And if you forgot which Oracle Text parameters (default index memory and the like) were in effect at indexing time:

SELECT * FROM CTXSYS.CTX_PARAMETERS;

Happy indexing!

Stefan 2010-07-28 07:51:54
