Is a CLUSTER INDEX desireable when loading a sorted loadfile into a new table?

INFORMIX-SE:

My users periodically run an SQL script [REORG.SQL] which unloads all rows from a table in sorted order to two separate files (actives and inactives), drops the table, re-creates the table, loads the sorted loadfiles back into it, creates a cluster index on the same column I sorted my unload files by, creates other supporting indexes and updates its statistics.

(See REORG.SQL script at: http://stackoverflow.com/questions/3393818/se-bcheck-y-anomaly)

(Also see: http://stackoverflow.com/questions/3066714/customer-pk-name-joining-transactions-fk-name-vs-customer-pk-id-serial-joini for reason why cluster index is by name and not pk_id[serial]=fk_id[int])

With my REORG.SQL script, I've been having index file consistency problems so I suspected the CLUSTER INDEX had something to do with it and created the index with no clustering and the problems went away!

Now my question is: If I manage to load all my transaction data, sorted by the customers full name into a newly created table, is it really necessary for me to create a CLUSTER INDEX when in fact the rows are already sorted in the same order that the clustering would accomplish?.. I know that a clustered index starts loosing its clustering as new rows are added, so what's the advantage of creating a cluster index?.. does the query optimizer take advantage of clustering vs. a non-clustered index when the rows are essentially in the same clustered order?.. Has anyone encountered IDX/DAT file problems when clustering a table?.. Perhaps my SQL script has something wrong with it? (PLEASE REVIEW MY SQL SCRIPT CODE TO SEE IF I'm DOING SOMETHING WRONG?)

The script unloads the active and inactive transactions to two different files, with each file sorted by customer name. It then loads them back into the table, active transactions first, followed by inactive transactions. A clustered index is then created on customer name. The problem is that the database now has to go back and re-order the physical rows by customer name when building the clustered index. Although each of the unload files are separately ordered by customer name, when the two are put together the result is not ordered by customer name, causing more work for the database. Unless the separate files for active and inactive transactions are needed elsewhere you might try just dumping all the transactions to a single file, ordered by customer name, and then re-load the table from that single file. At that point the data in the table would be ordered by customer name and the clustered index create wouldn't have to do the re-ordering of the data.

As to whether or not the clustered index is really needed - a clustered index can be of value if you use that column to query with as it should help to reduce the number of I/O's needed to fetch the data. Usually clustered indexes are created on columns which increase monotonically so perhaps TRX_NUM would serve well as the column to be named on the clustered index.

Share and enjoy.

+1 - Clustered indexes CAN suffer a performance hit when you are inserting unordered rows into the table, but normally you make up for it if you cluster on the right field. Identity columns are the classic example for clustering.

JNK 2010-08-05 12:36:30

@Bob: You're correct about the two load files not physically loading each customer's transaction rows grouped together, that's why I previously created the clustered index, but experiencing problems with the IDX file. I chose to cluster by the customers full name because I'm using I-SQL Perform screens and when clerks wildcard query by customer name in the master, all transactions belonging to each customer are automatically queried and response time is faster. the join is by name, not pk_id[serial]=fk_id[int], see: http://stackoverflow.com/questions/3066714

Frank Computer 2010-08-05 14:46:49

@JNK: What kind of performance hit?.. So if I manage to load all transaction rows, ordered by cust_fullname, it's essentially the same thing that the clustering would accomplish and that order is not maintained once new transactions are added, so is it better to just create a non-clustered index on the name since the .DAT is already physically ordered by name?

Frank Computer 2010-08-05 14:53:30

IF you have a clustered index and insert data already in that order, it won't really matter. But when you add new data, it needs to physically resort the records so it takes a little longer. Honestly for your situation you may need to just experiment.

JNK 2010-08-05 15:48:06

@JNK: With Informix-SE, once a clustered index is created, when you add a new row it doesn't get physically placed in the proper order, it's appended at the end of the .DAT file. Perhaps in IDS, Oracle or another RDBMS the clustering is maintained.

Frank Computer 2010-08-05 16:18:04

@Frank - my mistake, my experience is with SQL Server

JNK 2010-08-05 16:51:15

ansaurus

tags:

views:

answers:

Is a CLUSTER INDEX desireable when loading a sorted loadfile into a new table?

related questions