Our product administers tests to about 350 candidates at the same time. At the end of the test, the results for each candidate are moved to a data warehouse whose tables carry a lot of indexes. Each test produces about 400 records to be inserted into the data warehouse, so 400 x 350 is roughly 140,000 records per run. If the data warehouse does not contain many records yet, everything goes well. But once it already holds a lot of records, many of the inserts fail...

Is there a way to have indexes that are only rebuilt at the end of the day, or isn't that the real problem? How would you solve this?

+2  A: 

It is common in data warehousing to drop indexes and constraints before loading and to re-create them afterwards. If you get rid of constraints (FKs), make sure your loading process takes care of the integrity checks itself. Drop any check constraints too, and move those validations into the ETL software.
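
The exact syntax depends on your database; as a rough sketch (assuming SQL Server and hypothetical table and index names), disabling an index before the load and rebuilding it afterwards looks like this:

    -- Hypothetical names: dbo.FactTestResult / IX_FactTestResult_Candidate.
    -- While the index is disabled, inserts no longer have to maintain it.
    ALTER INDEX IX_FactTestResult_Candidate ON dbo.FactTestResult DISABLE;

    -- ... bulk load the day's ~140,000 rows here ...

    -- Rebuilding re-enables the index and sorts it once, after the load.
    ALTER INDEX IX_FactTestResult_Candidate ON dbo.FactTestResult REBUILD;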

Damir Sudarevic
+1  A: 

I've worked with both normalized and Kimball star data warehouses, and this doesn't sound like a problem you should be running into. Even in a small data warehouse, 140,000 rows is not a lot.

Why do the inserts fail? Typically in a Kimball-style warehouse, no inserts ever fail - for instance, in a fact table the inserts always have a unique set of primary keys related to the dimensions and the grain (like a date or time snapshot). In a dimension table, changes are detected, new dimensions are inserted, and existing ones are re-used. In a normalized warehouse, you usually have some kind of revision mechanism, archive process, or effective date which keeps things unique.
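
As an illustration of that idea (a sketch with hypothetical table and column names, not your actual schema), a Kimball-style fact table keyed on its dimension keys plus the grain cannot produce duplicate-key failures as long as the load respects that grain:

    -- Hypothetical fact table: one row per candidate, question and snapshot date.
    CREATE TABLE dbo.FactTestScore (
        DateKey      INT NOT NULL,   -- the grain: the snapshot date
        CandidateKey INT NOT NULL,   -- dimension key
        QuestionKey  INT NOT NULL,   -- dimension key
        Score        DECIMAL(5,2) NOT NULL,
        CONSTRAINT PK_FactTestScore
            PRIMARY KEY (DateKey, CandidateKey, QuestionKey)
    );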

It seems to me that regardless of your DW philosophy or architecture, there should be something keeping these rows unique.

If (as you stated in your comments) you have a single index containing every column, that's probably not a very useful index in any database design. Are you sure that index is even being used by any queries? Is it also marked as unique, and is that constraint being violated? In any case, a multi-column index that wide is relatively expensive to maintain and compare against, which could result in a timeout - you can always raise the timeout on your connection, but I would attack the problem from a design perspective.
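
If that wide index does turn out to be the culprit, one possible fix (sketched here with hypothetical object names) is to drop it and create a narrow unique index that matches the natural grain of the data:

    -- Hypothetical names: dbo.TestResult and its indexes; adjust to your schema.
    DROP INDEX IX_TestResult_AllColumns ON dbo.TestResult;

    CREATE UNIQUE INDEX IX_TestResult_Grain
        ON dbo.TestResult (CandidateId, TestId, QuestionId);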

Cade Roux
+2  A: 

140K is NOT a lot of rows. Please post your table design and the exact error you get when the inserts fail.

SergeyKazachenko
+1  A: 

I would suggest the following: keep all your data except today's in a separate table (let's call it History), where the indexes are tuned for your reports. Keep today's data in another table (let's call it Today) and run a job at midnight that moves the data from the Today table to the History table. The Today table should have minimal indexing to improve insert performance. With this design you can be sure your reports are not contending with the inserts, and you get two tables, each tuned for its purpose. In general it is hard to tune a single table for both fast inserts and fast selects.
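
A minimal sketch of the midnight job (hypothetical table and column names; the scheduling itself would live in your database's job scheduler):

    -- Move today's rows into the history table, then clear the staging table.
    BEGIN TRANSACTION;

    INSERT INTO ResultsHistory (CandidateId, TestId, QuestionId, Score, TakenAt)
    SELECT CandidateId, TestId, QuestionId, Score, TakenAt
    FROM ResultsToday;

    DELETE FROM ResultsToday;

    COMMIT TRANSACTION;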

David Gruzman