OK. I've read things here and there about SQL Server heaps, but nothing too definitive to really guide me. I am going to try to measure performance, but was hoping for some guidance on what I should be looking into. This is SQL Server 2008 Enterprise. Here are the tables:
Jobs
- JobID (PK, GUID, externally generated)
- StartDate (datetime2)
- AccountId
- Several more accounting fields, mainly decimals and bigints
JobSteps
- JobStepID (PK, GUID, externally generated)
- JobID FK
- StartDate
- Several more accounting fields, mainly decimals and bigints
Usage: Lots of inserts (hundreds/sec), usually 1 JobStep per Job. Estimate perhaps 100-200M rows per month. No updates at all, and the only deletes are from archiving data older than 3 months.
Do ~10 queries/sec against the data. Some join JobSteps to Jobs, some just look at Jobs. Almost all queries will range on StartDate, most of them include AccountId and some of the other accounting fields (we have indexes on them). Queries are pretty simple - the largest part of the execution plans is the join for JobSteps.
The priority is the insert performance. Some lag (5 minutes or so) is tolerable for data to appear in the queries, so replicating to other servers and running queries off them is certainly allowable.
Lookup based on the GUIDs is very rare, apart from joining JobSteps to Jobs.
Current Setup: No clustered index. The only one that seems like a candidate is StartDate. But, it doesn't increase perfectly. Jobs can be inserted anywhere in a 3 hour window after their StartDate. That could mean a million rows are inserted in an order that is not final.
Data size for a 1 Job + 1 JobStepId, with my current indexes, is about 500 bytes.
Questions:
Is this a good use of a heap?
What's the effect of clustering on StartDate, when it's pretty much non-sequential for ~2 hours/1 million rows? My guess is the constant re-ordering would kill insert perf.
Should I just add bigint PKs just to have smaller, always increasing keys? (I'd still need the guids for lookups.)
I read GUIDs as PRIMARY KEYs and/or the clustering key, and it seemed to suggest that even inventing a key will save considerable space on other indexes. Also some resources suggest that heaps have some sort of perf issues in general, but I'm not sure if that still applies in SQL 2008.
And again, yes, I'm going to try to perf test and measure. I'm just trying to get some guidance or links to other articles so I can make a more informed decision on what paths to consider.