We have a system that is concurrently inserted a large amount of data from multiple stations while also exposing a data querying interface. The schema looks something like this (sorry about the poor formatting):
[SyncTable]
SyncID
StationID
MeasuringTime
[DataTypeTable]
TypeID
TypeName
[DataTable]
SyncID
TypeID
DataColumns...
Data insertion is done in a "Synchronization" and goes like this (we only insert data into the system, we never update)
INSERT INTO SyncTable(StationID, MeasuringTime) VALUES (X,Y); SELECT @@IDENTITY
INSERT INTO DataTable(SyncID, TypeID, DataColumns) VALUES
(SyncIDJustInserted, InMemoryCachedTypeID, Data)
... lots (500) similar inserts into DataTable ...
And queries goes like this ( for a given station, measuringtime and datatype)
SELECT SyncID FROM SyncTable WHERE StationID = @StationID
AND MeasuringTime = @MeasuringTime
SELECT DataColumns FROM DataTable WHERE SyncID = @SyncIDJustSelected
AND DataTypeID = @TypeID
My question is how can we combine the transaction level on the inserts and NOLOCK/READPAST hints on the queries so that:
- We maximize the concurrency in our system while favoring the inserts (we need to store a lot of data, something as high as 2000+ records a second)
- Queries only return data from "commited" synchronization (we don't want a result set with a half inserted synchronization or a synchronization with some skipped entries due to lock-skipping)
- We don't care if the "newest" data is included in the query, we care more for consistency and responsiveness then for "live" and up-to-date data
This may be very conflicting goals and may require a high transaction isolation level but I am interested in all tricks and optimizations to achieve high responsiveness on both inserts and selects. I'll be happy to elaborate if more details are needed to flush out more tweaks and tricks.
UPDATE: Just adding a bit more information for future replies. We are running SQL Server 2005 (2008 within six months probably) on a SAN network with 5+ TB of storage initially. I'm not sure what kind of RAID the SAn is set up to and precisely how many disks we have available.