I have a trade table with several million rows. Each row represents one version of a trade. When I'm given a possibly new trade I compare it to the latest version in the trade table. If it has changed I add a new version, otherwise I do nothing. In order to compare the two trades I read the latest version from the trade table into my application.
This doesn't work well when I'm given tens of thousands of possibly new trades. Even when I batch the reads to fetch 1,000 trades at a time and compare them, the whole process can take several minutes. All of the time is spent in the DB.
I'm trying to find a way to compare the possibly new trades to the ones in the trade table without so much I/O. What I've come up with so far is adding a hash column to each row in the trade table. The hash is of all the trade fields. Then when I'm given possibly new trades I compute their hash, put the values into a temporary table, then find ones that are different. This feels very hacky. Is there a better way of doing it?
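To make the hashing idea concrete, here is a minimal Java sketch of what I have in mind. Everything in it is an assumption for illustration: the class and method names (TradeHashSketch, hashFields, changedTradeIds) are hypothetical, SHA-1 is just one possible digest, and the in-memory maps stand in for the comparison between the temporary table and the trade table that would really happen in the database. Fields are passed as strings in a fixed column order, and nulls get their own marker because many columns can be null depending on type.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any existing code base.
final class TradeHashSketch {

    // Hash the trade's fields in a fixed column order.
    // Each non-null field is written as: marker 1, byte length, ':', bytes.
    // A null field is written as the single marker 0, so null and "" differ,
    // and ("ab", "c") does not collide with ("a", "bc").
    static String hashFields(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1"); // assumed digest choice
            for (String field : fields) {
                if (field == null) {
                    md.update((byte) 0);
                } else {
                    byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                    md.update((byte) 1);
                    md.update(Integer.toString(bytes.length).getBytes(StandardCharsets.UTF_8));
                    md.update((byte) ':');
                    md.update(bytes);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Return the tradeids that need a new version: those whose incoming hash
    // differs from the latest stored hash, or that have no stored hash at all.
    // Both maps are keyed by tradeid; in the real design this comparison would
    // be done in the database against the temporary table.
    static List<String> changedTradeIds(Map<String, String> incomingHashes,
                                        Map<String, String> latestStoredHashes) {
        List<String> changed = new ArrayList<>();
        for (Map.Entry<String, String> entry : incomingHashes.entrySet()) {
            String stored = latestStoredHashes.get(entry.getKey());
            if (!entry.getValue().equals(stored)) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }
}
```

The servlet would call hashFields once per incoming trade and only send the tradeid and hash to the database, rather than reading every stored trade back. The same field order and null handling would have to be used when the hash column is populated, otherwise unchanged trades would look different.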
Thanks
-- Some more info
SQL Server 2008
Trade(rowid, tradeid, type, trader, volume, etc..)
-- rowid is unique, tradeid will be duplicated for different versions of the same trade
The table has about 30 columns and is not normalised, so depending on type some columns can be null.
Someone posts thousands of trades to a Java servlet, which is then supposed to add a new row for any trade that has changed. Unfortunately, in order to do this the Java servlet has to read in the stored version of every one of the thousands of trades and compare them.
The newest version of a particular trade is just the version with the highest rowid.