views:

123

answers:

3

I have a trade table with several million rows. Each row represents the version of a trade. If I'm given a possibly new trade I compare it to the latest version in the trade table. If it has changed I add a new version, otherwise I do nothing. In order to compare the 2 trades I read the version from the trade table into my application.

This doesn't work well when I'm given 10s of thousands of possibly new trades. Even batching reads to read in a 1000 trades at once and compare them the whole process can take several minutes. All the time is spent in the DB.

I'm trying to find a way to compare the possibly new trades to the ones in the trade table without so much I/O. What I've come up with so far is adding a hash column to each row in the trade table. The hash is of all the trade fields. Then when I'm given possibly new trades I compute their hash, put the values into a temporary table, then find ones that are different. This feels very hacky. Is there a better way of doing it?

Thanks

-- Some more info

SQL Server 2008

Trade(rowid, tradeid, type, trader, volume, etc..) -- rowid is unique, tradeid will be duplicated for difference versions of the same trade

The table has about 30 columns and is not normalised, so depending on type some columns can be null. Someone posts thousands of trades to a java servlet which is then supposed to add a new row for any trade that has changed. Unfortunately in order to do this the java servlet has to read in every one of the thousands of trades and compare them.

The newest version of a particuluar trade is just the version with the highest rowid.

A: 

Hash not a bad solution. It will help if you post some more info about the table structure.

Standard way to do it is to simply run UPDATE statement, WHERE clause will include joins on key fields: WHERE table.PRODUCT_ID = NEWTRADE.PRODUCT_ID; also, check the value fields: WHERE table.TRADE_AMOUNT <> newtrade.BIDAMOUONT; if you index the table by PRODUCT_ID - it will run milliseconds.

You may insert your 10s of thousands new trades in a table first and then run UPDATE to join main table with new trades. again, make sure you have indexes the tables properly.

IMHO
A: 

Given what you have told us, it sounds like you are in part looking for a way to determine if the row changed. This is a good candidate for a rowversion column (previously known as a timestamp). This column will change whenever any value in the row changes. Thus, you could compare the last trade's rowversion with the current rowversion to determine if they were different.

It might be possible to do this in a single insert statement if you show us some additional details about the table schema and specifically how "last" is determined and how you match rows in the two tables (i.e. the matching key between the two tables).

Thomas
+1  A: 

If you are using SQL Server 2008, you might want to use the MERGE statement.

Create an index on the columns that uniquely identify each trade.

Mitch Wheat
The merge statement is pretty cool, I'd never seen it before. Unfortunately in order to use it I'd have to insert all the possibly new trades into a temporary table which would take quite a while.
Dave
@Dave: why would you need a temp table?
Mitch Wheat
@Mitch I guess what I mean is that in order to use the merge statement I would first have to have the possibly new trades in, or available to, sql server. However transferring that much data, when most of it isn't new, would take too long. I would prefer to be able to tell what's new with the minimum amount of data transfer and DB lookups.
Dave