views: 54
answers: 3
We have an application that takes real-time data and inserts it into a database. It is online for 4.5 hours a day. We insert data second by second into 17 tables. The user may, at any time, query any table for the latest second's data and some records in the history...

Handling the feed and insertion is done using a C# console application...

Handling user requests is done through a WCF service...

We figured out that insertion is our bottleneck; most of the time is taken there. We invested a lot of time trying to fine-tune the tables and indices, yet the results were not satisfactory.

Assuming that we have sufficient memory, what is the best practice for keeping the data in memory instead of in the database? Currently we are using DataTables that are updated and inserted every second. A colleague of ours suggested another WCF service, instead of the database, sitting between the feed handler and the user-request WCF service. This WCF mid-layer would be TCP-based and would keep the data in its own memory. One might say that the feed handler could deal with user requests itself instead of having a middle layer between the two processes, but we want to separate things so that if the feed handler crashes we can still provide the user with the current records.

We are limited in time and want to move everything to memory in a short period. Is having a WCF service in the middle of two processes a bad thing to do? I know that the requests add some overhead, but all three processes (feed handler, in-memory database (WCF), user-request handler (WCF)) are going to be on the same machine, so bandwidth will not be much of an issue.
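Roughly, we imagine the mid-layer as a TCP-hosted WCF singleton that the feed handler pushes records into and the request handler reads from. A minimal sketch of what we have in mind (the contract and type names below are only illustrative, not our actual code):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    public class SecondRecord
    {
        [DataMember] public DateTime Timestamp { get; set; }
        [DataMember] public decimal Value { get; set; }
    }

    [ServiceContract]
    public interface IInMemoryStore
    {
        [OperationContract]
        void Append(string tableName, SecondRecord record);   // called by the feed handler

        [OperationContract]
        SecondRecord GetLatest(string tableName);             // latest second for a table

        [OperationContract]
        IList<SecondRecord> GetRange(string tableName, DateTime from, DateTime to);
    }

    // Singleton so every client talks to the same in-memory data; a simple lock
    // guards the dictionary because writes arrive only once per second.
    [ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                     ConcurrencyMode = ConcurrencyMode.Multiple)]
    public class InMemoryStore : IInMemoryStore
    {
        private readonly object _sync = new object();
        private readonly Dictionary<string, List<SecondRecord>> _tables =
            new Dictionary<string, List<SecondRecord>>();

        public void Append(string tableName, SecondRecord record)
        {
            lock (_sync)
            {
                List<SecondRecord> list;
                if (!_tables.TryGetValue(tableName, out list))
                    _tables[tableName] = list = new List<SecondRecord>();
                list.Add(record);
            }
        }

        public SecondRecord GetLatest(string tableName)
        {
            lock (_sync)
            {
                List<SecondRecord> list;
                if (_tables.TryGetValue(tableName, out list) && list.Count > 0)
                    return list[list.Count - 1];
                return null;
            }
        }

        public IList<SecondRecord> GetRange(string tableName, DateTime from, DateTime to)
        {
            lock (_sync)
            {
                List<SecondRecord> list;
                if (!_tables.TryGetValue(tableName, out list))
                    return new List<SecondRecord>();
                return list.Where(r => r.Timestamp >= from && r.Timestamp <= to).ToList();
            }
        }
    }

The service would be hosted over NetTcpBinding on the same machine, with the feed handler and the user-request service as its two clients.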

Please assist!

A: 

What kind of database are you using? MySQL has a MEMORY storage engine which would seem to be suited to this sort of thing.

Brian Hooper
We are using Microsoft SQL Server 2008.
Mustafa A. Jabbar
A: 

Are you using DataTable with DataAdapter? If so, I would recommend that you drop them completely. Insert your records directly using a DbCommand. When users request reports, read the data using a DataReader, or populate DataTable objects using DataTable.Load(IDataReader).
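A rough sketch of that approach, with a made-up table and columns (TickData, TickTime, Price) standing in for one of your 17 tables:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    static class DirectDbAccess
    {
        // Insert one record with a parameterised command instead of a DataAdapter.
        public static void InsertTick(string connectionString, DateTime tickTime, decimal price)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO TickData (TickTime, Price) VALUES (@t, @p)", conn))
            {
                cmd.Parameters.AddWithValue("@t", tickTime);
                cmd.Parameters.AddWithValue("@p", price);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }

        // Read a range back for reporting, filling a DataTable from a DataReader.
        public static DataTable LoadRange(string connectionString, DateTime from, DateTime to)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "SELECT TickTime, Price FROM TickData WHERE TickTime BETWEEN @from AND @to", conn))
            {
                cmd.Parameters.AddWithValue("@from", from);
                cmd.Parameters.AddWithValue("@to", to);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    var table = new DataTable();
                    table.Load(reader);
                    return table;
                }
            }
        }
    }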

Storing data in memory carries the risk of losing it in case of crashes or power failures.

Nick
We create a connection with a SqlTransaction and use it to insert by looping through the DataTable with SqlBulkCopy. We need the transaction to reverse the effects of possible crashes.
Mustafa A. Jabbar
+2  A: 

I would look into creating a cache of the data (so that you can also reduce database selects), and invalidate data in the cache once it has been written to the database. This way, you can batch up calls into a larger insert instead of many smaller ones, while keeping the data in memory so that readers can still read it. In fact, if you know when the data goes stale, you can avoid reading the database entirely and use it just as a backing store; database performance will then only affect how large your cache gets.

Invalidating data in the cache should be based on whether it has been written to the database or has gone stale, whichever comes last, not first.

The cache layer doesn't need to be complicated, but it should be multi-threaded so it can serve the data and also save it in the background. This layer would sit just behind the WCF service (the connection medium), and the WCF service should be extended to contain the logic of the console app plus the batching idea. Then the console app can just connect to the WCF service and throw results at it.
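As a very rough sketch of the batching side of this (the flush interval and record shape are placeholders, not something from your setup): queue incoming rows in memory and let a background timer push them to SQL Server in one bulk insert. A real cache layer would also keep a readable copy around for the WCF service until the data has been saved or gone stale.

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Threading;

    public class WriteBehindBatcher
    {
        private readonly object _sync = new object();
        private readonly DataTable _pending;           // rows waiting to be written
        private readonly string _connectionString;
        private readonly Timer _flushTimer;

        public WriteBehindBatcher(DataTable schema, string connectionString)
        {
            _pending = schema.Clone();                 // empty table with the same columns
            _connectionString = connectionString;
            // Flush every 5 seconds: fewer, larger inserts instead of one per second.
            _flushTimer = new Timer(delegate { Flush(); }, null, 5000, 5000);
        }

        public void Add(object[] values)
        {
            lock (_sync) { _pending.Rows.Add(values); }
        }

        private void Flush()
        {
            DataTable batch;
            lock (_sync)
            {
                if (_pending.Rows.Count == 0) return;
                batch = _pending.Copy();               // snapshot the current batch
                _pending.Clear();
            }

            using (var conn = new SqlConnection(_connectionString))
            {
                conn.Open();
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = batch.TableName;
                    bulk.WriteToServer(batch);         // one bulk insert for the whole batch
                }
            }
        }
    }

You can wrap the flush in the same kind of SqlTransaction you already use if each batch needs all-or-nothing behaviour.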

Update: the only other thing to say is to invest in a profiler to see whether you are introducing any performance issues in code that are being masked. Also, profile your database. You mention you need fast inserts and selects; unfortunately, they usually trade off against each other...

Adam
Thanks for the answer. The data doesn't really go stale, as each record can be selected at any time (the user can specify which second's "record" he/she wants). Your solution still involves inserting into the database, although less frequently thanks to the cache. But if we still have to select from the database (when the data is not in the cache, which is possible given that selections over history data are totally random) while, by coincidence, a huge table is being inserted into, isn't that going to cause selection timeouts? Can't we just stack everything up in memory?
Mustafa A. Jabbar
That really depends - inserts every second for 4.5 hours might amount to a lot of data? Your problem is insert performance, which can be addressed in other ways. What you could do is remove all indexes, thus tailoring the table for insert performance and incurring the cost on the selects; perhaps use a select cache to hold more of that data in memory.
Adam
Of course, in-memory is a feasible way to go, but memory is also volatile - one power cut and it's all gone.
Adam
Removing the indexes gave us better performance, as did removing the primary keys. The problem is that we want fast insertion AND selection. The performance expectations in our stock market field are quite high.
Mustafa A. Jabbar
I would suggest a half-way approach then - try a distributed cache system (memcached, Velocity) that gives you fast access over the network, with node-like behaviour in that each item in the cache is copied across more than one server, pretty much removing the risk of losing live data. Still save the data into the database, but put all of it in the cache as well - the only metrics left to measure then are the time from cache to database and how much data in the cache is unsaved at any one time.
Adam