views:

60

answers:

5

Hello!

Here is my situation:

  1. User A must monitor the results of an SQL query Q.
  2. User B can modify data that may change the results of Q.
  3. Running Q on the entire database is slow.

What I would like to have is: whenever user B modifies the data with a create, update, or delete, generate the delta to the results of Q, i.e. return a (modified, created, deleted) triple.

So, updates to Q would be much faster.
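The (modified, created, deleted) triple described above can be computed by diffing two snapshots of Q's result set keyed on the primary key. A minimal sketch (the row shape and the `id` key are hypothetical, not part of the actual schema):

```python
# Sketch: diff two snapshots of a query's result set into the
# (modified, created, deleted) triple, keyed by primary key.

def result_delta(old_rows, new_rows, key="id"):
    """Return (modified, created, deleted) between two result snapshots."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    created = [r for k, r in new.items() if k not in old]
    deleted = [r for k, r in old.items() if k not in new]
    modified = [r for k, r in new.items() if k in old and r != old[k]]
    return modified, created, deleted
```

With deltas like this, user A only needs to apply small changes instead of re-running Q.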

Implementation details: SQL Server 2005, with NHibernate layer. Currently, user A runs Q in polling mode, every 10 seconds.

One idea I had was to create an in-memory copy of the DB schema, and when user B makes a change, then also make this change in the in-memory copy, and then run Q on this in-memory copy.

Is there a better way of doing this?

Thanks!!!

Post-Script: Thanks for all of the detailed comments. I've decided to implement this feature as follows:

  1. Use the NHibernate PostUpdate event to get notified when a change has been made to the database.
  2. Run a LINQ to Objects query on the in-memory collection of updates.
  3. For each SQL query registered in the system, store a list of results on a Redis server, in memory (no persistence to disk).
  4. If the LINQ query indicates that the update is relevant to a given query, i.e. changes its result set, update the list on Redis.
  5. Use some type of notification, such as Redis Pub/Sub, to notify users of changes to a given query.
  6. When updating the list on Redis, pass in the SQL Server update timestamp. A little code on the Redis side is needed to resolve conflicts based on the timestamp: if a change comes in with a lower timestamp, ignore it.
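The timestamp-based conflict resolution in steps 4-6 can be sketched as plain Python, with the corresponding Redis calls shown only as comments (the key names and serialization are assumptions, not the actual implementation):

```python
# Sketch: apply an update to the cached result list only if its
# SQL Server timestamp is newer than the one already stored, then
# notify subscribers. `cache` maps key -> (row, timestamp).

def apply_update(cache, key, row, ts):
    """Return True if the update was applied, False if it was stale."""
    current = cache.get(key)
    if current is not None and current[1] >= ts:
        return False  # stale: a same-or-newer version is already stored
    cache[key] = (row, ts)
    # In Redis this would be roughly (hypothetical key names):
    #   redis.hset("query:42:results", key, serialize(row, ts))
    #   redis.publish("query:42:changed", key)   # Pub/Sub notification (step 5)
    return True
```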

A: 

One approach would be to optimize the polling query. Make sure there's a column in the table that tracks when each row was last updated, say UpdateDt. Then create an index whose leading column is UpdateDt. The polling query can then do:

create index IX_TheBigTable_UpdateDt on TheBigTable (UpdateDt);

select  *
from    TheBigTable
where   UpdateDt > @LastTimeIChecked;

This will be very fast because of the index on UpdateDt, and it's still fairly simple to code and set up.
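As a runnable illustration of the incremental poll (using SQLite standing in for SQL Server 2005; table and column names follow the answer, the data is made up):

```python
# Sketch: poll only for rows changed since the last check, using an
# indexed UpdateDt column. SQLite stands in for SQL Server here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TheBigTable (id INTEGER PRIMARY KEY, payload TEXT, UpdateDt TEXT)"
)
conn.execute("CREATE INDEX IX_TheBigTable_UpdateDt ON TheBigTable (UpdateDt)")
conn.executemany(
    "INSERT INTO TheBigTable VALUES (?, ?, ?)",
    [(1, "old row", "2010-01-01 00:00:00"),
     (2, "new row", "2010-01-02 00:00:05")],
)

last_checked = "2010-01-02 00:00:00"
changed = conn.execute(
    "SELECT * FROM TheBigTable WHERE UpdateDt > ?", (last_checked,)
).fetchall()
# Only row 2 was updated after last_checked, so only it comes back.
```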

P.S. If it matters which user made the change, you can also add UpdatedBy to the table and the index.

Andomar
A: 

How often does user B update the data? It may well be much better not to let A poll for changes, but to have B raise an event signalling to A that the data has changed. A can simply wait for this event. The event can carry any data you want, perhaps the very changes B makes to the data. This saves the load that polling causes, and the delay that A experiences is minimal this way.
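The push model can be sketched with a blocking queue: B writes, then signals with the change attached; A blocks until something arrives instead of polling (a minimal sketch, not NHibernate-specific):

```python
# Sketch: B signals A through a queue carrying the change itself,
# so A waits instead of polling every 10 seconds.
import queue
import threading

changes = queue.Queue()

def user_b_writes(change):
    # ... write the change to the database here ...
    changes.put(change)  # then signal A, passing the change along

received = []

def user_a_waits():
    received.append(changes.get())  # blocks until B signals; no polling

t = threading.Thread(target=user_a_waits)
t.start()
user_b_writes({"id": 7, "action": "update"})
t.join()
```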

H. den Breejen
+1  A: 

OK, there is NO way to do that without hitting the server with a lot of overhead / polling queries of some sort, simply because SQL is not made as what people like me know as a "ticker plant". It is not there to distribute real-time updates. SQL is pretty much "ask and you get told". So "get me real-time updates" means asking "are we there yet? are we there yet? are we there yet?" all the time, like the donkey in Shrek 2. SQL Server is a better Shrek, though, and will gladly answer all the time with "not yet", but it WILL cause load - or cause delays.

Most people doing what you do do it OUTSIDE THE DATABASE. Basically, they run through an application server that processes the updates and, while writing them to the database, distributes the update information. Some do that with quite some performance - I have one feed I get from a database that handles about 250,000 updates per second IN QUIET TIMES - peaks are about 600,000 updates per second. All these are updates to a known number of rows (about 1.5 million). The updates are distributed worldwide and cause no database load - because the distribution is done OUTSIDE the database, in the application layer. In real time. Oh, and "minimal delay" here is "as fast as possible".

There is NO way to distribute results like you want FAST and EFFICIENTLY with SQL Server. You always end up with the query semantics getting in your way one way or the other.

The answers provided are still correct - but only feasible in small-volume scenarios or scenarios where significant delays (0.5 seconds upward) are acceptable (sorry, real time in my world is less than 5 ms complete processing for SLOW systems).

What you are looking for is a standard problem that many people have... with financial data. Real-time trading systems need information FAST and IN REAL TIME, and polling a database is not feasible. If you want a scalable solution (or need one), look in that area for approaches.

TomTom
+1  A: 

In SQL Server 2005, Microsoft introduced the Service Broker. You can use this as a queue: you send a message from a trigger (by adding a record to a table), and another process calls a stored procedure that is set up to wait for a message in the queue (a record in the table). Until this message is added, the stored procedure will sit there and wait (I think there is a timeout setting - but I think there is also an infinite option).

In this blog post, the poster shows an example of using the Service Broker from a trigger to improve performance.
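The wait-for-a-message pattern of a Service Broker RECEIVE can be modelled with a blocking queue (this is only an analogy to the T-SQL WAITFOR (RECEIVE ...) construct, not Service Broker itself):

```python
# Sketch: the consumer blocks on the queue, with a finite timeout
# (omit the timeout to wait indefinitely); the trigger side enqueues.
import queue

broker = queue.Queue()
broker.put("row inserted")        # what the trigger would send

try:
    msg = broker.get(timeout=5)   # like WAITFOR (RECEIVE ...), TIMEOUT 5000
except queue.Empty:
    msg = None                    # nothing arrived within the timeout
```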

Gabriel McAdams
A: 

You can use something like a recent list of changes beside the SQL database. The list of changes is similar to an RSS feed: when an RSS reader downloads a feed URL for a blog, the feed loads all articles the first time, and only the new ones after that. The feed for the application described here would be a little different from an ordinary RSS feed, because it would contain not only new articles but also changes to articles and removed articles.

To optimize performance, the feed can be read at an interval (seconds or milliseconds) and written at an interval (update it after a batch of changes or after a time interval). Using a SQL query for the first display and polling the feed after that is probably needed for acceptable performance when a lot of data is displayed at once in the UI. The number of changes in the feed can be limited to the maximum number of changes that can occur in the time it takes to read the feed and update the UI.

The feed can be implemented with a message queue or an ESB, but you can use other means as well. The data in the feed can be XML, JSON, a binary-serialized object, or whatever is easiest to use in the UI. The feed can be stored in some kind of cache, in RAM directly, or in a static file. You might have to add a version-number column to the database and increase that number on every change, to make it possible to decide what is read from the database on the first load and which changes need to be processed from the feed.

Reading:

  1. Use Sql-query to load initial data
  2. Wait for time interval to elapse
  3. Read rss-feed with changes
  4. Update UI with new changes
  5. Goto 2

Writing

  1. Begin transaction
  2. Write change to sql database
  3. Write change to rss-feed
  4. Commit transaction

It might be acceptable to put step 3 of writing on a message queue and process the write asynchronously.
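The reading and writing steps above can be sketched with a growing in-memory list standing in for the feed, and a version number per change (all names here are hypothetical):

```python
# Sketch: a change feed with per-change version numbers, so the UI can
# load once via SQL and then fetch only changes newer than what it has.

feed = []  # the change feed (could be a static file, a queue, or an ESB)

def write_change(change):
    # step 3 of writing: stamp a version and append to the feed
    change["version"] = len(feed) + 1
    feed.append(change)

def read_changes(since_version):
    # steps 3-4 of reading: fetch only changes the UI has not seen yet
    return [c for c in feed if c["version"] > since_version]

write_change({"id": 1, "op": "create"})
write_change({"id": 1, "op": "update"})
new_for_ui = read_changes(since_version=1)  # the UI already saw version 1
```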

A main issue with displaying changes "real-time" is that real-time does not mean zero seconds, but "the time the user can interpret the change", which is something between 0.05 seconds for a multi-player first person shooter game and 30 minutes or more for a report requested by a manager, printed out by the secretary, and read by the manager after a few phone calls.

Paco