views: 137
answers: 2

Hello all,

Does anyone have any experience with receiving and updating a large volume of data, storing it, sorting it, and visualizing it very quickly?

Preferably, I'm looking for a .NET solution, but that may not be practical.

Now for the details...

I will receive roughly 1000 messages per second, some of them updates to existing records and some new rows. Traffic is also very bursty, sometimes spiking to 5000 updates and new rows per second.

By the end of the day, I could have 4 to 5 million rows of data.

I have to both store the records and show the updates to the user in the UI. The UI lets the user apply a number of filters so they see only the data they want. I need to apply every update to the stored records and also surface those updates to the user.

The visual update rate is 1 fps.

Anyone have any guidance or direction on this problem? I can't imagine I'm the first one to have to deal with something like this...

My first thought is some sort of in-memory database, but will it be fast enough to query for updates near the end of the day, once the data set has grown large? Or does that depend entirely on smart indexing and queries?

Thanks in advance.

A: 

Maybe Oracle is a more appropriate RDBMS solution for you. The problem with your question is that at these "critical" levels there are too many variables and conditions to deal with: not only the software, but the hardware you can afford (it costs :)), the connection speed, the typical user's system setup, and more and more and more... Good luck.

Tigran
Why Oracle over, say, SQL Server?
John Saunders
@John I said "maybe", based only on the very basic description of the problem and on my personal experience: the huge databases in the banks and research institutes where I worked for a while were Oracle based, not MS SQL Server. In fact my answer wasn't purely technical; in my opinion it is very difficult to suggest something really practical for this question just by reading the post. Good luck.
Tigran
Ok, based on your response, I'm downvoting. That's not a good reason to suggest Oracle might be better. You don't have experience saying it's better; you have experience saying it's used.
John Saunders
Ok, no problem.
Tigran
I don't really agree with the downvote, because my intention was to share my experience with the asker; in such organisations these are not casual choices, so maybe he would like to look in that direction too. There was no intention to earn points; honestly, I don't really care about the point system. I think it's up to him to decide whether the answer is completely useless or not. Regards.
Tigran
+1  A: 

It's a very interesting and also challenging problem.

I would take a pipeline approach, with processors implementing sorting, filtering, aggregation, etc. The pipeline needs an asynchronous (thread-safe) input buffer that is processed in a timely manner (given your 1 fps requirement, within a second). If you can't keep up, you need to queue the data somewhere, on disk or in memory, depending on the nature of your problem.

Consequently, the UI needs to be implemented in a pull style rather than push; you only want to update it once per second.
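To make that concrete, here is a minimal sketch of the buffered, pull-style idea in C#: the receiving thread only enqueues into a thread-safe buffer, and a once-per-second timer drains it, runs the slower stages, and hands the UI one consolidated batch. The Record type, the stage methods, and the use of .NET 4's ConcurrentQueue are my own assumptions, not anything from the question.

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    // Placeholder for one incoming update or new row.
    public class Record
    {
        public int Id { get; set; }
        public decimal Value { get; set; }
    }

    public class UpdatePipeline : IDisposable
    {
        // Thread-safe input buffer: the feed handler only enqueues and returns.
        private readonly ConcurrentQueue<Record> _buffer = new ConcurrentQueue<Record>();
        private readonly Timer _timer;

        public UpdatePipeline()
        {
            // Drain once per second, matching the 1 fps UI requirement.
            _timer = new Timer(_ => Drain(), null, 1000, 1000);
        }

        // Called by the receiving thread(s) for every message; must stay cheap.
        public void Enqueue(Record record)
        {
            _buffer.Enqueue(record);
        }

        private void Drain()
        {
            var batch = new List<Record>();
            Record record;
            while (_buffer.TryDequeue(out record))
                batch.Add(record);

            if (batch.Count == 0)
                return;

            // Slower stages run here: sort/filter/aggregate, persist the batch,
            // then push one consolidated update to the UI (pull style, 1 fps).
            Process(batch);
        }

        private void Process(IList<Record> batch)
        {
            // Placeholder for the sorting/filtering/persistence/UI stages.
        }

        public void Dispose()
        {
            _timer.Dispose();
        }
    }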

For the datastore you have several options. Using a database is not a bad idea, since you need the data persisted (and I guess also queryable) anyway. If you are using an ORM, you may find NHibernate, in combination with its superior second-level cache, a decent choice.
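For context on the second-level cache suggestion, here is a hedged sketch of turning it on when building the session factory. The property keys and the SysCache provider come from the standard NHibernate / NHibernate.Caches packages; which provider you pick, and the cache settings in each mapping, are up to you.

    using NHibernate;
    using NHibernate.Cfg;

    public static class SessionFactoryBuilder
    {
        public static ISessionFactory Build()
        {
            // Reads hibernate.cfg.xml for connection settings and mappings.
            var cfg = new Configuration().Configure();

            // Enable the second-level (cross-session) cache and the query cache.
            cfg.SetProperty("cache.use_second_level_cache", "true");
            cfg.SetProperty("cache.use_query_cache", "true");

            // SysCache is one of the providers from the NHibernate.Caches contrib;
            // substitute whichever provider fits your deployment.
            cfg.SetProperty("cache.provider_class",
                "NHibernate.Caches.SysCache.SysCacheProvider, NHibernate.Caches.SysCache");

            // Entities and collections still need a <cache usage="read-write"/>
            // element in their mappings to actually participate in the cache.
            return cfg.BuildSessionFactory();
        }
    }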

Many of the considerations might also be similar to those Ayende made when designing NHProf, a realtime profiler for NHibernate. He has written a series of posts about them on his blog.

Johannes Rudolph
I wouldn't recommend NHibernate for this; inserting/updating 1000-5000 rows per second is not what NH is for. But the pipeline idea sounds interesting.
sirrocco
You can always combine it with custom SQL bulk loading (see the sketch below). NH has extension points for (almost) everything.
Johannes Rudolph
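To make that comment concrete, here is a minimal SqlBulkCopy sketch for the hot write path, with the ORM kept for reads and the occasional update. The table name, column layout, and batch size are placeholders of mine, not anything from the thread.

    using System.Data;
    using System.Data.SqlClient;

    public static class BulkLoader
    {
        // Writes a buffered batch of rows to SQL Server in one bulk operation,
        // bypassing per-row INSERTs (and the ORM) on the hot write path.
        public static void Load(DataTable batch, string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();
                using (var bulkCopy = new SqlBulkCopy(connection))
                {
                    // Placeholder destination table; map columns as needed.
                    bulkCopy.DestinationTableName = "dbo.Updates";
                    bulkCopy.BatchSize = 5000; // roughly one burst per round trip
                    bulkCopy.WriteToServer(batch);
                }
            }
        }
    }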