views:

61

answers:

1

I have an application that talks to several internal and external sources using SOAP, REST services or just using database stored procedures. Obviously, performance and stability is a major issue that I am dealing with. Even when the endpoints are performing at their best, for large sets of data, I easily see calls that take 10s of seconds.

So, I am trying to improve the performance of my application by prefetching the data and storing locally - so that at least the read operations are fast.

While my application is the major consumer and producer of data, some of the data can change from outside my application too that I have no control over. If I using caching, I would never know when to invalidate the cache when such data changes from outside my application.

So I think my only option is to have a job scheduler running that consistently updates the database. I could prioritize the users based on how often they login and use the application.

I am talking about 50 thousand users, and at least 10 endpoints that are terribly slow and can sometimes take a minute for a single call. Would something like Quartz give me the scale I need? And how would I get around the schedular becoming a single point of failure?

I am just looking for something that doesn't require high maintenance, and speeds at least some of the lesser complicated subsystems - if not most. Any suggestions?

A: 

This does sound like you might need a data warehouse. You would update the data warehouse from the various sources, on whatever schedule was necessary. However, all the read-only transactions would come from the data warehouse, and would not require immediate calls to the various external sources.

This assumes you don't need realtime access to the most up to date data. Even if you needed data accurate to within the past hour from a particular source, that only means you would need to update from that source every hour.

You haven't said what platforms you're using. If you were using SQL Server 2005 or later, I would recommend SQL Server Integration Services (SSIS) for updating the data warehouse. It's made for just this sort of thing.

Of course, depending on your platform choices, there may be alternatives that are more appropriate.


Here are some resources on SSIS and data warehouses. I know you've stated you will not be using Microsoft products. I include these links as a point of reference: these are the products I was talking about above.

John Saunders
Sounds like it can do what I am looking for, but Microsoft products is not an option. Would an ESB like ServiceMix or even RabbitMQ be any appropriate?
Langali
I'm not familiar with those. I do recommend that you learn the kinds of things that SSIS can do - those are the features you should be looking for, in part. Some of your ongoing issues will be of getting at the data and cleaning it up. SSIS is an ETL tool that can do that. You could then have your 50,000 users would retrieve their data from the DW, optimized for read and query access, updated as frequently as required.
John Saunders