views: 343
answers: 4

Hi everyone,

Yay, first post on SO! (Good work Jeff et al.)

We're trying to solve a bottleneck in one of our web-applications that was introduced when we started allowing users to generate reports on-demand.

Our infrastructure is as follows: 1 server acting as a Webserver/DBServer (ColdFusion 7 and MSSQL 2005)

It serves a web application for our backend users and a frontend website. Reports are generated by users from the backend, so there's a layer of security: users have to log in (web-based).

During peak hours, report generation slows both the web application and the frontend website to unacceptable speeds, since SQL Server consumes resources running the huge queries and ColdFusion then generates multi-page PDFs.

We're not exactly sure what the best practice would be to remove some load, but restricting access to the reports isn't an option at the moment.

We've considered denormalizing data into other tables to simplify the most common queries, but that seems like it would only postpone the issue rather than solve it.

So we're thinking of getting a second server to use as a "report server" with a replicated copy of our DB, on which the queries would be run. That would fix one issue, but a second remains: generating PDFs is resource-intensive.

We would like to offload that task to the reporting server as well, but since this is a secured web application, we can't simply fire an HTTP GET from server 1 to have server 2 generate and serve the PDF in the web application without validating the user's credentials...

Anyone have experience with this? Thanks in advance Stack Overflow!!

+3  A: 

The most basic best practice is to not have the web server and db server on the same hardware. I'd start with that.

Al Everett
A: 

In addition to the advice to separate the web & db servers, I'd try to:

a) move queries into stored procedures, if you're not using them yet;

b) generate reports on a schedule and keep them cached in special tables in a ready-to-use state, so customers only select them with a few fast queries -- this should also decrease report-building time for customers.
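The cached-table idea in (b) can be sketched roughly like this (a minimal Python sketch; the cache structure, `expensive_report_query`, and its result rows are all invented stand-ins for the real scheduled job and SQL):

```python
import datetime

# Stand-in for a "report cache" table: report_key -> (generated_at, rows)
report_cache = {}

def expensive_report_query(report_key):
    # Stand-in for the heavy SQL aggregation that currently runs on demand.
    return [("total_sales", 12345), ("total_users", 678)]

def nightly_rebuild(report_keys):
    """Scheduled job: precompute every report once, off-peak."""
    for key in report_keys:
        report_cache[key] = (datetime.datetime.now(),
                             expensive_report_query(key))

def fetch_report(report_key):
    """Request handler: a single fast lookup instead of the heavy query."""
    entry = report_cache.get(report_key)
    if entry is None:
        # Fall back to the slow path only if the cache was never built.
        entry = (datetime.datetime.now(), expensive_report_query(report_key))
        report_cache[report_key] = entry
    return entry[1]
```

In ColdFusion terms, `nightly_rebuild` would be a scheduled task and `report_cache` a real table, so the user-facing request touches only a small, indexed table.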

Hope this helps.

Sergii
Hi and thanks for the help. The queries themselves are part of the problem, but I'd say 30% of the issue is generating the PDF on the fly on a production server. So I still wonder how best to offload that task to another server without compromising the security of the web app.
jfrobishow
+3  A: 

"We would like to offload that task to the reporting server as well, but being in a secured web-application we can't just fire HTTP GET to create PDFs with the user logged in the web-application from server 1 and displaying it in the web-application but generating/fetching it on server 2 without validating the user's credential..."

why can't you? you're using the world's easiest language for writing webservices. here are my suggestions.

first, move the database to its own server so that cf and sql server are on separate machines. the first reason to do this is performance: as already mentioned, having both cf and sql on the same server isn't an ideal setup. the second reason is security: if someone is able to hack your webserver, they're right there next to your data. you should have a firewall in place between your cf and sql servers to give you more security. the last reason is scalability: if you ever need to throw more resources at your database or cluster it, that's easier when it's on its own server.

now for the webservices. what you can do is install cf on another server and write webservices to handle the generation of reports. just lock down the new cf server to accept only ssl connections and pass the login credentials of the users to the webservice. inside your webservice, authenticate the user before invoking the methods that generate the report.
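One way to validate requests between the two servers (a hedged sketch, not rip747's exact approach — forwarding the login over SSL works too) is to have server 1 sign each request with a secret shared only by the two servers, so server 2 can verify that an authenticated session vouched for the request:

```python
import hashlib
import hmac

# Assumption: both servers hold this secret; never expose it to clients.
SHARED_SECRET = b"replace-with-a-real-secret"

def sign_request(username, report_id):
    """Server 1: sign the request it forwards to the report server."""
    msg = f"{username}:{report_id}".encode()
    return hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()

def verify_request(username, report_id, signature):
    """Server 2: authenticate before invoking the report generator."""
    expected = sign_request(username, report_id)
    # Constant-time comparison to avoid timing attacks.
    return hmac.compare_digest(expected, signature)
```

The same pattern is straightforward to reproduce in a CFML webservice using a hash of the shared secret plus the request parameters.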

now for the pdfs themselves. one of the methods i've used in the past is generating a hash based on some of the parameters passed (the user credentials and the generated sql for the query); once the pdf is generated, you use the hash as the name of the pdf and save it to disk. now you have a simple caching system: look to see if the pdf already exists, and if so, return it; otherwise generate it and cache it.
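That hash-as-filename cache can be sketched like this (a Python illustration of the idea; `CACHE_DIR` and the `generate_pdf` callable are hypothetical names, and the real version would live on the report server):

```python
import hashlib
import os

CACHE_DIR = "pdf_cache"  # hypothetical cache directory on the report server

def pdf_cache_path(user_credentials, generated_sql):
    """Hash the inputs that uniquely identify a report; use it as the filename."""
    key = user_credentials + "|" + generated_sql
    digest = hashlib.sha256(key.encode()).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".pdf")

def get_report_pdf(user_credentials, generated_sql, generate_pdf):
    """Return cached PDF bytes if present; otherwise generate and cache."""
    path = pdf_cache_path(user_credentials, generated_sql)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = generate_pdf(generated_sql)  # the expensive step, run only on a miss
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

The payoff is exactly the scenario the asker mentions below: a user regenerating the same report repeatedly only pays the generation cost once.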

in closing, your problem is not something that most haven't seen before. you just need to do a little work and your application will be magnitudes faster.

rip747
Thanks all for the reply. The Hash as a file name is a really good idea, the user generating the same report over and over is an issue. We now have a solid base to guide ourselves.
jfrobishow
+1  A: 

You have to separate two steps that are often perceived as one: generating the PDF and doing the calculations.

What you can do is

1) Create a pre-calculated report table, populated daily by a scheduled job with all the calculated values for your reports.

2) When someone requests a PDF report, have it do a simple select of the pre-calculated values. That's much less db effort than calculating on the fly. You can use ColdFusion to generate the PDF if it needs the fancy PDF settings. Otherwise you may be able to get away with writing the raw PDF format (it's similar to HTML markup) as text, or use another library (cfx_pdf, a suitable Java library, etc.) to generate them.

If the users don't need to download the report and only need to view/print it, could you get away with FlashPaper?

An alternative is to build a report queue. Whether you put it on the second server or not, if you can get away with it, have CF put report requests into a queue and email the reports to users as they get processed.

You can then control the queue through a scheduled process that runs as regularly as you like and creates only a few reports at a time. I'm not sure if it's a suitable approach for your situation.
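The queue idea can be sketched like this (a hedged Python sketch; in practice the queue would be a database table polled by a CF scheduled task, and the batch size caps peak-hour load):

```python
from collections import deque

# Stand-in for a queue table; each entry is (user_email, report_params).
report_queue = deque()

def request_report(user_email, report_params):
    """Web request handler: enqueue instead of generating inline."""
    report_queue.append((user_email, report_params))

def process_queue(batch_size=3):
    """Scheduled task: generate only a few reports per run to cap load."""
    processed = []
    for _ in range(min(batch_size, len(report_queue))):
        user_email, params = report_queue.popleft()
        # Generate the PDF here, then email it to user_email.
        processed.append(user_email)
    return processed
```

Because the scheduled task drains the queue in small batches, a burst of report requests no longer competes with interactive traffic for the server's resources.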

As mentioned above, stored procedures may also help, and make sure you have your indexes set correctly in SQL Server. I once brought a 3-minute query down to 15 seconds because I had forgotten to declare additional indexes on the heavily used columns in each table.

Let us know how it goes!

Jas Panesar