I've got a reasonably simple query (this time) that I need ALL the results back from (I'm storing them in an Excel spreadsheet). The query itself times out the server, so how do I go about running it without that happening?
The easiest way would be to break the domain of the query into several parts. For instance, add an expression to the WHERE clause that selects only the first half of the key range, then run a second query to select the second half. Then merge the output.
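A minimal sketch of that split, assuming a numeric primary key called `id` on a hypothetical `orders` table (the table, column, and datasource names here are all placeholders, and the midpoint value would come from your own key range):

```cfml
<!--- Hypothetical example: split the key range in two. "orders", "myDSN",
      and the midpoint 500000 are placeholders, not real names. --->
<cfquery name="firstHalf" datasource="myDSN">
    SELECT id, customer, total
    FROM orders
    WHERE id <= <cfqueryparam value="500000" cfsqltype="cf_sql_integer">
</cfquery>
<cfquery name="secondHalf" datasource="myDSN">
    SELECT id, customer, total
    FROM orders
    WHERE id > <cfqueryparam value="500000" cfsqltype="cf_sql_integer">
</cfquery>
```

Each half runs well inside the timeout, and you append the second result set to the first when writing the spreadsheet.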
You can set the timeout on a per-request basis, although merging multiple queries may be a better approach.
<cfsetting
enableCFoutputOnly = "yes|no"
requestTimeOut = "value in seconds"
showDebugOutput = "yes|no" >
You can increase the request timeout for the page:
<cfsetting requestTimeout="3600" />
This will make sure that you have time to process all of the entries.
You may also want to break the list up into "chunks". It will require some tuning to find the optimum chunk size, but you could grab the results 100 or 1000 rows at a time, and use <cfflush> to push the results out to the screen as they become available. This approach also has the advantage of using less memory on the ColdFusion server: every row pulled back from the SQL server gets loaded into CFML memory and sits there until the query object variable is overwritten or goes out of scope (at the end of the page). This means you can easily fill up ColdFusion memory reading several hundred thousand rows, especially if the rows are "wide" (i.e. contain big VARCHAR or TEXT columns).
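One way to sketch that chunked approach, keyed off the last row seen so each query stays cheap. This assumes SQL Server-style `TOP` syntax, a numeric `id` key, and placeholder table/column/datasource names:

```cfml
<!--- Sketch only: page through the table 1000 rows at a time using the
      last-seen key. "orders", "myDSN", and the columns are placeholders. --->
<cfset lastId = 0>
<cfloop condition="true">
    <cfquery name="chunk" datasource="myDSN">
        SELECT TOP 1000 id, customer, total
        FROM orders
        WHERE id > <cfqueryparam value="#lastId#" cfsqltype="cf_sql_integer">
        ORDER BY id
    </cfquery>
    <cfif chunk.recordCount eq 0><cfbreak></cfif>
    <cfoutput query="chunk">#id#,#customer#,#total#<br></cfoutput>
    <!--- remember the last key so the next query picks up where we left off --->
    <cfset lastId = chunk.id[chunk.recordCount]>
    <cfflush>
</cfloop>
```

Only one chunk's worth of rows is held in the query variable at a time, so memory stays flat no matter how big the table is.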
As others have pointed out, you could try increasing the request timeout of the page, although this isn't advisable if your query's execution is measured in minutes, rather than seconds or milliseconds. CF will only service a set number of requests at a time, so you want to be careful about locking up one of those requests waiting on a 5-minute query to complete.
If you're using SQL Server or Oracle, I think CFQUERY exposes its own per-query timeout attribute that you can set. Again, this isn't advisable for really long running queries.
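For reference, CFQUERY's `timeout` attribute takes a value in seconds; whether it's honored depends on the JDBC driver. The name and datasource below are placeholders:

```cfml
<!--- Per-query timeout in seconds; driver support varies. --->
<cfquery name="bigReport" datasource="myDSN" timeout="300">
    SELECT id, customer, total
    FROM orders
</cfquery>
```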
In my experience, if your query is either so complex, or returns so much data, that it takes minutes to run, then it's time to decouple the execution of the query from the request that initiates it. There are a number of ways you could do this, such as:
Create some sort of queuing system to record pending service requests. This could be a DB table, an XML file on disk, etc. When your user requests their data you register that request with this queue.
Write a scheduled task (e.g. Java, DTS, or a scheduled CF page) that periodically checks this queue for work. Depending on your needs you might spin off a background thread to handle each request, or maybe the scheduled task handles it directly. If you're using scheduled CF pages, you'll want to break the total workload up into smaller chunks that can be handled iteratively, otherwise you'll have the same problem you have now.
Once the scheduled task determines that a request has been filled, it kicks off some sort of notification that processing is ready. For instance, you might email the user to tell them the data is ready, with a link to download a .csv file that was created on disk.
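The registration step in a queue like that can be as simple as one INSERT. This is a hypothetical sketch; the `report_queue` table, its columns, and the datasource are all invented names:

```cfml
<!--- Hypothetical queue registration: record the pending request so a
      scheduled task can pick it up later. All names are placeholders. --->
<cfquery datasource="myDSN">
    INSERT INTO report_queue (user_id, requested_at, status)
    VALUES (
        <cfqueryparam value="#session.userId#" cfsqltype="cf_sql_integer">,
        <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp">,
        'PENDING'
    )
</cfquery>
```

The scheduled task then polls for `PENDING` rows, does the heavy work, flips the status to `DONE`, and triggers the notification.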
Obviously, the right choice depends a lot on the specific problem being solved. In general I'd try these things, in this order:
- Aggressively attack the query execution time. Can you use indexes or write better T-SQL?
- If the query takes a minute or two, and is run very infrequently, increasing page or query timeouts might be acceptable.
- If the query is run often, or takes more than 2-3 minutes, bite the bullet and build a batching or queuing system to handle the query in the background.
I would throw the query into a separate thread, loading it into a persistent scope (e.g. session). Forward to a page that checks for the existence of the query; repeat the check until the query exists, then forward to a page that displays/processes/whatevers it.
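A rough sketch of that pattern, assuming CF8+ for `<cfthread>`; the session key, page names, datasource, and query are all placeholders:

```cfml
<!--- Sketch: run the slow query in a background thread and stash the
      result in session scope. All names here are placeholders. --->
<cfif not structKeyExists(session, "slowQueryResult")>
    <cfthread action="run" name="slowQueryThread">
        <cfquery name="q" datasource="myDSN">
            SELECT id, customer, total
            FROM orders
        </cfquery>
        <cfset session.slowQueryResult = q>
    </cfthread>
    <!--- the request returns immediately; this page polls for the result --->
    <cflocation url="checkStatus.cfm" addtoken="false">
<cfelse>
    <cflocation url="showResults.cfm" addtoken="false">
</cfif>
```

The polling page (`checkStatus.cfm` here) just refreshes itself until `session.slowQueryResult` exists, so the user's request never has to sit open for the full query duration.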
First off, I would check to see why this query is taking so long.
Look at what you can do at the database level to improve the performance of the query. It sounds like the database may not be indexed properly. Take the query and throw it into a program that can analyze the execution plan, look for the lags, and address them.
To get more performance, look into creating indexed views, if your database supports that sort of thing.
Next, look at caching some parts of the query. There is no reason to be doing calculations on historical data for every request when they could be done once and then cached in a table somewhere.
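As a sketch of that caching idea, a scheduled task could precompute the historical aggregates once a night instead of per request. The table names and SQL Server-style date functions below are placeholders/assumptions:

```cfml
<!--- Sketch: fill a summary table with per-day totals for closed days,
      run once nightly. "daily_sales_summary" and "orders" are placeholders;
      GETDATE()/CAST assume SQL Server syntax. --->
<cfquery datasource="myDSN">
    INSERT INTO daily_sales_summary (sale_date, total_sales)
    SELECT CAST(order_date AS DATE), SUM(total)
    FROM orders
    WHERE order_date < CAST(GETDATE() AS DATE)
      AND CAST(order_date AS DATE) NOT IN (SELECT sale_date FROM daily_sales_summary)
    GROUP BY CAST(order_date AS DATE)
</cfquery>
```

The live query then reads the small summary table for history and only aggregates today's rows.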
As for the ColdFusion end: make sure you're using java.io.BufferedWriter to create the spreadsheet. Normal string concatenation in CF is dog slow, and BufferedWriter is vastly faster. Attached is a CFC I created for building tab-delimited spreadsheets; you can modify it to suit your needs.
<!--- init --->
<cffunction name="init" access="public" returntype="Any" output="false">
<cfargument name="name" type="string" required="true">
<cfset var local = {}>
<!--- name of file when downloading --->
<cfset variables.name = arguments.name & ".xls">
<!--- name of temp file --->
<cfset variables.filename = CreateUUID() & ".csv">
<!--- full path to temp file for downloading --->
<cfset variables.fullfilename = expandpath("/_temp") & "\" & variables.filename>
<!--- file write java object --->
<cfset variables.filewriter = CreateObject("java","java.io.FileWriter").init(
variables.fullfilename
,JavaCast("boolean","true")
)>
<!--- buffered writer java object --->
<cfset variables.bufferedwriter = CreateObject("java","java.io.BufferedWriter").init(
variables.filewriter
)>
<!--- row delimiter --->
<cfset variables.row = chr(10)>
<!--- col delimiter --->
<cfset variables.col = chr(9)>
<!--- header container --->
<cfset variables.headers = []>
<!--- data container --->
<cfset variables.data = []>
<cfset newrow()>
<cfreturn this>
</cffunction>
<!--- addheader --->
<cffunction name="addheader" access="public" returntype="void" output="false">
<cfargument name="str" type="string" required="true">
<cfset arrayappend(variables.headers, arguments.str)>
</cffunction>
<!--- newrow --->
<cffunction name="newrow" access="public" returntype="void" output="false">
<cfset arrayappend(variables.data, arraynew(1))>
<cfset variables.data_counter = arraylen(variables.data)>
</cffunction>
<!--- adddata --->
<cffunction name="adddata" access="public" returntype="void" output="false">
<cfargument name="str" type="string" required="true">
<cfset arrayappend(variables.data[variables.data_counter], arguments.str)>
</cffunction>
<!--- flush --->
<cffunction name="flush" access="public" returntype="void" output="false">
<cfset var local = {}>
<!--- write headers --->
<cfloop array="#variables.headers#" index="local.header">
<cfset variables.bufferedwriter.write(local.header & variables.col)>
</cfloop>
<cfif not arrayisempty(variables.headers)>
<cfset variables.bufferedwriter.write(variables.row)>
</cfif>
<!--- write data --->
<cfloop array="#variables.data#" index="local.data">
<cfloop array="#local.data#" index="local.cell">
<cfset variables.bufferedwriter.write(local.cell & variables.col)>
</cfloop>
<cfset variables.bufferedwriter.write(variables.row)>
</cfloop>
<cfset variables.bufferedwriter.close()>
<cfsetting showdebugoutput="No">
<cfheader name="Content-Description" value="File Transfer">
<cfheader name="Content-Disposition" value="attachment;filename=#variables.name#">
<cfcontent type="application/vnd.ms-excel" file="#variables.fullfilename#" deletefile="true" reset="true">
</cffunction>
Use indexing properly, and apply/create foreign keys wherever you can. Queries are far less likely to time out against a well-normalized, well-indexed database. Be very careful with joins and clauses: if your query has a GROUP BY clause, filter out as many rows as possible in the WHERE clause before grouping, and use HAVING only for conditions on aggregate values. That keeps the grouping work small and reduces query execution time.
Use the database's cost estimates to check which table is taking the most time or needs indexing/normalization work.