I've got a reasonably simple query (this time) that I need ALL the results back from (I'm storing them in an Excel spreadsheet). The query itself times out the server, so how do I go about running it without that happening?
The easiest way would be to break the domain of the query into several parts. For instance, add an expression to the WHERE clause that selects only the first half of the key range, then run a second query to select the second half. Then merge the output.
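A minimal sketch of that split, assuming a numeric primary key called `id` on a hypothetical `orders` table (the table, column, and datasource names here are all placeholders, and the midpoint value would come from your own key range):

```cfml
<!--- Hypothetical example: split the key range in two. "orders", "myDSN",
      and the midpoint 500000 are placeholders, not real names. --->
<cfquery name="firstHalf" datasource="myDSN">
    SELECT id, customer, total
    FROM orders
    WHERE id <= <cfqueryparam value="500000" cfsqltype="cf_sql_integer">
</cfquery>
<cfquery name="secondHalf" datasource="myDSN">
    SELECT id, customer, total
    FROM orders
    WHERE id > <cfqueryparam value="500000" cfsqltype="cf_sql_integer">
</cfquery>
```

Each half runs well inside the timeout, and you append the second result set to the first when writing the spreadsheet.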
You can set the timeout on a per-request basis, although merging multiple queries may be a better approach.
<cfsetting
enableCFoutputOnly = "yes|no"
requestTimeOut = "value in seconds"
showDebugOutput = "yes|no" >
You can increase the request timeout for the page:
<cfsetting requestTimeout="3600" />
This will make sure that you have time to process all of the entries.
You may also want to break the list up into "chunks". It will require some tuning to find the optimum chunk size, but you could grab the results 100 or 1000 rows at a time, and use <cfflush> to push the results out to the screen as they become available. This approach also has the advantage of using less memory on the ColdFusion server: every row pulled back from the SQL server gets loaded into CFML memory and sits there until the query object variable is overwritten or goes out of scope (at the end of the page). This means you can easily fill up ColdFusion memory reading several hundred thousand rows, especially if the rows are "wide" (i.e. contain big VARCHAR or TEXT columns).
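One way to sketch that chunked approach, keyed off the last row seen so each query stays cheap. This assumes SQL Server-style `TOP` syntax, a numeric `id` key, and placeholder table/column/datasource names:

```cfml
<!--- Sketch only: page through the table 1000 rows at a time using the
      last-seen key. "orders", "myDSN", and the columns are placeholders. --->
<cfset lastId = 0>
<cfloop condition="true">
    <cfquery name="chunk" datasource="myDSN">
        SELECT TOP 1000 id, customer, total
        FROM orders
        WHERE id > <cfqueryparam value="#lastId#" cfsqltype="cf_sql_integer">
        ORDER BY id
    </cfquery>
    <cfif chunk.recordCount eq 0><cfbreak></cfif>
    <cfoutput query="chunk">#id#,#customer#,#total#<br></cfoutput>
    <!--- remember the last key so the next query picks up where we left off --->
    <cfset lastId = chunk.id[chunk.recordCount]>
    <cfflush>
</cfloop>
```

Only one chunk's worth of rows is held in the query variable at a time, so memory stays flat no matter how big the table is.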
As others have pointed out, you could try increasing the request timeout of the page, although this isn't advisable if your query's execution is measured in minutes, rather than seconds or milliseconds. CF will only service a set number of requests at a time, so you want to be careful about locking up one of those requests waiting on a 5-minute query to complete.
If you're using SQL Server or Oracle, I think CFQUERY exposes its own per-query timeout attribute that you can set. Again, this isn't advisable for really long running queries.
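For reference, CFQUERY's `timeout` attribute takes a value in seconds; whether it's honored depends on the JDBC driver. The name and datasource below are placeholders:

```cfml
<!--- Per-query timeout in seconds; driver support varies. --->
<cfquery name="bigReport" datasource="myDSN" timeout="300">
    SELECT id, customer, total
    FROM orders
</cfquery>
```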
In my experience, if your query is either so complex, or returns so much data, that it takes minutes to run, then it's time to decouple the execution of the query from the request that initiates it. There are a number of ways you could do this, such as:
Create some sort of queuing system to record pending service requests. This could be a DB table, an XML file on disk, etc. When your user requests their data you register that request with this queue.
Write a scheduled task (e.g. Java, DTS, or a scheduled CF page) that periodically checks this queue for work. Depending on your needs you might spin off a background thread to handle each request, or maybe the scheduled task handles it directly. If you're using scheduled CF pages, you'll want to break the total workload up into smaller chunks that can be handled iteratively, otherwise you'll have the same problem you have now.
Once the scheduled task determines that a request has been filled, it kicks off some sort of notification that processing is ready. For instance, you might email the user to tell them the data is ready, with a link to download a .csv file that was created on disk.
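The registration step in a queue like that can be as simple as one INSERT. This is a hypothetical sketch; the `report_queue` table, its columns, and the datasource are all invented names:

```cfml
<!--- Hypothetical queue registration: record the pending request so a
      scheduled task can pick it up later. All names are placeholders. --->
<cfquery datasource="myDSN">
    INSERT INTO report_queue (user_id, requested_at, status)
    VALUES (
        <cfqueryparam value="#session.userId#" cfsqltype="cf_sql_integer">,
        <cfqueryparam value="#now()#" cfsqltype="cf_sql_timestamp">,
        'PENDING'
    )
</cfquery>
```

The scheduled task then polls for `PENDING` rows, does the heavy work, flips the status to `DONE`, and triggers the notification.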
Obviously, the right choice depends a lot on the specific problem being solved. In general I'd try these things, in this order:
- Aggressively attack the query execution time. Can you use indexes or write better T-SQL?
- If the query takes a minute or two, and is run very infrequently, increasing page or query timeouts might be acceptable.
- If the query is run often, or takes more than 2-3 minutes, bite the bullet and build a batching or queuing system to handle the query in the background.
I would throw the query into a separate thread, loading it into a persistent scope (e.g. session). Forward to a page that checks for the existence of the query; repeat the check until the query exists, then forward to a page that displays/processes/whatevers it.
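A rough sketch of that pattern, assuming CF8+ for `<cfthread>`; the session key, page names, datasource, and query are all placeholders:

```cfml
<!--- Sketch: run the slow query in a background thread and stash the
      result in session scope. All names here are placeholders. --->
<cfif not structKeyExists(session, "slowQueryResult")>
    <cfthread action="run" name="slowQueryThread">
        <cfquery name="q" datasource="myDSN">
            SELECT id, customer, total
            FROM orders
        </cfquery>
        <cfset session.slowQueryResult = q>
    </cfthread>
    <!--- the request returns immediately; this page polls for the result --->
    <cflocation url="checkStatus.cfm" addtoken="false">
<cfelse>
    <cflocation url="showResults.cfm" addtoken="false">
</cfif>
```

The polling page (`checkStatus.cfm` here) just refreshes itself until `session.slowQueryResult` exists, so the user's request never has to sit open for the full query duration.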
First off, I would check to see why this query is taking so long.
Look at what you can do at the database level to improve the performance of the query. It sounds like the database may not be indexed properly. Take the query and throw it into a program that can analyze the execution plan, look for the lags, and address them.
To get more performance, look into creating indexed views, if your database supports that sort of thing.
Next, look at caching some parts of the query. There is no reason to be doing calculations on historical data for every request when they could be done once and then cached in a table somewhere.
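As a sketch of that caching idea, a scheduled task could precompute the historical aggregates once a night instead of per request. The table names and SQL Server-style date functions below are placeholders/assumptions:

```cfml
<!--- Sketch: fill a summary table with per-day totals for closed days,
      run once nightly. "daily_sales_summary" and "orders" are placeholders;
      GETDATE()/CAST assume SQL Server syntax. --->
<cfquery datasource="myDSN">
    INSERT INTO daily_sales_summary (sale_date, total_sales)
    SELECT CAST(order_date AS DATE), SUM(total)
    FROM orders
    WHERE order_date < CAST(GETDATE() AS DATE)
      AND CAST(order_date AS DATE) NOT IN (SELECT sale_date FROM daily_sales_summary)
    GROUP BY CAST(order_date AS DATE)
</cfquery>
```

The live query then reads the small summary table for history and only aggregates today's rows.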
As for the ColdFusion end: make sure you're using java.io.BufferedWriter to create the spreadsheet. Normal string concatenation in CF is dog slow, and BufferedWriter is vastly faster. Attached is a CFC I created for building tab-delimited spreadsheets; you can modify it to suit your needs.
<!--- init --->
<cffunction name="init" access="public" returntype="Any" output="false">
<cfargument name="name" type="string" required="true">
<cfset var local = {}>
<!--- name of file when downloading --->
<cfset variables.name = arguments.name & ".xls">
<!--- name of temp file --->
<cfset variables.filename = CreateUUID() & ".csv">
<!--- full path to temp file for downloading --->
<cfset variables.fullfilename = expandpath("/_temp") & "\" & variables.filename>
<!--- file write java object --->
<cfset variables.filewriter = CreateObject("java","java.io.FileWriter").init(
variables.fullfilename
,JavaCast("boolean","true")
)>
<!--- buffered writer java object --->
<cfset variables.bufferedwriter = CreateObject("java","java.io.BufferedWriter").init(
variables.filewriter
)>
<!--- row delimiter --->
<cfset variables.row = chr(10)>
<!--- col delimiter --->
<cfset variables.col = chr(9)>
<!--- header container --->
<cfset variables.headers = []>
<!--- data container --->
<cfset variables.data = []>
<cfset newrow()>
<cfreturn this>
</cffunction>
<!--- addheader --->
<cffunction name="addheader" access="public" returntype="void" output="false">
<cfargument name="str" type="string" required="true">
<cfset arrayappend(variables.headers, arguments.str)>
</cffunction>
<!--- newrow --->
<cffunction name="newrow" access="public" returntype="void" output="false">
<cfset arrayappend(variables.data, arraynew(1))>
<cfset variables.data_counter = arraylen(variables.data)>
</cffunction>
<!--- adddata --->
<cffunction name="adddata" access="public" returntype="void" output="false">
<cfargument name="str" type="string" required="true">
<cfset arrayappend(variables.data[variables.data_counter], arguments.str)>
</cffunction>
<!--- flush --->
<cffunction name="flush" access="public" returntype="void" output="false">
<cfset var local = {}>
<!--- write headers --->
<cfloop array="#variables.headers#" index="local.header">
<cfset variables.bufferedwriter.write(local.header & variables.col)>
</cfloop>
<cfif not arrayisempty(variables.headers)>
<cfset variables.bufferedwriter.write(variables.row)>
</cfif>
<!--- write data --->
<cfloop array="#variables.data#" index="local.data">
<cfloop array="#local.data#" index="local.cell">
<cfset variables.bufferedwriter.write(local.cell & variables.col)>
</cfloop>
<cfset variables.bufferedwriter.write(variables.row)>
</cfloop>
<cfset variables.bufferedwriter.close()>
<cfsetting showdebugoutput="No">
<cfheader name="Content-Description" value="File Transfer">
<cfheader name="Content-Disposition" value="attachment;filename=#variables.name#">
<cfcontent type="application/vnd.ms-excel" file="#variables.fullfilename#" deletefile="true" reset="true">
</cffunction>
Use indexing properly, and apply/create foreign keys wherever you can. Queries are far less likely to time out against a well-normalized, well-indexed database. Be very careful with joins and clauses: if your query has a GROUP BY clause, filter out as many rows as possible in the WHERE clause before grouping, and use HAVING only for conditions on aggregate values. That keeps the grouping work small and reduces query execution time.
Use the database's cost estimates to check which table is taking the most time or needs indexing/normalization work.