views:

507

answers:

8

Updates:

  • I have submitted the bug to Adobe and referenced this SO question

  • In my real-world code where the problem occurred I decided to just remove my use of cfqueryparam. I am now using a custom function to format the param based on type. There are security and speed concerns that I will have to deal with but it gets the particular process working acceptably under current load.

  • In the future I am planning on going to process that pulls the data files into temporary tables in the database. I'll then perform operations on the data and transfer data to live tables using SQL as much as possible, instead of relying on ColdFusion


I am having a problem with looping over queries using cfqueryparam tags while inserting data. (I have not tested with select or update queries). The looping progressively takes up more memory that is not released until the request is done. However, the problem only occurs when looping over a query while in a function.

It appears to be very sensitive to the number of cfqueryparam tags used. In this example there are 15 values being inserts however in my code that actually needs this to work I am inserting an unknown number of values that can make the problem more severe.

Below is code that shows the problem. Give it a datasource name (tested on MSSQL) and it will create a tmp table and insert records as example with and without being in a function. Memory usage is display before, after the non-function loop, then after the in-function loop. It also requests garbage collection and waits 10 seconds before outputting memory info to ensure it is displaying info as accurately as possible.

In my experience with this particular test the in-function loop resulted in over 200mb of memory being used. In my real world uses it crashes ColdFusion :-(

<cfsetting enablecfoutputonly="true">
<cfsetting requesttimeout="600">

<cfset insertCount = 100000>
<cfset dsn = "TmpDB">

<cfset dropTmpTable()>
<cfset createTmpTable()>

<cfset showMemory("Before")>
<cfflush interval="1">

<cfloop from="1" to="#insertCount#" index="i">
    <cfquery name="testq" datasource="#dsn#">
        INSERT INTO tmp ( [col1],[col2],[col3],[col4],[col5],[col6],[col7],[col8],[col9],[col10],[col11],[col12],[col13],[col14],[col15] )
        VALUES ( <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR"> )
    </cfquery>
</cfloop>

<cfset showMemory("After Non-Function INSERTS")>
<cfflush interval="1">

<cfset funcTest()>

<cfset showMemory("After Function based INSERTS")>

<cfset dropTmpTable()>

<cffunction name="funcTest" output="false">
    <cfset var i = 0>
    <cfset var testq = "">
    <cfloop from="1" to="#insertCount#" index="i">
        <cfquery name="testq" datasource="#dsn#">
            INSERT INTO tmp ( [col1],[col2],[col3],[col4],[col5],[col6],[col7],[col8],[col9],[col10],[col11],[col12],[col13],[col14],[col15] )
            VALUES ( <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR">, <cfqueryparam value="TestValue" cfsqltype="CF_SQL_CHAR"> )
        </cfquery>
    </cfloop>
</cffunction>

<cffunction name="showMemory" output="true">
    <cfargument name="label" required="true">

    <cfset var runtime = "">
    <cfset var memoryUsed = "">
    <cfset requestGC("10")>
    <cfset runtime = CreateObject("java","java.lang.Runtime").getRuntime()>
    <cfset memoryUsed = (runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024>
    <cfoutput>
        <h2>#arguments.label#</h2>
        Memory Used: #Round(memoryUsed)#mb
    </cfoutput>
</cffunction>

<cffunction name="requestGC">
    <cfargument name="waitSeconds" required="false" default="0" type="numeric">
    <cfscript>
        createObject("java","java.lang.Runtime").getRuntime().gc();
        createObject("java", "java.lang.Thread").sleep(arguments.waitSeconds*1000);
    </cfscript>
</cffunction>

<cffunction name="dropTmpTable" output="false">
    <cftry>
        <cfquery datasource="#dsn#">
            DROP TABLE tmp
        </cfquery>
        <cfcatch type="database"></cfcatch>
    </cftry>
</cffunction>

<cffunction name="createTmpTable" output="false">
    <cfquery datasource="#dsn#">
        CREATE TABLE tmp(
            col1 nchar(10) NULL, col2 nchar(10) NULL, col3 nchar(10) NULL, col4 nchar(10) NULL, col5 nchar(10) NULL, col6 nchar(10) NULL, col7 nchar(10) NULL, col8 nchar(10) NULL, col9 nchar(10) NULL, col10 nchar(10) NULL, col11 nchar(10) NULL, col12 nchar(10) NULL, col13 nchar(10) NULL, col14 nchar(10) NULL, col15 nchar(10) NULL
        )  ON [PRIMARY]
    </cfquery>
</cffunction>

Just to show that memory can be released during an operation, here is example code that builds up a larger struct and shows memory used before and after the variable is overwritten and garbage collected. In my run of this memory used after population is 118mb and after overwriting and garbage collection it is 31mb.

<cfset showMemory("Before struct creation")>
<cfflush interval="1">

<cfset tmpStruct = {}>
<cfloop from="1" to="1000000" index="i">
    <cfset tmpStruct["index:#i#"] = "testvalue testvalue testvalue testvalue testvalue testvalue testvalue testvalue testvalue testvalue">
</cfloop>

<cfset showMemory("After struct population")>
<cfflush interval="1">

<cfset tmpStruct = {}>
<cfset showMemory("After struct overwritten")>
+3  A: 

Do you have debugging on in Administrator?

If so, even if you've got showdebugoutput="false", CF will be keeping debug information about all of those queries, and with that many queries, the debugging information could quickly build up.


Also, if you've really got 80,000 rows to insert, you probably want to be doing this a different way - e.g. generating an import script that runs directly against the DB, (without CF/JDBC getting in the way).

Peter Boughton
Debugging is not on (I was setting enablecfoutputonly, not showdebugoutput). I don't disagree with you that a script purely on the database side would probably be a good idea. I imagine if I need to do anything in the multiple hundreds of thousands that I will probably need to do this. However right now I'm being bitten by some sort of bug in legit code and trying to get around it.
Dan Roberts
I am starting to think the db approach may be the safest bet
Dan Roberts
Pete's point was that even if you are suppressing the debug output, having it on at all can eat up memory. Make sure "Enable Request Debugging Output" is not checked.
Al Everett
Just checked to absolutely certain and it is NOT check. I have run into issues with running debugging so that was one of my first thoughts as well.
Dan Roberts
+1  A: 

No idea if it will make a difference, but something to try - shrink the in-function loop, and loop round the function multiple times.

What this does with memory might help narrow down where it is being used up.

<cffunction name="funcTest" output="false">
    <cfargument name="from" />
    <cfargument name="to" />
    <cfset var i = 0>
    <cfset var testq = "">
    <cfloop from="#arguments.from#" to="#arguments.to#" index="i">
        <cfquery name="testq" datasource="#dsn#">
            ...
        </cfquery>
    </cfloop>
</cffunction>


<cfset BlockSize = 100 />
<cfloop index="CurBlock" from="1" to="#(InsertCount/BlockSize)#">

    <cfset funcTest
     ( from : CurBlock*(BlockSize-1) + 1
     , to   : CurBlock*BlockSize
     )/>

</cfloop>
Peter Boughton
That is a very good idea. I'll run a test tomorrow to see if calling a function for a subset of insert reduces the memory build up. I hope this works, though now that I look back at my example the memory isn't being released after the function call is done and gc is called.
Dan Roberts
Unfortunately that didn't help. I did 50 blocks of 3000 and the memory consumed was the same as one block of 150,000.
Dan Roberts
+1  A: 

Maybe multiple insert can help? This technique itself typically works faster, saving some time can help you save some memory.

Yes I've seen your note "inserting an unknown number of values", but this should work if you have constant number of fields/values in a single insterting batch.

Sergii
It looks like this technique brings a major speed improvement. I expect to still have to get rid of cfqueryparam but at least with this I doubt there will be a performance loss.
Dan Roberts
A: 

Hi,

try to prepend "variables." before each query inside of your cffunctions. I've had a similiar issue and this fixed it.

So change:

<cfquery name="testq" datasource="CongressPlus">

to

<cfquery name="variables.testq" datasource="CongressPlus">

Cheers,

Thomas

I don't actually want the query to be in variables scope. My actual code is in a CFC and this could cause problems. Actually in my real code I have removed the name all together.
Dan Roberts
For tags like cfquery that create variables, for stuff that needs to be private you need to var the variable ahead of time.<cfset var qName="" /><cfquery name="qName" ...>Of course, if you're not using a name on your inserts it probably doesn't matter.
Al Everett
yeah, in some cases varaibles with default names (ex: cfhttp) but that does not appear to be the case with cfquery and doesn't affect the test one way or another when the query is an insert/update
Dan Roberts
+1  A: 

My first guess would be to type the values in your cfqueryparam - as in type="CF_SQL_CHAR". Why would this help? I'm not sure, but I can guess that there would be additional overhead with a non-typed variable.

Good thought but I've made the change and memory is still built up and not released during the in-function loop.
Dan Roberts
A: 

It's been well documented all over the community that CF will not release memory until after the request is finished. Even calling the GC directly has no effect on freeing up memory during a running request. Don't know if this is by design or a bug.

I haven't a clue why you would even want to do something like this in CF anyways. There is no reason for you to be inserting 80K rows into a database using CF, no matter which database engine you're using.

Now, if there is a reason that you need to do this, such as you're getting the data from say an uploaded CSV or XML file; MSSQL has a TON of better ways to do this and workarounds.

One approach that I have done over the years is to create a stored procedure in MSSQL that calls BCP or BULK INSERT to read a file that contains the data to insert.

The best thing about this approach is that the only thing CF is doing is handling the file upload and MMSQL is doing all the work processing the file. MSSQL has no problems inserting millions of rows using BCP or BULK INSERT and will be INFINITELY faster then anything CF can process.

rip747
I agree that this would probably be best used in the database. I am doing manipulations and lookups in CF but I believe they could be transfered to set based query operations, though I haven't looked through everything yet.I disagree with your statement that CF will not release memory until after the request. I will add a code example above showing memory released when a variable is overwritten.
Dan Roberts
+1  A: 

Assuming you are using CF8... not sure if this happens in CF7...

Try turning off "Max Pooled Statements" (set it to zero) in your datasource "advanced settings"... I bet money your memory leak goes away...

That is where I have found the bug to be... this was causing all kinds of crashes on some CF servers until we found this... we are 100% more stable now because of this...

Patrick Steil

I actually set pooled statments to zero then later turned off maintain connection completely after reading on various blogs about that being a possible solution. What leads me to rule out various settings now is the fact that I narrowed down the memory leaks sole to whether the code is in a function.
Dan Roberts
It is in CF8. Do you get similar results when you run the test?
Dan Roberts
I haven't done any more testing on this myself... so are you saying that with "Max Pooled Statements" turned on and the cfqueryparam NOT inside a CFFUNCTION that the memory leak does NOT happen? And does it matter if that CFFUNCTION is within a CFC or does it happen in just a regular UDF within a CFM page?