views:

183

answers:

2

I've written a database load script in ColdFusion, and my problem is that the script slowly runs out of memory. I've split each table load into its own thread with <cfthread>, and I'm calling the garbage collector when free memory dips below 50% (making sure to leave 30 seconds between gc() calls to prevent the garbage collector from hogging memory).

I created a CFC to hold all the queries needed by the script. The script calls the appropriate CFC function which then returns the query, some of which are over 2 MB in size. When I look in the Server Monitor in the details view of the Memory page for Active Threads, it looks like my CFC is keeping a copy of the query in memory even though I varscoped the query variable and the variable went out of scope at the end of the function. In addition, I have a copy of the query in memory in my thread. So I'm left with what looks like two copies of the query in memory. Is this really what's happening? If it is, how can I eliminate one copy of the query from memory?

+10  A: 

There are a lot of potential issues here, but I'll try to highlight some of the most important things for you to consider:

  1. Why the threads? Do you need the threads? There's a certain point at which you're probably tinkering too much for your own good.
  2. Manually forcing garbage collection isn't necessarily a good idea. Tune the JVM to perform its garbage collection automatically, but don't overdo it, either. Garbage Collection tends to be expensive, and can impact the performance of your app if it is running too frequently.
  3. How are you instantiating your CFC? If you are instantiating the CFC on every request for the query, you're going to experience RAM issues over time: a slow memory leak, as CFCs are loaded into RAM too quickly for garbage collection to keep up. Your best bet is to make this a singleton (i.e., store it in the application scope).
  4. Be aware that var-scoping a variable doesn't (as far as I understand it) automatically free up the memory as soon as the variable stops being used. The memory is still reserved, though it's likely flagged somehow as being part of a short-lived generation so that it will (probably?) be cleaned up faster. But this doesn't guarantee anything.
  5. If you're looking at active threads, it's also possible that the query isn't going to be cleared until the end of the request, not necessarily the end of the function call. It seems impatient to expect a query to die immediately as soon as the function call completes.
  6. ColdFusion queries are passed by reference, not by value. It should be impossible to get 2 copies of the query in memory, unless you're somehow using duplicate() or a similar function to explicitly copy the query.

Your cfreturn statement is likely returning a pointer to the query, not a copy of it. That query will not be cleaned up until nothing references it any longer. So if the function passes the query to some other process, the query stays in memory for as long as that process holds on to it. If you set that query to a session variable, for instance, that pointer isn't going anywhere until that session variable is gone, no matter how frequently you try to force garbage collection.
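To make the pass-by-reference point concrete, here is a minimal sketch in plain Java (the platform ColdFusion runs on), not in CFML. The class and method names are hypothetical, and a list of strings stands in for a ColdFusion query; it only illustrates that returning an object hands back a reference, and that a second copy exists only when one is made explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a CFC that returns a query.
class QueryReferenceDemo {

    // Plays the role of the CFC function: builds a result and returns it.
    // Only a reference to the list is returned; no rows are copied.
    static List<String> loadRows() {
        List<String> rows = new ArrayList<>();
        rows.add("row 1");
        rows.add("row 2");
        return rows;
    }

    // Plays the role of an explicit copy such as duplicate().
    static List<String> copyOf(List<String> rows) {
        return new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        List<String> fromFunction = loadRows();
        List<String> alias = fromFunction;         // second reference, same object
        List<String> copy = copyOf(fromFunction);  // genuinely separate object

        alias.add("row 3");

        System.out.println(alias == fromFunction); // true: one object, two names
        System.out.println(fromFunction.size());   // 3: the change is visible through both names
        System.out.println(copy.size());           // 2: the copy is unaffected

        // Dropping one reference does not free the object;
        // it stays reachable through the other one.
        fromFunction = null;
        System.out.println(alias.size());          // 3
    }
}
```

The same reasoning applies to a query stored in a longer-lived scope: as long as one reference survives, the garbage collector cannot reclaim the object.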

Just a few things to consider.

Shawn Grigson
Fantastic answer. I was not looking forward to trying to explain all that, but looks like I don't have to. :-)
Ben Doom
Hey, thanks! :)
Shawn Grigson
Thanks. It's helpful to know that queries are passed by reference. I wasn't able to find that information in my online search. I'm instantiating the CFC once per thread. I'm executing multiple threads because memory doesn't seem to be fully cleaned up, even with gc(), until a request is finished. It appears that a thread is its own request, so I multi-threaded it for the cleanup. I've been setting the query variable to an empty string when I'm finished using it, and that seems to relieve some of the memory issues.
stomcavage
Um, threads don't help you use less memory, but they do ensure that you have more stuff loaded into memory at the same time. Also, I question the wisdom of even *wanting* the garbage collector to pause everything and clean up while the request is still running.
Joel Mueller
I block the threads so that only one executes at a time. So yes, they do ensure that the script uses less memory. And my testing bears that out. This way, I only have to add one page to the scheduled jobs list and that job kicks off all the individual threads that import data into the database. The wisdom of running gc() in the middle of a thread is that the thread is very long running and without it, the script's memory use tends to crash the server.
stomcavage
A: 

I had a similar problem with processing a large data insert, where each row requires extensive processing involving multiple CFCs. It appears that the JDBC ResultSet, Statement and Connection references created by <cfquery> are held until the end of the request. This means that nulling your query variable has no effect on memory usage. The way I got around this was to make a gateway call to a CFC function to process 100 rows; that function then makes another gateway call for the next 100 rows, and so on until all rows are processed. Because each individual gateway call actually exits, it releases all its handles and that memory gets recovered.
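The chunking itself can be sketched as follows. This is a plain Java illustration with hypothetical names (`BatchLoader`, `BatchHandler`), not the CFML gateway code described above, and it uses a simple driver loop rather than each batch triggering the next. It shows only how the rows are split into batches of 100; the memory benefit described above depends on each batch running as its own request that exits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits a row range into fixed-size batches.
class BatchLoader {

    // Called once per batch with the starting offset and the number of rows.
    interface BatchHandler {
        void handle(int offset, int count);
    }

    // Walks totalRows in steps of batchSize and returns the number of batches run.
    static int processInBatches(int totalRows, int batchSize, BatchHandler handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int offset = 0; offset < totalRows; offset += batchSize) {
            // The last batch may be smaller than batchSize.
            int count = Math.min(batchSize, totalRows - offset);
            handler.handle(offset, count);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        int batches = processInBatches(250, 100, (offset, count) -> {
            // Assumption: in the real script, this is where a separate
            // request would load and insert rows [offset, offset + count).
            log.add(offset + ":" + count);
        });
        System.out.println(batches + " " + log); // 3 [0:100, 100:100, 200:50]
    }
}
```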

Mark Porter
Very interesting. I might give that a try. Thanks.
stomcavage