For the last few years we've been randomly seeing this message in the output logs when running scheduled tasks in ColdFusion:

Recursion too deep; the stack overflowed.

The code inside the task that is being called can vary, but in this case it's VERY simple code that does nothing but reset a counter in the database and then send me an email to tell me it was successful. But I've seen it happen with all kinds of code, so I'm pretty sure it's not the code that's causing this problem.

It even has an empty application.cfm/cfc to block any other code being called.

The only other time we see this is when we are restarting CF and we are attempting to view a page before the service has fully started.

The error rarely happens, but now we have some rather critical scheduled tasks that cause issues if they don't run. (Hence I'm posting here for help.)

Memory usage is fine. The task that ran just before it reported over 80% free memory. Monitoring memory through the night doesn't show any out-of-the-ordinary spikes. The machine has 4 gigs of memory and nothing else running on it but the OS and CF. We recently tried to reinstall CF to resolve the problem, but it did not help. It happens on several of our other servers as well.

This is an internal server, so usage at 3am should be nonexistent. There are no other scheduled tasks being run at that time.

We've been seeing this on our CF7, CF8, and CF9 boxes (fully patched).

The current box in question info:

  • CF version: 9,0,1,274733
  • Edition: Enterprise
  • OS: Windows 2003 Server
  • Java Version: 1.6.0_17
  • Min JVM Heap: 1024
  • Max JVM Heap: 1024
  • Min Perm Size: 64m
  • Max Perm Size: 384m
  • Server memory: 4gb
  • Quad core machine that rarely sees more than 5% CPU usage

JVM settings:

-server -Dsun.io.useCanonCaches=false -XX:PermSize=64m -XX:MaxPermSize=384m -XX:+UseParallelGC -XX:+AggressiveHeap -Dcoldfusion.rootDir={application.home}/../ -Dcoldfusion.libPath={application.home}/../lib -Doracle.jdbc.V8Compatible=true
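One way to confirm that the running JVM actually picked up these settings is to ask it directly. The following is a minimal Java sketch, not something from the original setup: the class name JvmSettingsDump is hypothetical, and it would need to run inside the same JVM as ColdFusion (for example, compiled onto the classpath and invoked from a test page) for the output to reflect the server rather than a fresh command-line JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the flags and heap ceiling of the JVM it runs in.
class JvmSettingsDump {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Arguments passed to the JVM at startup (e.g. -XX:MaxPermSize=384m).
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("Max heap (MB): " + maxHeapMb());
        for (String arg : jvmArgs()) {
            System.out.println("JVM arg: " + arg);
        }
    }
}
```

Comparing the reported maximum heap with the 1024 configured in the administrator would show whether another flag is changing the effective value.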

Here is the incredibly complex code that failed to run last night, but has been running for years, and will most likely run tomorrow:

<cfquery datasource="common_app">
    update  import_counters
    set current_count = 0
</cfquery>

<cfmail subject="Counters reset" to="[email protected]" from="[email protected]"></cfmail>

If I missed anything let me know. Thank you!

A: 

We had this issue for a while after our server was upgraded to ColdFusion 9. The fix seems to be in this technote from Adobe on JRun 4: http://kb2.adobe.com/cps/950/950218dc.html

You probably need to make some adjustments to permissions as noted in the technote.

Daniel Sellers
@Daniel: Thanks for the comment. That's what we initially thought too, but that would mean that it should never run. Since it runs 99% of the time, it wouldn't be a permissions issue.
BigWorld
@Daniel: Just checked the account that CF is running under and it is set up as recommended in that article. Thanks for your suggestion though!
BigWorld
I'm not sure if this will be the solution for everyone experiencing this issue, but this is what we did to solve it. We had all the permissions set as listed in the article Daniel mentioned above, but once we REMOVED all of the permissions and SET them back up, we have never seen the problem again. We tested it on another server that we found had the problem and it resolved it there as well (to date). So, even though your settings APPEAR to be correct, try wiping them out and resetting them. I have NO idea why it was allowed to run sometimes and not others. That is still a mystery.
BigWorld
A: 

What you could try is to set Minimum JVM Heap Size to the same value as your Maximum JVM Heap Size (MB) within your CF administrator.

Also update the JVM to the latest update (1.6.0_21), or at least 1.6.0_20.

In the past I've always upgraded the JVM whenever something wacky started happening, as that usually solved the problem.

rip747
I think he already did the min/max as per his post: Min JVM Heap: 1024, Max JVM Heap: 1024. Updating the JVM is a good idea though; it rarely hurts for minor revisions.
jfrobishow
@rip747: I'll update to the latest and report back. It might take a while because I can't force the issue to happen. It just waits until the worst possible time to happen, so it has its own schedule. ;) Thanks
BigWorld
A: 

Have you tried reducing the size of your heap from 1024 to, say, 800-something? You say there is over 80% of memory left available, so if possible I would look at reducing the max.

Is it a 32- or 64-bit OS? When assigning the heap space you have to take into consideration all the overhead of the JVM (stack, libraries, etc.) so that you don't go over the OS limit for the process.
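The overhead point can be made concrete with some rough arithmetic. The sketch below is only an illustration under stated assumptions: the 2048 MB limit is the default user address space of a 32-bit Windows process, the thread count of 100 is a made-up figure, and 320 KB per thread is the default stack size quoted elsewhere in this thread. The class and method names are hypothetical.

```java
// Hypothetical back-of-the-envelope budget for a 32-bit JVM process.
class AddressSpaceBudget {

    // Default user-mode address space for a 32-bit Windows process, in MB.
    static final long PROCESS_LIMIT_MB = 2048;

    // Heap + permanent generation + thread stacks, in whole megabytes.
    static long estimateMb(long heapMb, long permMb, int threads, long stackKb) {
        long stacksMb = (threads * stackKb) / 1024; // integer division, rounds down
        return heapMb + permMb + stacksMb;
    }

    public static void main(String[] args) {
        // 1024 MB heap and 384 MB perm come from the question;
        // 100 threads is an assumed figure for illustration only.
        long reserved = estimateMb(1024, 384, 100, 320);
        System.out.println("Reserved (MB): " + reserved);
        System.out.println("Left for JVM code, native libraries, etc. (MB): "
                + (PROCESS_LIMIT_MB - reserved));
    }
}
```

Whatever remains has to cover the JVM itself, loaded DLLs, and any native allocations, which is why lowering the maximum heap can help on a 32-bit process.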

jfrobishow
@jfrobishow: The settings used to be 256/768. We had moved both up to 1024 on the recommendation of someone who thought it would fix this, but it was happening on the old setting as well. It is a 32-bit OS running as a VM inside a 64-bit host, so as far as it can tell, it's 32-bit. The machine has 4 gigs of memory (not all of it is usable in a 32-bit environment, of course), which is why we limited it to 1 GB and 384 MB; that would leave plenty for the OS itself. Correct, at the time of running it had about 80% free memory, but it's not like that all day, just during the night.
BigWorld
If @rip747's suggestion of updating to the latest JVM doesn't work, you could always try playing with the thread stack size, since requests live on the stack: -Xss{size}{unit}, like -Xss64k. The default is 320k on 32-bit Windows, and 64k is the smallest possible, so maybe adjust it a bit, as the error suggests a stack overflow (a larger stack is the usual remedy for an overflow). It's far-fetched, but since you've had the issue for years it might be worth a shot.
jfrobishow
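To get a feel for how the thread stack size relates to recursion depth, a small probe like the one below can be run with different sizes. This is a sketch, not a diagnosis of the ColdFusion error: the class name StackDepthProbe is hypothetical, the stack size passed to the Thread constructor is only a hint that some platforms adjust or ignore, and the depths reported will vary by JVM and machine.

```java
// Hypothetical probe: counts how many frames fit before the stack overflows.
class StackDepthProbe {

    // Recurse until the thread's stack is exhausted, counting frames as we go.
    private static void recurse(int[] counter) {
        counter[0]++;
        recurse(counter);
    }

    // Runs the recursion on a new thread created with the requested stack size.
    static int maxDepth(long stackBytes) {
        final int[] counter = new int[1];
        Thread probe = new Thread(null, new Runnable() {
            public void run() {
                try {
                    recurse(counter);
                } catch (StackOverflowError expected) {
                    // counter[0] now holds the depth reached before overflow.
                }
            }
        }, "stack-probe", stackBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("Depth with 256k stack: " + maxDepth(256L * 1024));
        System.out.println("Depth with 2m stack:   " + maxDepth(2L * 1024 * 1024));
    }
}
```

If the larger stack reaches a noticeably greater depth, the hint is being honored on that platform, and raising -Xss for the server would give deep call chains more headroom at the cost of more address space per thread.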