First, just a bit of background:

One of our customers is experiencing CPU usage spikes for WebSphere instances running one of our web apps (other instances with other apps are fine). They have a test environment and a live environment (both iSeries) which both experience the problem - with a single app per instance setup. We have deployed this application locally in our own test environments and also for many other customers all on iSeries with no similar problems.

What's actually happening:

Roughly once a second, the CPU usage of the WebSphere process jumps to anywhere from 7% to 20%, even though no requests are being processed at the time. The customer has reported seeing spikes as high as 30%. These spikes average out to about 1.5% of CPU overall; the other WebSphere instances typically use 0%-0.1% when idle.

My investigations so far

So, I had a look at the threads. One thread in their test environment was using ~350 CPU cycles per second. A similar thread in their live environment was using ~1500 CPU cycles per second (showing that it has a bigger CPU). The call stack for these threads looks like this:

Type  Program                  Statement         Procedure                    
      QLESPI     QSYS          17                LE_Create_Thread2__FP12crtt >
      QJVALIBJVM QSYS          7                 startThread__FPv             
 J    com/ibm/ws/util/Threa >                    run                          
 J    com/ibm/ws/util/Threa >                    run                          
 J    com/ibm/ws/util/Threa >                    getTask                      
 J    com/ibm/ws/util/Bound >                    poll

The entire class name from the bottom line is com/ibm/ws/util/BoundedBuffer. I asked the customer to do a JVM Dump for me - the only additional information I got from this was the thread name:

Thread:  00002F82 Deferrable Alarm : 11

Now for my questions:

  • Can any of you identify the problem, given these symptoms? (Maybe that's a long shot!)
  • What is Deferrable Alarm? From the JVM Dump, I can see 4 threads with this name. The other three seem to be doing just fine. By debugging my local WebSphere (on Windows) and adding breakpoints in the BoundedBuffer class, I see that BoundedBuffers are polling and periodically invoking some listener.
  • I don't have access to the WebSphere console for the customer machines, and they aren't owning up to having made any config changes. I can ask them to check the console for me though - what should I be asking them to look at?
  • I have telnet access to the customer boxes, is there anything else I can investigate here? Looking at the WebSphere profile files, etc? Which files should I be looking at?
  • Because the Call Stack and JVM Dump don't explicitly reference our code, is it safe to assume that this is a configuration problem?

It's been a long question, so thanks for reading this far.

30 April Update (1)

This morning I've noticed that this behaviour only happens after the first request of the day has been processed (irrespective of which Web Service is invoked). This points the finger back at our application or Apache Axis. Could it be that this is just normal behaviour?!

30 April Update (2)

So it seems that this CPU activity is some kind of housekeeping activity for the web-container or maybe something within Apache Axis. I've now observed this happening on a few different web-applications on a few different servers. Applications with no web component don't suffer the same additional CPU overhead.

I'd imagine that if it is housekeeping work, "tuning" it somehow could be counterproductive. By that, I mean that making the App Server idle better would probably reduce the amount of "real" work it can do.
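To illustrate the housekeeping theory, here is a minimal sketch of how an idle worker thread can still burn a little CPU. It assumes the pool's threads wait on their work queue with a timed poll, as the `poll` frame in the call stack suggests. `ArrayBlockingQueue` is only a stand-in for `com.ibm.ws.util.BoundedBuffer`, whose internals are not shown in this thread, and the class name, method name and timings are hypothetical. It is written for a current JDK, not the Java level shipped with WebSphere 6.1.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; ArrayBlockingQueue stands in for WebSphere's BoundedBuffer.
class IdlePollSketch {

    // Polls the queue for about totalMillis, waking up every pollMillis.
    // Returns how many times the worker woke up and found nothing to do.
    static int countIdleWakeups(BlockingQueue<Runnable> queue, long pollMillis, long totalMillis) {
        int idleWakeups = 0;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalMillis);
        while (deadline - System.nanoTime() > 0) {
            try {
                Runnable task = queue.poll(pollMillis, TimeUnit.MILLISECONDS);
                if (task == null) {
                    idleWakeups++;   // timed out: no request arrived, but the thread still ran
                } else {
                    task.run();      // real work arrived
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                break;
            }
        }
        return idleWakeups;
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(16);
        System.out.println("Idle wake-ups in ~500 ms: " + countIdleWakeups(queue, 50, 500));
    }
}
```

Each wake-up is cheap, but it is not free, which would be consistent with small periodic spikes on an otherwise idle instance. Whether WebSphere's alarm threads actually behave this way is not confirmed here.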

A: 

Very instinctively (being unfamiliar with iSeries platforms) I would look at disk IO related issues. Can you describe the disk subsystem? Can you see if your app is spending an unusually large amount of time in iowait?

Stu Thompson
Thanks for the suggestion. I'd looked at I/O, but when the server is idle, there are still CPU spikes, but no I/O calls.
Harry Lime
A: 

Hi

I know this doesn't quite match your problem, but it might be worth a look if you're running a version prior to WAS 6.1 patch 17.

http://www-01.ibm.com/support/docview.wss?uid=swg24018437

Hope this helps. Cheers John

Interesting read.. thanks
Harry Lime
A: 

My best guess is that some type of monitoring is being done on the instance, like Tivoli etc. Have you ruled out any GC activity?

HTH Tom

Tom
Thanks for the comment Tom. Tivoli was one of the things we first looked at. Not the culprit though. I've managed to satisfy the customer with the "housekeeping" theory, so hopefully that's the end of it now :P
Harry Lime
A: 

You could try to profile the application and take heap dumps; that could answer a few questions related to memory and CPU usage.

pengtuck
A: 

Most application servers are implemented in Java itself, and so is WebSphere. Apart from serving client requests, these servers have to do other periodic jobs such as resource pool management. Performing these jobs creates temporary objects that need to be garbage collected.

How often the garbage collector is invoked depends on how much heap you have allocated, how it is used, and the garbage collector settings. I'd suggest checking whether it is the garbage collector thread that is taking up your CPU. To do this, connect the jconsole utility to the remote WebSphere process for a day and see if there is any correlation between heap usage and CPU usage.
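If attaching jconsole is not practical, roughly the same numbers can be sampled from code through the standard `java.lang.management` MXBeans. The following is a minimal sketch, assuming it runs inside the JVM being observed and that the JVM exposes these beans; the class name, method names and the one-second interval are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: samples heap usage and accumulated GC time once per second.
class GcSampleSketch {

    // Total milliseconds spent in garbage collection since JVM start, over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Bytes of heap currently in use.
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        long previousGc = totalGcMillis();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long currentGc = totalGcMillis();
            System.out.printf("heap used: %d KB, GC time in last interval: %d ms%n",
                    usedHeapBytes() / 1024, currentGc - previousGc);
            previousGc = currentGc;
        }
    }
}
```

If the GC time per interval stays at zero while the CPU spikes continue, garbage collection is probably not the cause.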

Gladwin Burboz
+1  A: 

I would recommend following the must gather documentation provided by IBM, and raising a PMR along with your own investigation. Things you might suspect:

  • Garbage collection (unlikely on low application utilization)
  • Timers or tasks (such as java.util.Timer or commonj work manager)
  • Pretest connection that has a complex SQL query (in the DataSource's WebSphere Application Server data source properties)

I would also recommend using a profiler to determine the cause; YourKit is a pretty decent one.

Zoran Regvart
A: 

I am also experiencing this very same issue, with [Deferrable Alarm:x] threads and BoundedBuffer. The only difference is that this is on a Windows 7 64-bit machine. There is absolutely no Tivoli or other batch process running and no requests are being made; the single instance is just idle.

I can run the application in DEBUG mode and pause the Deferrable Alarm thread and the CPU spikes stop, resume and they start again.

I've checked disk activity and network activity, and there is nothing happening there.

I am running WebSphere 6.1.0.27.
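As an alternative to pausing threads in a debugger, per-thread CPU time can be read through the standard `ThreadMXBean`. This is a minimal sketch, assuming it runs inside the server JVM (run standalone, it will not find any thread named "Deferrable Alarm") and that the JVM supports thread CPU time measurement. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports CPU time for threads whose name contains a given fragment.
class ThreadCpuSketch {

    // Maps "thread name #id" to accumulated CPU time in nanoseconds.
    static Map<String, Long> cpuNanosByThread(String nameFragment) {
        Map<String, Long> result = new LinkedHashMap<>();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        for (long id : threads.getAllThreadIds()) {
            ThreadInfo info = threads.getThreadInfo(id);
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue; // thread already ended, or its name does not match
            }
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if disabled or the thread died
            if (cpuNanos >= 0) {
                result.put(info.getThreadName() + " #" + id, cpuNanos);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        String fragment = args.length > 0 ? args[0] : "Deferrable Alarm";
        Map<String, Long> before = cpuNanosByThread(fragment);
        Thread.sleep(1000);
        Map<String, Long> after = cpuNanosByThread(fragment);
        for (Map.Entry<String, Long> entry : after.entrySet()) {
            long start = before.getOrDefault(entry.getKey(), 0L);
            System.out.printf("%s used %d us of CPU in the last second%n",
                    entry.getKey(), (entry.getValue() - start) / 1000);
        }
    }
}
```

Comparing the deltas across the Deferrable Alarm threads would show whether one of them accounts for the spikes.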

Si-R