I am attempting to solve performance issues with a large and complex Tomcat Java web application. The biggest issue at the moment is that, from time to time, the memory usage spikes and the application becomes unresponsive. I've fixed everything I can fix with log profilers and Bayesian analysis of the log files. I'm considering running a profiler on the production Tomcat server.

A Note to the Reader with Gentle Sensitivities:

I understand that some may find the very notion of profiling a production app offensive. Please be assured that I have exhausted most of the other options. The reason I am considering this is that I do not have the resources to completely duplicate our production setup on my test server, and I have been unable to reproduce the same failures there.

Questions:

I am looking for answers that work either for a Java web application running on Tomcat, or that address this question in a language-agnostic way.

  • What are the performance costs of profiling?
  • Any other reasons why it is a bad idea to remotely connect and profile a web application in production (strange failure modes, security issues, etc)?
  • How much does profiling affect the memory footprint?
  • Specifically, are there Java profiling tools that have very low performance costs?
  • Any Java profiling tools designed for profiling web applications?
  • Does anyone have benchmarks on the performance costs of profiling with VisualVM?
  • What size applications and datasets can VisualVM scale to?
+2  A: 

I've used YourKit to profile apps in a high-load production environment, and while there was certainly an impact, it was easily an acceptable one. YourKit makes a big deal of being able to do this in a non-invasive manner, such as by selectively turning off certain profiling features that are more expensive (it's a sliding scale, really).

My favourite aspect of it is that you can run the VM with the YourKit agent loaded and it has zero performance impact; it's only when you connect the GUI and start profiling that it has an effect.
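
If you go this route, it's worth verifying that the agent really was loaded at JVM startup before you try to connect the GUI. Here is a minimal sketch using the standard RuntimeMXBean API; the libyjpagent.so path in the comment is only an assumed example location (on Tomcat you would typically add the -agentpath flag to CATALINA_OPTS), so check the YourKit documentation for your platform:

    import java.lang.management.ManagementFactory;

    // Prints any agent flags the JVM was started with, e.g. (example only):
    //   -agentpath:/opt/yourkit/bin/linux-x86-64/libyjpagent.so
    public class AgentCheck {
        public static void main(String[] args) {
            ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
                    .filter(arg -> arg.startsWith("-agentpath") || arg.startsWith("-agentlib"))
                    .forEach(System.out::println);
        }
    }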

skaffman
+5  A: 

OProfile and its ancestor DCPI were developed for profiling production systems. The overhead for these is very low, and they profile your full system, including the kernel, so you can find performance problems in the VM, in the kernel, and in libraries.

To answer your questions:

  1. Overhead: These are sampling profilers: they generate timer or performance-counter interrupts at some regular interval and take a look at what code is currently executing. They use that to build a histogram of where you spend your time, and the overhead is very low (1-8% is what they claim) for reasonable sampling intervals. (A minimal sketch of the sampling idea follows this list.)

    Take a look at this graph of sampling frequency vs. overhead for OProfile. You can tune the sampling frequency for lower overhead if the defaults are not to your liking.

  2. Usage in production: The only caveat to using OProfile is that you'll need to install it on your production machine. I believe there's kernel support in Red Hat since RHEL3, and I'm pretty sure other distributions support it.

  3. Memory: I'm not sure what the exact memory footprint of OProfile is, but I believe it keeps relatively small buffers around and dumps them to log files occasionally.

  4. Java: OProfile includes profiling agents that support Java and that are aware of code running in JITs. So you'll be able to see Java calls, not just the C calls in the interpreter and JIT.

  5. Web Apps: OProfile is a system-level profiler, so it's not aware of things like sessions, transactions, etc. that a web app would have.

    That said, it is a full-system profiler, so if your performance problem is caused by bad interactions between the OS and the JIT, or if it's in some third-party library, you'll be able to see that, because OProfile profiles the kernel and libraries. This is an advantage for production systems, as you can catch problems that are due to misconfigurations or particulars of the production environment that might not exist in your test environment.

  6. VisualVM: Not sure about this one, as I have no experience with VisualVM.
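
To make point 1 concrete, here is a minimal sketch of the sampling idea in plain Java: wake up at a fixed interval, record the top stack frame of every live thread, and accumulate a histogram. This is only an illustration of the technique; it is not how OProfile is implemented (OProfile samples via kernel-level timer and performance-counter interrupts), and all names in it are invented for the sketch:

    import java.util.HashMap;
    import java.util.Map;

    // Toy sampling profiler: the longer the interval, the lower the overhead,
    // at the cost of a coarser histogram -- the same trade-off OProfile exposes.
    public class SamplingSketch {
        public static void main(String[] args) throws InterruptedException {
            Map<String, Integer> histogram = new HashMap<>();
            long intervalMillis = 10; // sampling interval; raise it to cut overhead

            Thread sampler = new Thread(() -> {
                while (true) {
                    for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                        if (stack.length > 0) {
                            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                            synchronized (histogram) {
                                histogram.merge(top, 1, Integer::sum);
                            }
                        }
                    }
                    try {
                        Thread.sleep(intervalMillis);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            sampler.setDaemon(true);
            sampler.start();

            Thread.sleep(2000); // stand-in for the running application
            synchronized (histogram) {
                histogram.forEach((frame, hits) -> System.out.println(hits + "  " + frame));
            }
        }
    }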

Here's a tutorial on using OProfile to find performance bottlenecks.

tgamblin
A: 

JXInsight is the clear leader in the field in this regard. We can easily drop the cost, at runtime, down to 1 nanosecond per instrumented method. In benchmarks we are a thousand times faster than the nearest competitor. We are also the most extensible, with unmatched runtime execution insight.

JXInsight 5.7.27 Released – 1 Billion Operations Per Second http://blog.jinspired.com/?p=655

Those claiming to have very low overhead (2-5%) generally factor in that the bottleneck in your application is a database, and hence any huge overhead incurred will be mitigated by the overall poor performance of your own code base and remote resources.
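
For perspective on what per-probe cost means at high call rates, here is a crude, self-contained microbenchmark comparing an uninstrumented method against a naive System.nanoTime()-based probe. It is purely illustrative: the names are invented, it says nothing about how JXInsight's probes are actually implemented, and a serious measurement would use a harness such as JMH to account for JIT warm-up:

    import java.util.concurrent.atomic.LongAdder;

    // Crude comparison of a plain call vs. a call wrapped in a timing probe.
    public class ProbeOverheadDemo {
        static final LongAdder totalNanos = new LongAdder();

        static int work(int x) { return x * 31 + 7; } // the "application" method

        static int instrumentedWork(int x) {
            long start = System.nanoTime();
            try {
                return work(x);
            } finally {
                totalNanos.add(System.nanoTime() - start);
            }
        }

        public static void main(String[] args) {
            int iterations = 10_000_000;
            int sink = 0; // accumulated to keep the JIT from eliminating the calls

            long t0 = System.nanoTime();
            for (int i = 0; i < iterations; i++) sink += work(i);
            long plainPerCall = (System.nanoTime() - t0) / iterations;

            long t1 = System.nanoTime();
            for (int i = 0; i < iterations; i++) sink += instrumentedWork(i);
            long probedPerCall = (System.nanoTime() - t1) / iterations;

            System.out.println("plain: " + plainPerCall + " ns/call, probed: "
                    + probedPerCall + " ns/call (sink=" + sink + ")");
        }
    }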

William Louth
By the way, we have published some benchmark results based on previous releases of our probes technology: http://opencore.jinspired.com/?page_id=123
William Louth
1 ns is pretty low; isn't an add instruction 20 ns on most modern CPUs? I'd be really interested in learning how you do this.
e5