views:

453

answers:

5

We experience lags of several minutes in our server. They are probably triggered by "stop the world" garbage collections. But we use the concurrent mark-and-sweep collector (-XX:+UseConcMarkSweepGC), so I think these pauses are caused by memory fragmentation of the old generation.

How can memory fragmentation of the old generation be analyzed? Are there any tools for it?

Lags happen every hour. Most of the time they last about 20 seconds, but sometimes several minutes.
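Since the question asks for tools: one standard JDK command-line tool for watching generation occupancy and GC counts/times is jstat. It is my suggestion, not something named later in the thread; replace the pid placeholder with the JVM's process id:

```shell
# jstat ships with the JDK; -gcutil prints per-generation
# utilization percentages plus GC counts and accumulated times,
# sampled here every 1000 ms for the given JVM pid.
jstat -gcutil <pid> 1000
```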

A: 

I have used YourKit to good effect for this type of problem.

Eric J.
Yes, a great tool. But it doesn't show memory fragmentation, only memory consumption, which doesn't help with the lags. Or am I missing some cool option? :)
Vitaly
YourKit and other memory profilers will show you when and how frequently GC runs as it attempts to reorganize memory and reduce fragmentation. It won't show you your fragmentation directly (unless I don't know about a cool option too).
Eric J.
I'm going to check the behavior of the GC through VisualVM. By the way, YourKit is dangerous to run on production servers. We had performance problems even with the "disable all" option. And thanks for the advice about big objects.
Vitaly
What kind of performance problems did you see? So far we have only used it in our stress-test environment, but we had considered running it on one application server for a while to gather real-life metrics.
Eric J.
Lags :) If bytecode instrumentation is turned on (the agent's default setting), the lags are huge. If it's off, there are still lags.
Vitaly
+2  A: 

Look at your Java documentation for the "java -X..." options for turning on GC logging. That will tell you whether you are collecting old or new generation, and how long the collections are taking.
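A sketch of the kind of flags this answer refers to, assuming a pre-Java 9 HotSpot JVM (which matches the CMS setup in the question); gc.log and server.jar are placeholder names:

```shell
# Turn on detailed, timestamped GC logging to a file.
# Each log line shows which generation was collected and the pause time.
java -verbose:gc \
     -XX:+PrintGCDetails \
     -XX:+PrintGCTimeStamps \
     -Xloggc:gc.log \
     -jar server.jar
```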

A pause of "several minutes" sounds extraordinary. Are you sure that you aren't just running with a heap size that is too small, or on a machine with not enough physical memory?

  • If your heap is too close to full, the GC will be triggered again and again, so your server spends most of its CPU time in the GC. This will show up in the GC logs.

  • If you use a large heap on a machine with not enough physical memory, a full GC is liable to cause your machine to "thrash", spending most of its time madly moving virtual memory pages to and from disc. You can observe this using system monitoring tools; e.g. by watching the console output from "vmstat 5" on a typical UNIX/Linux system.
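To make the vmstat suggestion from the second bullet concrete (the si/so interpretation is standard vmstat usage, not something stated in the thread):

```shell
# Sample system activity every 5 seconds.
# Sustained non-zero values in the "si"/"so" (swap-in/swap-out)
# columns while the JVM pauses suggest the machine is thrashing.
vmstat 5
```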

Stephen C
Lags of several minutes happen too, once or twice a day. I've edited the question.
Vitaly
I'll try vmstat 5, thanks.
Vitaly
Yes, I'll try verbose GC output. It just prints too much info and can slow down the servers, so I wouldn't want to run it in production :) Right now we use GarbageCollectorMXBeans. The output looks like this: ConcurrentMarkSweep 27459. The lag matches it almost perfectly (27 sec). It happens every hour or so; that's why I suspect memory fragmentation rather than a memory leak.
Vitaly
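A minimal sketch of the GarbageCollectorMXBean polling Vitaly describes; the class name GcWatch is mine, and the output format ("ConcurrentMarkSweep 27459") is an assumption based on his comment:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Print each garbage collector's name with its accumulated
// collection time and count, similar to the output quoted above.
public class GcWatch {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated
            // collection time in milliseconds (-1 if unsupported).
            System.out.println(gc.getName() + " " + gc.getCollectionTime()
                    + " ms over " + gc.getCollectionCount() + " collections");
        }
    }
}
```

Polling these beans periodically and diffing the counters avoids the overhead of verbose GC logging on a production server.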
A: 

There is no memory fragmentation in Java; during a GC run, memory areas are compacted.

Since you don't see high CPU utilization, no GC is running either. So something else must be the cause of your problems. Here are a few ideas:

  • If the database of your application is on a different server, there may be network problems

  • If you run Windows and you have mapped network drives, one of the drives may lock up your computer (again network problems). The same is true for NFS drives on Unix. Check the system log for network errors.

  • Is the computer swapping lots of data to disk? Since CPU utilization is low, the cause of the problem could be that the app was swapped out to disk and the GC run forced it back into RAM. This will take a long time if your server doesn't have enough real RAM to keep the whole Java app resident.

Also, other processes can force the app out of RAM. Check the real memory utilization and your swap space usage.

To understand the output of the GC log, this post might help.

[EDIT] I still can't get my head around "low CPU" and "GC stalls". Those two usually contradict each other. If the GC is stalling, you must see 100% CPU usage. If the CPU is idle, then something else is blocking the GC. Do you have objects that override finalize()? If a finalizer blocks, the GC can take forever.

Aaron Digulla
Well, there IS fragmentation, but the GC will attempt to reduce it when it runs. Having too many large-ish objects (relative to your available heap) that are frequently allocated and deallocated will cause the app to spend a lot of time in GC and hurt performance.
Eric J.
There is memory fragmentation if ConcurrentMarkSweep is used. For example, http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/rprf_javamemory.html.
Vitaly
No database is used.
Vitaly
Eric J., large objects sound like a great lead, thank you. I'll check the memory dumps.
Vitaly
No, it is not a network problem. The lags reported by the GC match the lags in our internal log.
Vitaly
@Vitaly: Please give some more information about how your app works. Is it creating huge HashMaps or something (i.e. large data graphs with lots of references to other objects) all the time?
Aaron Digulla
@Vitaly: See my edits.
Aaron Digulla
Aaron, about CPU: by "not a high CPU utilization" I meant that all the threads are blocked, so it isn't our code that is causing the lags. CPU usage IS high during the lags :))
Vitaly
No, there are no finalize() methods in the code.
Vitaly
@Aaron About hash maps: we have some HashMaps with 20-30K elements (not sure if that counts as big). We also create fairly big byte[] arrays quite often; they can be 40K-80K in size.
Vitaly
byte[] doesn't matter much for GC; for a GC, only object references matter. Do you add/remove elements from the HashMaps all the time? As for "all threads are blocked", that should be "all threads but the GC thread". And the GC thread eats as much CPU as possible, so while it runs you must see 100% CPU usage, or something is odd.
Aaron Digulla
A: 

Vitaly, there is a fragmentation problem. My observation: if there are many small objects that are updated frequently, they generate a lot of garbage. Although CMS collects the memory occupied by these objects, that memory is fragmented. The Mark-Sweep-Compact thread then comes into the picture (stop the world) and tries to compact this fragmented memory, causing a long pause.

Conversely, if the objects are bigger, the memory is less fragmented and
Mark-Sweep-Compact takes less time to compact it. This may reduce throughput, but it will help you avoid the long pauses caused by GC compaction.

Kishor
We've already handled the problem. Sometimes there wasn't enough free memory in the old generation to copy the objects that survived from the young generation. Starting CMS once a fixed amount of memory is consumed fixed the problem.
Vitaly
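Vitaly doesn't name the exact flags he used; a hedged sketch of one common way to start CMS at a fixed old-generation occupancy (the threshold of 70 and server.jar are illustrative, not from the thread):

```shell
# Start a CMS cycle once old-gen occupancy reaches 70%, and use
# only that threshold (rather than the JVM's adaptive heuristic),
# so collection begins before the old generation runs out of room
# for objects promoted from the young generation.
java -XX:+UseConcMarkSweepGC \
     -XX:CMSInitiatingOccupancyFraction=70 \
     -XX:+UseCMSInitiatingOccupancyOnly \
     -jar server.jar
```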
Vitaly, can you please briefly point out how you solved the fragmentation problem? How exactly did you trigger CMS after a fixed amount of memory was consumed? And how did this solve the fragmentation problem?
Kishor
A: 

To see how Vitaly probably handled this, see Understanding Concurrent Mark Sweep Garbage Collector Logs.