views:

88

answers:

4

When log shows a lot of Garbage Collection hits, what code change shall we need?
Do we need to free some objects?
Will we speed up the code with object reusal?

EDIT

I run this code against a lot of names:

 public static String removeAccents(String s) {
        if (s == null)
            return null;
        StringBuilder sb = new StringBuilder();
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            int pos = UNICODE.indexOf(c);
            if (pos > -1) {
                sb.append(PLAIN_ASCII.charAt(pos));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

EDIT2

log of GC

05-17 14:05:07.629: DEBUG/dalvikvm(8823): GC freed 13344 objects / 523736 bytes in 73ms
05-17 14:05:08.269: DEBUG/dalvikvm(8823): GC freed 13341 objects / 524608 bytes in 72ms
05-17 14:05:08.889: DEBUG/dalvikvm(8823): GC freed 13302 objects / 525112 bytes in 72ms
05-17 14:05:09.519: DEBUG/dalvikvm(8823): GC freed 13151 objects / 524360 bytes in 72ms
05-17 14:05:10.089: DEBUG/dalvikvm(8823): GC freed 13377 objects / 524384 bytes in 71ms
05-17 14:05:10.779: DEBUG/dalvikvm(8823): GC freed 13137 objects / 523872 bytes in 72ms
05-17 14:05:11.389: DEBUG/dalvikvm(8823): GC freed 13289 objects / 524656 bytes in 72ms
05-17 14:05:12.049: DEBUG/dalvikvm(8823): GC freed 13113 objects / 524336 bytes in 71ms
05-17 14:05:12.299: DEBUG/dalvikvm(4864): GC freed 206 objects / 10216 bytes in 358ms
05-17 14:05:12.769: DEBUG/dalvikvm(8823): GC freed 13289 objects / 524272 bytes in 75ms
05-17 14:05:13.449: DEBUG/dalvikvm(8823): GC freed 13165 objects / 524192 bytes in 68ms
05-17 14:05:14.099: DEBUG/dalvikvm(8823): GC freed 13221 objects / 524016 bytes in 73ms
05-17 14:05:14.719: DEBUG/dalvikvm(8823): GC freed 13179 objects / 524768 bytes in 73ms
05-17 14:05:15.349: DEBUG/dalvikvm(8823): GC freed 13306 objects / 524328 bytes in 73ms
05-17 14:05:15.999: DEBUG/dalvikvm(8823): GC freed 13280 objects / 523536 bytes in 73ms
05-17 14:05:16.589: DEBUG/dalvikvm(8823): GC freed 13314 objects / 524928 bytes in 68ms
05-17 14:05:17.249: DEBUG/dalvikvm(8823): GC freed 13217 objects / 524792 bytes in 73ms
05-17 14:05:17.929: DEBUG/dalvikvm(8823): GC freed 13176 objects / 524104 bytes in 68ms
05-17 14:05:18.449: DEBUG/dalvikvm(9926): GC freed 10341 objects / 558184 bytes in 488ms
05-17 14:05:18.689: DEBUG/dalvikvm(8823): GC freed 13485 objects / 524664 bytes in 75ms
05-17 14:05:19.279: DEBUG/dalvikvm(8823): GC freed 13337 objects / 523816 bytes in 67ms
05-17 14:05:19.909: DEBUG/dalvikvm(8823): GC freed 13269 objects / 524784 bytes in 72ms
05-17 14:05:20.419: DEBUG/dalvikvm(8823): GC freed 13389 objects / 524416 bytes in 72ms
05-17 14:05:21.069: DEBUG/dalvikvm(8823): GC freed 12948 objects / 523712 bytes in 72ms
05-17 14:05:21.659: DEBUG/dalvikvm(8823): GC freed 13436 objects / 525040 bytes in 68ms

Do you consider this is too much?

+1  A: 

If the GC runs frequently, it is a good warning sign that a lot of temporary objects are being created. This significantly decreases the performance of the application as the garbage collector tries to clean up unreferenced objects in the heap. To counter it, you might want to profile the application and find out the improvement points. An example of creating a lot of temporary objects would be the following piece of code:

String str = "";    
for(int i=0;i<1000000;i++){
   str = str + String.valueOf(i);
}

To avoid creating a lot of objects, you could replace the above piece of code with StringBuffer/StringBuilder.

Snehal
@Snehai - You're right in general, but for this specific example the compiler does the StringBuilder optimisation for you anyway
Paolo
@Paolo: I'm not really sure if the compiler performs the optimization. This example is similar to the one provided by Joshua Bloch in Effective Java - Item 51. Bloch warns to avoid using this concatenation and advices to use StringBuffer/StringBuilder.
Snehal
+4  A: 

Frequent garbage collection can be caused by a number of things. For example:

  • Your application may be creating too many temporary object.

  • Memory leaks caused by your application keeping references to objects that are no longer required.

  • The heap may be too small.

The first two problems will be revealed if you run a memory profiler on your application, and the solution will usually be self evident.

The third problem can be seen by examining the GC logs, and noting that each GC run only succeeds in reclaiming a relatively small amount of memory. Ideally, you want the GC to reclaim 50% or more of the heap each time it runs. The fix is typically to increase the maximum heap size using the JVM's -Xmx command-line option.

Will we speed up the code with object reusal?

Generally speaking no. Recycling is painful, there is no guarantee that it will succeed. For example, you will have a hard time getting many classes in the Java standard library and 3rd-party libraries to recycle internal data structures.

You should only resort to explicitly recycling objects if all other attempts to fix the problem have failed. The simplest solution is often just to give the application a bigger heap.

EDIT

One way to reduce memory usage for the code in the edited question is to change:

   StringBuilder sb = new StringBuilder();

to

   StringBuilder sb = new StringBuilder(s.length());

It might help to reuse the StringBuilder as well, but if this code is giving you excessive GC rates then the problem is more likely to be that there is a memory leak (somewhere else in your application) or that your heap is simply too small. (People don't realize this, but there is a significant memory overhead for every Java String ... something like 48 bytes if my mental arithmetic is correct.)

EDIT 2

The GC logs say that you are reclaiming 50000Kb each time the GC runs, and they strongly suggest that the memory usage is not increasing. (The latter is good news; it pretty much rules out a memory leak.) I think you need to increase the heap size using the -Xmx and -Xms options. You want to be reclaiming a few megabytes in each GC cycle to reduce the average GC overhead per byte reclaimed.

The other thing that struck me is that maybe you can change your removeAccents method so that it only creates a new String if the result String will be different to the input String. In other words, if there are no accents it should just return the input String.

Stephen C
By reuse you mean to put out as class property?
Pentium10
@Pentium10 - sort of. It would be better if you made the method instance level. Then you could make the StringBuilder an attribute of the same object.
Stephen C
Ok. I've added a `if (!found) { return s; } else { return sb.toString(); }`
Pentium10
A: 

it could be the int pos create/destroy thats causing the GC.

Try declaring it outside the for loop and just resetting it to -1 as the first line inside the for loop.

 public static String removeAccents(String s) { 
    if (s == null) 
        return null; 
    StringBuilder sb = new StringBuilder(); 
    int n = s.length(); 
    int pos = -1;
    for (int i = 0; i < n; i++) { 
        pos = -1; //set it here just in case.
        char c = s.charAt(i); 
        pos = UNICODE.indexOf(c); 
        if (pos > -1) { 
            sb.append(PLAIN_ASCII.charAt(pos)); 
        } else { 
            sb.append(c); 
        } 
    } 
    return sb.toString(); 
} 

Same goes for char c, create it outside the for loop and reset it inside it each time you need it.

Mauro
-1 - Incorrect. Space for local variables is part of the method's stack frame. Only objects / arrays created using `new` occupy heap space.
Stephen C
cool, didnt know that, thought they would still be GC'd.
Mauro
+1  A: 

Your code generates many heap-allocated short-lived objects. This is a GC dream: the GC is optimized to deal with that exact situation. It is no wonder that you see many GC invocations, but this is normal, expected, and does not make your code slow. You can see in your log file that each GC run takes about 70 ms and occurs twice per second; that's 14% of your time, at most. In other words, even if by miracles of object recycling you remove all dynamic allocations, you will get a speedup by no more than about 16%.

If there is anything which makes your code slow, this is this line:

int pos = UNICODE.indexOf(c);

and it has nothing to do with the GC. This line makes a linear search in the UNICODE string (I assume this is a String instance), and it is likely to be computationally expensive (I suppose that the said string is somewhat big).

I suggest you try to replace that line with:

int pos = (c <= 126) ? -1 : UNICODE.indexOf(c);

which should avoid scanning the whole string for each ASCII character (I assume that most of the input characters have no accent to remove).

For a more complete treatment of accent removal, use a java.text.Normalizer (with the NFKD form) then, for each resulting code point, get its category (with Character.getType()) and throw away all code points which have a category of COMBINING_SPACING_MARK, ENCLOSING_MARK and NON_SPACING_MARK. This would process the whole of Unicode in all its glory, but would probably be more expensive.

Thomas Pornin
Thanks for the idea, I can't use the Normalizer as Android is based on Java 1.5.
Pentium10