In a program I am writing I am doing a lot of string manipulation. I am trying to increase performance and am wondering if using char arrays would show a decent performance increase. Any suggestions?

+7  A: 

What kind of manipulation are you doing? Can you post a code sample?

You may want to take a look at StringBuilder (it implements CharSequence) to improve performance. I'm not sure you want to roll your own. StringBuilder isn't thread-safe, by the way; if you need thread safety, look at StringBuffer.
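A minimal sketch of what that might look like, assuming the work is something like building one string inside a loop (the class and method names here are made up for illustration, not taken from the question):

```java
class JoinDemo {
    // Builds "0,1,2,...,n-1" using one growing buffer instead of
    // creating a new String object on every iteration.
    static String joinNumbers(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinNumbers(5)); // prints 0,1,2,3,4
    }
}
```

Whether this actually helps depends on what the real code does, so measure before and after.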

Jon
If you need thread-safety, there's a non-trivial chance you'll have to do more than just drop in a `StringBuffer`. You might avoid deadlocks and race conditions, but the results probably won't match what you were expecting.
Hank Gay
Thank you I will re-implement and then post my results.
ThePinkPoo
@Hank: With a non-trivial update, you wrap your own `synchronized(thebuffer){...}` round it, but you don't need that sort of thing too often. Indeed, that's why `StringBuilder` was introduced; to get rid of the cost of holding locks at all when it's not needed (i.e., almost all the time).
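A rough sketch of that kind of wrapping, assuming Java 8 or later for the lambdas (`SharedLog`, `addEntry` and `runTwoWriters` are hypothetical names used only for this example):

```java
class SharedLog {
    private final StringBuffer buffer = new StringBuffer();

    // Each append() on a StringBuffer is atomic on its own, but a chain of
    // appends is not. Holding the buffer's own lock for the whole chain
    // keeps another thread from interleaving text in the middle of an entry.
    void addEntry(String key, String value) {
        synchronized (buffer) {
            buffer.append(key).append('=').append(value).append(';');
        }
    }

    String contents() {
        return buffer.toString();
    }

    // Runs two writer threads against one SharedLog and returns the result.
    static String runTwoWriters(int entriesPerThread) {
        SharedLog log = new SharedLog();
        Thread a = new Thread(() -> {
            for (int i = 0; i < entriesPerThread; i++) {
                log.addEntry("a", "1");
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < entriesPerThread; i++) {
                log.addEntry("b", "2");
            }
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.contents();
    }

    public static void main(String[] args) {
        // 2000 entries of 4 characters each
        System.out.println(runTwoWriters(1000).length()); // prints 8000
    }
}
```

Without the `synchronized` block the total length would still be right, but entries from the two threads could end up mixed together, e.g. `a=b=1;2;`.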
Donal Fellows
+2  A: 

String is already implemented as a char array. What are you planning to do differently? Anyway, between that and the fact that GC for ephemeral objects is extremely fast I would be amazed if you could find a way to increase performance by substituting char arrays.

Michael Borgwardt's advice about small char arrays and using StringBuilder and StringBuffer is very good. But to me the main thing is to try not to guess about what's slow: make measurements, use a profiler, get some definite facts. Because usually our guesses about performance turn out to be wrong.

Nathan Hughes
+1 ha ha.. well said..
Bragboy
+1  A: 

When you have a very large number of short Strings, using char[] instead can save quite a bit of memory, which also means more speed due to fewer cache misses.

But with large Strings, the main thing to look out for is avoiding unnecessary copying resulting from the immutability of String. If you do a lot of concatenating or replacing, using StringBuilder can make a big difference.
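As a small illustration of the replacing case, here is a sketch that overwrites characters inside a StringBuilder rather than producing a new String for every change (`InPlaceDemo` and `replaceChar` are names invented for this example):

```java
class InPlaceDemo {
    // Replaces every occurrence of 'from' with 'to' by overwriting
    // characters in the builder's own buffer; no intermediate Strings
    // are created along the way.
    static void replaceChar(StringBuilder sb, char from, char to) {
        for (int i = 0; i < sb.length(); i++) {
            if (sb.charAt(i) == from) {
                sb.setCharAt(i, to);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("a-b-c");
        replaceChar(sb, '-', '_');
        sb.append("_d");
        System.out.println(sb); // prints a_b_c_d
    }
}
```

Only the final `toString()` (implicit in `println` here) copies the characters into an immutable String.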

Michael Borgwardt
Michael, could you elaborate a bit more on replacing Strings with char[]? A char[] will take slightly less space than a String instance, but a char[] doesn't get interned, and with many short Strings the probability that some of them are equal and could be interned (i.e. the JVM keeps a single copy) is much higher than with a few long Strings.
Totophil
@Totophil: It really depends on what kind of Strings you work with and what you do with them; if you use mutable representations, interning becomes irrelevant.
Michael Borgwardt
Michael, agree, it really depends on the specifics of the scenario. And the only scenario that comes to my mind is when the software needs to do a lot of string manipulations "in place". But the approach won't be of any help with tackling string overheads coming from concatenation, searches and comparisons.
Totophil
+1  A: 

Here is an excerpt from the full source of the String class from JDK 6.0:

    public final class String implements java.io.Serializable,
            Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];

        /** The offset is the first index of the storage that is used. */
        private final int offset;

        /** The count is the number of characters in the String. */
        private final int count;

As you can see, internally the value is already stored as an array of chars. As a data structure, an array of chars has the same limitation as the String class for most string manipulations: Java arrays do not grow, so every time (OK, maybe not every single time) your string needs to grow, you have to allocate a new array and copy the contents.
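To make the "allocate a new array and copy" point concrete, here is a rough sketch of what a hand-rolled growable char buffer ends up looking like (`GrowableChars` is a made-up class, and the doubling strategy is just one common choice, not necessarily what the JDK does):

```java
import java.util.Arrays;

class GrowableChars {
    private char[] data = new char[4]; // small initial capacity for the demo
    private int length = 0;

    // Appends one char, reallocating and copying when the array is full.
    void append(char c) {
        if (length == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[length++] = c;
    }

    int capacity() {
        return data.length;
    }

    @Override
    public String toString() {
        return new String(data, 0, length);
    }

    public static void main(String[] args) {
        GrowableChars chars = new GrowableChars();
        for (char c : "hello".toCharArray()) {
            chars.append(c);
        }
        // The fifth append forced one reallocation from 4 to 8 slots.
        System.out.println(chars + " (capacity " + chars.capacity() + ")");
    }
}
```

This is essentially a small, less tested StringBuilder, which is why rolling your own rarely pays off.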

As suggested earlier it makes sense to use StringBuilder or StringBuffer for most string manipulations.

In fact the following code:

    String a = "a";
    a = a + "b";
    a = a + "c";

When compiled, it is automatically converted to use StringBuilder (that is what javac does in the JDK 6 era; from JDK 9 on it emits an invokedynamic call instead). This can easily be checked with the help of javap.
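Roughly speaking, the two methods below behave the same way. The second is an approximation of the compiler's translation written out by hand, not the exact bytecode, and `ConcatDemo` is a name made up for this sketch:

```java
class ConcatDemo {
    // The code as written in the source file.
    static String withPlus() {
        String a = "a";
        a = a + "b";
        a = a + "c";
        return a;
    }

    // Approximately what an older javac generates: one temporary
    // StringBuilder per concatenation statement.
    static String withBuilder() {
        String a = "a";
        a = new StringBuilder().append(a).append("b").toString();
        a = new StringBuilder().append(a).append("c").toString();
        return a;
    }

    public static void main(String[] args) {
        System.out.println(withPlus().equals(withBuilder())); // prints true
    }
}
```

Note that each statement gets its own StringBuilder, so concatenating inside a loop still copies the growing string on every pass. An explicit StringBuilder declared outside the loop avoids that.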

As a rule of thumb, it's rarely advisable to spend time trying to improve the performance of the core Java classes unless you're a world-class expert on the matter, simply because this code was written by world-class experts in the first place.

Totophil
+2  A: 

Have you profiled your application? Do you know where the bottlenecks are? That is the first step if the performance is sub par. Well, that and defining what acceptable performance metrics are.

Once you have profiled doing some tasks, you will have percentages of time spent doing things. If you are spending a lot of time manipulating Strings, maybe you can start to cache some of those manipulations? Are you doing some of them repeatedly when doing them only once would suffice (and then use that result again later when it is needed)? Are you copying Strings when you don't need to? Remember, java.lang.String is immutable - so it cannot be changed directly.
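As one possible shape for the caching idea, here is a sketch that remembers the result of a string manipulation so repeated inputs are only processed once. `NormalizeCache` and `normalize` are hypothetical stand-ins for whatever the real, expensive operation is, and the code assumes Java 8 or later for `computeIfAbsent`:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class NormalizeCache {
    private final Map<String, String> cache = new HashMap<>();
    private int computations = 0;

    // Hypothetical stand-in for an expensive string manipulation.
    private String normalize(String raw) {
        computations++;
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the cached result if this input was seen before,
    // otherwise computes it once and stores it.
    String normalized(String raw) {
        return cache.computeIfAbsent(raw, this::normalize);
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        NormalizeCache cache = new NormalizeCache();
        cache.normalized("  Foo ");
        cache.normalized("  Foo ");
        System.out.println(cache.computations()); // prints 1
    }
}
```

This cache is not thread-safe and never evicts entries, so it only makes sense when the set of distinct inputs is bounded and the profiler shows the same work being repeated.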

Several times while optimizing and performance-tweaking systems I work on, I have found that my instincts about where the slowness comes from are wrong. I have seen others (and, shamefully, myself) spend days optimizing something that shows no gain, because it was not the real bottleneck and in fact accounted for less than 1% of the time spent.

Hope this helps point you in the right direction.

aperkins
I have profiled and it wasn't too informative since my complexity is pretty minimal. I do know from the profile that string methods were killing it, also my loops within the code. So I am going to unroll some of the loops and use StringBuilder
ThePinkPoo
@ThePinkPoo: If String operations are killing it, then the best thing to do is try and reduce the number of String operations you are doing. This can be done through caching, or similar behavior. Sorry for assuming you didn't profile - I often see that on various forum style pages (here included), and wanted to make sure you were doing it. :) Good luck.
aperkins