I want to go through each character in a String and pass each character of the String as a String to another function.

String s = "abcdefg";
for (int i = 0; i < s.length(); i++) {
    newFunction(s.substring(i, i + 1));
}

or

String s = "abcdefg";
for (int i = 0; i < s.length(); i++) {
    newFunction(Character.toString(s.charAt(i)));
}

The final result needs to be a String. So any idea which will be faster or more efficient?

+4  A: 

Does newFunction really need to take a String? It would be better if you could make newFunction take a char and call it like this:

newFunction(s.charAt(i));

That way, you avoid creating a temporary String object.
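If the work really is done character by character, one way to keep a single implementation and still avoid the temporary String is to let the String version delegate to the char version. A minimal sketch, assuming a hypothetical newFunction that merely counts letters (the real body is not shown in the question, and the class name is made up for illustration):

```java
public class CharOverloadSketch {
    // Hypothetical per-character work; stands in for whatever newFunction really does.
    public static int newFunction(char c) {
        return Character.isLetter(c) ? 1 : 0;
    }

    // The String version reuses the char version, so the logic lives in one place.
    public static int newFunction(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += newFunction(s.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "abcdefg";
        for (int i = 0; i < s.length(); i++) {
            newFunction(s.charAt(i)); // resolves to the char overload; no String is allocated
        }
        System.out.println(newFunction("abc1"));
    }
}
```

This only helps if the String version can be expressed in terms of single characters; if it cannot, the overload would duplicate logic rather than share it.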

To answer your question: It's hard to say which one is more efficient. In both examples, a String object has to be created which contains only one character. Which is more efficient depends on how exactly String.substring(...) and Character.toString(...) are implemented on your particular Java implementation. The only way to find it out is running your program through a profiler and seeing which version uses more CPU and/or more memory. Normally, you shouldn't worry about micro-optimizations like this - only spend time on this when you've discovered that this is the cause of a performance and/or memory problem.

Jesper
newFunction really needs to take a string. Apart from single characters, newFunction also handles longer strings as well. And it handles them the same way. I don't want to overload newFunction to take in a char because it does the same thing in both cases.
estacado
I agree completely that micro-optimisation should be avoided in development until it is found to be necessary. I also think that, as a learning exercise, learning about memory allocations and other 'hidden behaviour' is very important. I'm personally tired of naive programmers knocking out short code in the belief that short = performant, and unwittingly using highly inefficient algorithms. People who don't learn this = lazy. People who are fixated on this = slow. There's a balance to be struck. In my opinion :)
Dems
@estacado: If performance is your driver (as implied by your post), optimise in the right places. Overloading the new function to avoid String overheads -may- be the sensible option, depending on what the [char] based version would look like. Contorting your code around the function may be more time-consuming, less effective, and less maintainable.
Dems
+11  A: 

The answer is: it doesn't matter.

Profile your code. Is this your bottleneck?

Will
A: 

I would first obtain the underlying char[] from the source String using String.toCharArray() and then proceed to call newFunction.

But I do agree with Jesper that it would be best if you could just deal with characters and avoid all the String functions...
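As a rough illustration of the toCharArray() route: the call returns a fresh copy of the characters, so there is one array allocation up front, and each character still has to be wrapped in its own String for newFunction. A small sketch, where newFunction is a hypothetical stand-in and the class and method names are made up for this example:

```java
public class ToCharArraySketch {
    // Hypothetical stand-in for the question's newFunction.
    public static int newFunction(String s) {
        return s.length();
    }

    // Shows that toCharArray() hands back a copy, not the String's own storage.
    public static String copyDemo() {
        String s = "abcdefg";
        char[] chars = s.toCharArray(); // allocates a new array and copies the characters
        chars[0] = 'X';                 // modifies only the copy; s is unchanged
        return s + " " + new String(chars);
    }

    // Iterates over the copied array and returns how many calls were made.
    public static int process(String s) {
        char[] chars = s.toCharArray(); // one copy up front
        int calls = 0;
        for (char c : chars) {
            newFunction(String.valueOf(c)); // still one small String per character
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(copyDemo());
        System.out.println(process("abcdefg"));
    }
}
```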

String.charAt(i) does that lookup as far as I am aware. Copying the string to a new array (which is what I understand String.toCharArray() to do) introduces a new and different overhead. Is repeatedly passing a string reference to charAt() slower than converting to a native array first? I suspect it depends on the length of the string...
Dems
There are always trade-offs :) Only the OP can really tell what is more efficient.
+2  A: 

Of the two snippets you've posted, I wouldn't want to say. I'd agree with Will that it almost certainly is irrelevant in the overall performance of your code - and if it's not, you can just make the change and determine for yourself which is fastest for your data with your JVM on your hardware.

That said, it's likely that the second snippet would be better if you converted the String into a char array first, and then performed your iterations over the array. Doing it this way would perform the String overhead once only (converting to the array) instead of every call. Additionally, you could then pass the array directly to the String constructor with some indices, which is more efficient than taking a char out of an array to pass it individually (which then gets turned into a one character array):

String s = "abcdefg";
char[] chars = s.toCharArray();
for(int i = 0; i < chars.length; i++) {
    newFunction(String.valueOf(chars, i, 1));
}

But to reinforce my first point, when you look at what you're actually avoiding on each call of String.charAt() - it's two bounds checks, a (lazy) boolean OR, and an addition. This is not going to make any noticeable difference. Neither is the difference in the String constructors.

Essentially, both idioms are fine in terms of performance (neither is immediately obviously inefficient) so you should not spend any more time working on them unless a profiler shows that this takes up a large amount of your application's runtime. And even then you could almost certainly get more performance gains by restructuring your supporting code in this area (e.g. have newFunction take the whole string itself); java.lang.String is pretty well optimised by this point.

Andrzej Doyle
`substring` in the current JVM actually uses the original character array as a backing store, while you're initiating a copy. So my gut feeling says substring will actually be faster, as a memcpy will likely be more expensive (how much depends on the size of the string: the larger it is, the bigger substring's advantage).
wds
+3  A: 

As usual: it doesn't matter but if you insist on spending time on micro-optimization or if you really like to optimize for your very special use case, try this:

import org.junit.Assert;
import org.junit.Test;

public class StringCharTest {

    // Times:
    // 1. Initialization of "s" outside the loop
    // 2. Init of "s" inside the loop
    // 3. newFunction() actually checks the string length,
    // so the function will not be optimized away by the HotSpot compiler

    @Test
    // Fastest: 237ms / 562ms / 2434ms
    public void testCacheStrings() throws Exception {
        // Cache all possible Char strings
        // Size MAX_VALUE + 1 and an int counter so that '\uFFFF' is cached too
        String[] char2string = new String[Character.MAX_VALUE + 1];
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char2string[i] = Character.toString((char) i);
        }

        for (int x = 0; x < 10000000; x++) {
            char[] s = "abcdefg".toCharArray();
            for (int i = 0; i < s.length; i++) {
                newFunction(char2string[s[i]]);
            }
        }
    }

    @Test
    // Fast: 1687ms / 1725ms / 3382ms
    public void testCharToString() throws Exception {
        for (int x = 0; x < 10000000; x++) {
            String s = "abcdefg";
            for (int i = 0; i < s.length(); i++) {
                // Fast: Creates new String objects, but does not copy an array
                newFunction(Character.toString(s.charAt(i)));
            }
        }
    }

    @Test
    // Very fast: 1331ms / 1414ms / 3190ms
    public void testSubstring() throws Exception {
        for (int x = 0; x < 10000000; x++) {
            String s = "abcdefg";
            for (int i = 0; i < s.length(); i++) {
                // Fastest without a cache: reuses the internal char array
                newFunction(s.substring(i, i + 1));
            }
        }
    }

    @Test
    // Slowest: 2525ms / 2961ms / 4703ms
    public void testNewString() throws Exception {
        char[] value = new char[1];
        for (int x = 0; x < 10000000; x++) {
            char[] s = "abcdefg".toCharArray();
            for (int i = 0; i < s.length; i++) {
                value[0] = s[i];
                // Slow! Copies the array
                newFunction(new String(value));
            }
        }
    }

    private void newFunction(String string) {
        // Do something with the one-character string
        Assert.assertEquals(1, string.length());
    }

}
mhaller
As this will be passed a String, you need to change the first test slightly. The line char[] s = "abcdefg".toCharArray(); should be inside the loop or, even better (to prevent clever optimisation by the JVM), the whole loop and the .toCharArray() call should go inside a separate function. It's important to measure all the initial overheads as well as the loop costs, especially as performance could realistically tip from one approach to the other depending on string length. So testing strings of various lengths is also important.
Dems
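The suggestion to move the per-string work into its own function and to try several string lengths could look roughly like the sketch below. Everything here is an assumption for illustration: the class and method names, the chosen lengths, the round counts, and a newFunction that merely returns the length. Hand-rolled timing like this is only a rough guide and the numbers will vary by JVM and hardware.

```java
import java.util.Arrays;

public class LengthTimingSketch {
    // Hypothetical stand-in for newFunction; returning a value keeps the call observable.
    public static int newFunction(String s) {
        return s.length();
    }

    public static int viaSubstring(String s) {
        int sink = 0;
        for (int i = 0; i < s.length(); i++) {
            sink += newFunction(s.substring(i, i + 1));
        }
        return sink;
    }

    public static int viaCharToString(String s) {
        int sink = 0;
        for (int i = 0; i < s.length(); i++) {
            sink += newFunction(Character.toString(s.charAt(i)));
        }
        return sink;
    }

    // Builds a test string of n copies of c.
    public static String repeat(char c, int n) {
        char[] a = new char[n];
        Arrays.fill(a, c);
        return new String(a);
    }

    public static void main(String[] args) {
        int[] lengths = {1, 7, 100, 10000};
        long sink = 0;
        for (int len : lengths) {
            String s = repeat('a', len);
            int rounds = 2000000 / len; // arbitrary: keeps total work similar per length
            for (int r = 0; r < rounds; r++) { // warm-up so the JIT compiles both paths
                sink += viaSubstring(s) + viaCharToString(s);
            }
            long t0 = System.nanoTime();
            for (int r = 0; r < rounds; r++) sink += viaSubstring(s);
            long t1 = System.nanoTime();
            for (int r = 0; r < rounds; r++) sink += viaCharToString(s);
            long t2 = System.nanoTime();
            System.out.printf("len=%d substring=%dms charToString=%dms%n",
                    len, (t1 - t0) / 1000000, (t2 - t1) / 1000000);
        }
        System.out.println("sink=" + sink); // use the result so the loops are not dead code
    }
}
```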
+1 for actually answering the question.
gustafc
Moved "s" inside the loop and added an assert() to prevent JVM optimization of newFunction(). Of course it's slower now, but the relative measurements are still the same. My point is merely that there are possibilities for optimization if the problem is known exactly. The point is not to change which function to use for a certain operation, but to look at the operation at a higher level to find improvements, e.g. by caching.
mhaller