tags:

views:

294

answers:

7

I am writing some java code where I branch off based on whether a string starts with certain characters while looping through a dataset and my dataset is expected to be large.

I was wondering whether startsWith is faster than indexOf. I did experiment with 2000 records and did find any difference.

+2  A: 

Probably, if it doesn't match it can stop looking whereas indexOf needs to look for occurrences later in the string.

Fredrik
+2  A: 

Even without looking into the sources, it should be clear that startsWith is faster at least for large strings and short pattern:

The running time of a.startsWith(b) is bound be the length of b. After at most the first b characters are checked, the search finished.

The running time of a.indexOf(b) is larger (depending on the actual algorithm). Every algorithm has at least a running time depending on the length of a. Roughly, you can say, that you have to look at each character once to check if the pattern starts at that position.

However, as always, it depends on the actual use case if you really see a difference in practice. Measuring the difference in real life is never bad.

dmeister
I think your first sentence is backwards... *startsWith* is faster at least for large strings and short pattern...
Jon Skeet
Jon Skeet is right. I changed it.
dmeister
+5  A: 

In general, the golden rule of micro-optimization applies here:

"Measure, don't guess".

As with all optimizations of this type, the difference between the two calls almost certainly won't matter unless you are checking millions of strings that are each tens of thousands of characters long.

Run a profiler over your code, and only optimize this call when you can measure that it's slowing you down. Till then, go with the more readable options (startsWith, in this case). Once you know that this block is slowing you down, try both and use whichever is faster. Rinse. Repeat ;-)

Academically, my guess is that startsWith will likely be implemented using indexOf. Check the source code and see if you're interested. (Turns out that startsWith does not call indexOf)

Sean Reilly
I really, *really* hope that startsWith isn't implemented with indexOf. Consider "some very long string which doesn't start with x".startsWith("x") - an efficient implementation should come back after checking the first character, whereas using indexOf potentially needs to look through the whole string.
Jon Skeet
That's a good point. I'm curious now, so I'm going to check.
Sean Reilly
I checked, and startsWith doesn't call indexOf. Crisis abated.
Sean Reilly
I mostly agree but I've seen several developers who use indexOf() == 0 to find out if a string starts with "GET " (or something). Probably because they have never given it a thought and it is easy to use the same old hammer no matter what nail you're driving, maybe they have even compared them once (on a small string) and figured out that they were more or less indentical, not understanding why the difference is huge when they are passed 10k block of data.Measuring is a good start but it is even more important to understand why you see the figures you get from measuring.
Fredrik
+2  A: 

startsWith only needs to check for the presence at the very start of the string - it's doing less work, so it should be faster.

My guess is that your 2000 records finished in a few milliseconds (if that). Whenever you want to benchmark one approach against another, try to do it for enough time that differences in timing will be significant. I find that 10-30 seconds is long enough to show significant improvements, but short enough to make it bearable to run the tests multiple times. (If this were a serious investigation I'd probably try for longer times. Most of my benchmarking is for fun.)

Also make sure you've got varied data - indexOf and startsWith should have roughly the same running time in the case where indexOf returns 0. So if all your records match the pattern, you're not really testing correctly. (I don't know whether that was the case in your tests of course - it's just something to watch out for.)

Jon Skeet
A: 

You mentioned the dataset is expected to be large. So i will bet a lot of performanve will go into access this dataset and handle it in memory. That means use one or the other will not change the perfomance significant. But if this is important to you you may write your own startWith method that could be significant faster than standard library methods or at least you know exactly what is done.

PeterMmm
A: 

public class Test { public static void main(String args[]) {

long value1 = System.currentTimeMillis();
for(long i=0;i<100000000;i++)
{
 "abcd".indexOf("a");
}
long value2 = System.currentTimeMillis();
System.out.println(value2-value1);


value1 = System.currentTimeMillis();
for(long i=0;i<100000000;i++)
{
 "abcd".startsWith("a");
}
value2 = System.currentTimeMillis();
System.out.println(value2-value1);

} }

Tested it with this piece of code and perf for startsWith seems to be better, for obvious reason that it doesn't have to traverse through string. But in best case scenario both should perform close while in a worst case scenario startsWith will always perform better than indexOf

Priyank
But please note that though startsWith can always be substituted with indexOf. It's not possible to substitute indexOf with startsWith. So in a way comparison is between oranges and apples; though for a specific case where you may want to check if the string starts with some specific character set, you might be able to use both; and yet startsWith is recommended.
Priyank
"indexOf and startsWith should have roughly the same running time in the case where indexOf returns 0." This test has the problem Jon was concerned about -- it might not be a great benchmark.
Sean Reilly
indexOf is faster if the "input" has more than one letter. startsWith is quicker for a single character. The longer the word, indexOf performs better. Try doing a loop with 100 iterations and use System.nanoTime() instead. But the results are so minimum you shouldn't worry which one to use.
Mohamed Mansour
+1  A: 

startsWith is clearer than indexOf == 0.

Have you identified the test as a performance bottleneck for which you need to sacrifice readability?

Steve-o