ansaurus

Question

What is the easiest/best/most correct way to iterate through the characters of a string in Java?

Answer 1

+6 A:

I use a for loop with charAt(). Since a String is backed by an array, charAt() is a constant-time operation.

String s = "...stuff...";

for (int i = 0; i < s.length(); i++){
    char c = s.charAt(i);        
    //Process char
}

That's what I would do. It seems the easiest to me.

As far as correctness goes, I don't believe that exists here. It is all based on your personal style.

jjnguy 2008-10-13 06:13:16

Does the compiler inline the length() method?

Uri 2008-10-13 06:25:46

I dunno. I usually don't optimize my code. But it can't hurt to pull the length into a variable and use that instead. My guess is that the compiler in-lines the call though.

jjnguy 2008-10-13 06:28:26

@Uri, the Java compiler does not do optimization. On HotSpot, *the JVM* will inline it pretty soon at runtime. There are other JVM implementations (e.g. some of the J2ME VMs used in phones) that do not do runtime optimizations.

ddimitrov 2008-10-13 06:50:06

It might inline length(), that is, hoist the method behind that call up a few frames, but it's more efficient to do this: for (int i = 0, n = s.length(); i < n; i++) { char c = s.charAt(i); }

Dave Cheney 2008-10-13 08:04:39

Cluttering your code for a *tiny* performance gain. Please avoid this until you decide this area of code is speed-critical.

slim 2008-10-13 08:13:44

I usually don't optimize my code unless I can do so without sacrificing readability.

jjnguy 2008-10-13 14:18:27

Answer 2

+1 A:

I wouldn't use StringTokenizer, as it is one of the legacy classes in the JDK.

The javadoc says:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
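
For splitting text into tokens (not into individual characters), the split method that the javadoc recommends might be used roughly as follows. This is only a minimal sketch; the class name SplitTokensSketch and the helper words are made up for illustration.

```java
import java.util.Arrays;

final class SplitTokensSketch {
    // Splits a sentence into words on runs of whitespace.
    static String[] words(String sentence) {
        return sentence.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("the quick brown fox")));
    }
}
```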

Alan 2008-10-13 06:26:23

StringTokenizer is a perfectly valid (and more efficient) way of iterating over tokens (i.e. words in a sentence). It is definitely overkill for iterating over chars. I am downvoting your comment as misleading.

ddimitrov 2008-10-13 06:56:17

ddimitrov: I'm not following how pointing out that StringTokenizer is not recommended INCLUDING a quotation from the JavaDoc (http://java.sun.com/javase/6/docs/api/java/util/StringTokenizer.html) for it stating as such is misleading. Upvoted to offset.

R. Bemrose 2008-10-13 14:44:30

Thanks Mr. Bemrose ... I take it that the cited block quote should have been crystal clear, where one should probably infer that active bug fixes won't be committed to StringTokenizer.

Alan 2008-10-13 22:23:53

Answer 3

A:

See The Java Tutorials: Strings.

public class StringDemo {
 public static void main(String[] args) {
  String palindrome = "Dot saw I was Tod";
  int len = palindrome.length();
  char[] tempCharArray = new char[len];
  char[] charArray = new char[len];

  // put original string in an array of chars
  for (int i = 0; i < len; i++) {
   tempCharArray[i] = palindrome.charAt(i);
  } 

  // reverse array of chars
  for (int j = 0; j < len; j++) {
   charArray[j] = tempCharArray[len - 1 - j];
  }

  String reversePalindrome =  new String(charArray);
  System.out.println(reversePalindrome);
 }
}

Put the length into an int len and use a for loop.

eed3si9n 2008-10-13 06:34:57

Answer 4

A:

Interestingly enough the easiest, best, and most correct implementations are often mutually exclusive.

Greg Dean 2008-10-13 06:35:33

Answer 5

+4 A:

There are some dedicated classes for this:

import java.text.*;

final CharacterIterator it = new StringCharacterIterator(s);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
   // process c
   ...
}

Bruno De Fraine 2008-10-13 06:38:20

Looks like overkill for something as simple as iterating over an immutable char array.

ddimitrov 2008-10-13 06:58:43

I don't see why this is overkill. Iterators are the most java-ish way to do anything... iterative. The StringCharacterIterator is bound to take full advantage of immutability.

slim 2008-10-13 08:11:22

If I were using an iterator I would have used a foreach loop then.

jjnguy 2008-10-13 15:57:04

@jjnguy: foreach is only possible for arrays and for implementations of java.lang.Iterable, and CharacterIterator is neither.
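
If a foreach loop over a String were really wanted, one could wrap the string in a small Iterable. The sketch below is an assumption about how that might look, not something from the JDK; the class name CharsOf is hypothetical, and note that every char gets boxed into a Character along the way.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical adapter: exposes the chars of a String as an Iterable<Character>.
final class CharsOf implements Iterable<Character> {
    private final String s;

    CharsOf(String s) {
        this.s = s;
    }

    @Override
    public Iterator<Character> iterator() {
        return new Iterator<Character>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < s.length();
            }

            @Override
            public Character next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return s.charAt(i++); // the char is autoboxed here
            }

            @Override
            public void remove() {
                // Strings are immutable, so removal is not supported.
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        for (char c : new CharsOf("abc")) {
            System.out.println(c);
        }
    }
}
```

For plain iteration, s.toCharArray() gives the same foreach syntax without the wrapper, at the cost of one array copy.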

Bruno De Fraine 2008-10-14 08:00:00

Agree with @ddimitrov - this is overkill. The only reason to use an iterator would be to take advantage of foreach, which is a bit easier to "see" than a for loop. If you're going to write a conventional for loop anyway, then might as well use charAt()

raimesh 2010-02-04 08:39:12

Answer 6

+8 A:

Two options

for(int i = 0, n = s.length() ; i < n ; i++) { 
    char c = s.charAt(i); 
}

or

for(char c : s.toCharArray()) {
    // process c
}

The first is probably faster; the second is probably more readable.

Dave Cheney 2008-10-13 08:06:23

Can you make your code more like actual code? For instance, s.toCharArray() instead of s.toCharArray. Furthermore, your first implementation seems to be missing a few characters/lines.

Roel Spilker 2008-10-13 08:47:56

Yeah - whoa, what happened to my post? That's all messed up.

Dave Cheney 2008-10-13 10:03:58

Answer 7

A:

StringTokenizer is totally unsuited to the task of breaking a string into its individual characters. With String#split() you can do that easily by using a regex that matches nothing, e.g.:

String[] theChars = str.split("|");

But StringTokenizer doesn't use regexes, and there's no delimiter string you can specify that will match the nothing between characters. There is one cute little hack you can use to accomplish the same thing: use the string itself as the delimiter string (making every character in it a delimiter) and have it return the delimiters:

StringTokenizer st = new StringTokenizer(str, str, true);

However, I only mention these options for the purpose of dismissing them. Both techniques break the original string into one-character strings instead of char primitives, and both involve a great deal of overhead in the form of object creation and string manipulation. Compare that to calling charAt() in a for loop, which incurs virtually no overhead.
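
To make the difference concrete, here is a rough sketch that runs the StringTokenizer hack next to a plain charAt() loop. The class and method names are invented for illustration. The tokenizer version allocates a String per character, while the loop works on char primitives directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class OneCharStringsSketch {
    // The hack: the string is its own delimiter set, and delimiters are returned as tokens.
    static List<String> viaTokenizer(String str) {
        List<String> pieces = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(str, str, true);
        while (st.hasMoreTokens()) {
            pieces.add(st.nextToken()); // each token is a one-character String
        }
        return pieces;
    }

    // The plain loop: no intermediate objects, just char primitives.
    static int countOccurrences(String str, char target) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(viaTokenizer("abca"));
        System.out.println(countOccurrences("abca", 'a'));
    }
}
```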

Alan Moore 2008-10-13 12:24:48

Answer 8

+1 A:

I agree that StringTokenizer is overkill here. I actually tried out the suggestions above and timed them.

My test was fairly simple: create a StringBuilder with about a million characters, convert it to a String, and traverse each of them with charAt() / after converting to a char array / with a CharacterIterator a thousand times (of course making sure to do something on the string so the compiler can't optimize away the whole loop :-) ).

The result on my 2.6 GHz Powerbook (that's a mac :-) ) and JDK 1.5:

Test 1: charAt + String --> 3138msec
Test 2: String converted to array --> 9568msec
Test 3: StringBuilder charAt --> 3536msec
Test 4: CharacterIterator and String --> 12151msec

As the results are significantly different, the most straightforward way also seems to be the fastest one. Interestingly, charAt() of a StringBuilder seems to be slightly slower than the one of String.

BTW I suggest not to use CharacterIterator as I consider its abuse of the '\uFFFF' character as "end of iteration" a really awful hack. In big projects there's always two guys that use the same kind of hack for two different purposes and the code crashes really mysteriously.
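
A small sketch of that pitfall, under the assumption that the input may legitimately contain '\uFFFF': the sentinel comparison stops early, whereas comparing getIndex() with getEndIndex() visits every char. The class and method names are made up for this example.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

final class DoneSentinelSketch {
    // Counts chars by comparing each one against the DONE sentinel.
    static int countUntilDone(String s) {
        int n = 0;
        CharacterIterator it = new StringCharacterIterator(s);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            n++;
        }
        return n;
    }

    // Counts chars by comparing the current index with the end index instead.
    static int countByIndex(String s) {
        int n = 0;
        CharacterIterator it = new StringCharacterIterator(s);
        for (it.first(); it.getIndex() < it.getEndIndex(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "ab\uFFFFcd"; // five chars, the third one equals DONE
        System.out.println(countUntilDone(s)); // prints 2
        System.out.println(countByIndex(s));   // prints 5
    }
}
```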

Here's one of the tests:

 int count = 1000;
 ...

 System.out.println("Test 1: charAt + String");
 long t = System.currentTimeMillis();
 int sum=0;
 for (int i=0; i<count; i++) {
  int len = str.length();
  for (int j=0; j<len; j++) {
   if (str.charAt(j) == 'b')
    sum = sum + 1;
  }
 }
 t = System.currentTimeMillis()-t;
 System.out.println("result: "+ sum + " after " + t + "msec");
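
The other tests are not shown. As a guess only, the char array variant (Test 2) may have looked something like the sketch below; the class name, the string contents, and the sizes are assumptions, and a naive timing loop like this says little about JIT-compiled steady-state performance.

```java
final class ToCharArrayTimingSketch {
    // Counts 'b' characters, converting the string to a char[] on every round.
    static int countB(String str, int rounds) {
        int sum = 0;
        for (int i = 0; i < rounds; i++) {
            char[] chars = str.toCharArray(); // copies the whole string each time
            for (char c : chars) {
                if (c == 'b') {
                    sum++;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Build a test string of about a million characters (contents are arbitrary).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000_000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        String str = sb.toString();

        long start = System.nanoTime();
        int sum = countB(str, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result: " + sum + " after " + elapsedMs + "msec");
    }
}
```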

2008-12-11 21:08:23

Answer 9

+5 A:

Note that most of the other techniques described here break down if you're dealing with characters outside of the BMP (Unicode Basic Multilingual Plane), i.e. code points outside the U+0000 to U+FFFF range, which Java represents as a surrogate pair of two chars. This will only happen rarely, since the code points outside this range are mostly assigned to dead languages. But there are some useful characters outside it, for example some code points used for mathematical notation, and some used to encode proper names in Chinese.

In that case your code will be:

String str = "....";
int offset = 0, strLen = str.length();
while (offset < strLen) {
  int curChar = str.codePointAt(offset);
  offset += Character.charCount(curChar);
  // do something with curChar
}

The Character.charCount(int) method requires Java 5+.
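
As a runnable illustration of the same loop, the sketch below walks a string that contains one supplementary code point (U+1D4B3, chosen arbitrarily). The class and method names are invented for the example. The string has four chars but only three code points.

```java
final class CodePointWalkSketch {
    // Walks the string one code point at a time and returns how many were seen.
    static int countCodePoints(String str) {
        int count = 0;
        int offset = 0;
        while (offset < str.length()) {
            int codePoint = str.codePointAt(offset);
            offset += Character.charCount(codePoint); // 1 for BMP, 2 for a surrogate pair
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', then U+1D4B3 (stored as two chars), then 'b'
        String s = "a" + new String(Character.toChars(0x1D4B3)) + "b";
        System.out.println(s.length());         // prints 4
        System.out.println(countCodePoints(s)); // prints 3
    }
}
```

A charAt() loop over the same string would hand back the two surrogate halves separately, which is the failure mode described above.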

Source: http://mindprod.com/jgloss/codepoint.html

sk 2008-12-11 23:04:09
