ansaurus

Question

Find occurrences of characters in a Java String

Answer 1

+2 A:

A simple loop over the characters would do it.

public int countChars(char c, String s) {
  int result = 0;
  for (int i = 0, n = s.length(); i < n; i++) {
    if (s.charAt(i) == c) {
      result++;
    }
  }
  return result;
}

dty 2010-09-21 20:02:25

FYI: any decent JRE's JIT will hoist the `s.length()` call out of `for (int i = 0; i < s.length(); i++)` for you, so there is often no need to make code harder to read with such "optimizations". Here's a nice article about "clever" programming tricks: [Write Dumb Code -- Advice From Four Leading Java Developers](http://java.sun.com/developer/technicalArticles/Interviews/devinsight_1/)

Bart Kiers 2010-09-21 20:31:05

As a pattern, this saves you from having to think about whether the limiting expression is constant or something the compiler can optimise. For example, writing it this way saves me from having to think about whether the call in `for (int i = 0; i < expensiveCalculation(); i++)...` really is expensive, is constant, or can be hoisted out of the loop.

dty 2010-09-21 20:59:01

Although I agree that in this simple case there is no need for it.

dty 2010-09-21 20:59:26
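
A minimal sketch of the pattern dty describes above. The `limit()` method and its call counter are hypothetical, added only to make the number of evaluations visible. It counts how often the limit expression is evaluated at the source level; it says nothing about what a JIT may do with a side-effect-free call such as `String.length()`.

```java
public class LoopLimitDemo {
    // Hypothetical counter: records how often limit() is evaluated.
    static int calls = 0;

    // Hypothetical stand-in for a limit expression of unknown cost.
    static int limit() {
        calls++;
        return 5;
    }

    // Limit expression in the loop condition: evaluated on every check.
    static int callsWhenInCondition() {
        calls = 0;
        for (int i = 0; i < limit(); i++) {
            // empty body, we only care about the condition
        }
        return calls;
    }

    // Limit captured once in n, as in the answer above.
    static int callsWhenCaptured() {
        calls = 0;
        for (int i = 0, n = limit(); i < n; i++) {
            // empty body
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(callsWhenInCondition()); // 6: five passing checks plus the failing one
        System.out.println(callsWhenCaptured());    // 1
    }
}
```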

Answer 2

+3 A:

The code looks way easier to read if you don't use regular expressions.

int count = 0;
for (int i = 0; i < string.length(); i++)
    if (string.charAt(i) == 'a')
        count++;

count now contains the number of 'a's in your string. The loop runs in linear time, which is optimal because every character has to be examined once.

Regular expressions are nice for pattern matching. But just a regular loop will get the job done here.

jjnguy 2010-09-21 20:03:01

@Justin 'jjnguy' Nelson: your (accepted) answer only works if you plan on counting Java `char` values. It doesn't work for all the Unicode characters that a Java String can contain. String's *codePointAt(...)* is the method you're looking for, not *charAt(...)*, which has been broken for this purpose since Unicode 3.1 came out.

Webinator 2010-09-22 01:38:33

@Web could you please point me to a reference? I'd be interested in learning more.

jjnguy 2010-09-22 02:14:44

@Justin 'jjnguy' Nelson: I think the JavaDoc is exhaustive (though I'm not sure of that). Basically, *charAt* returns a 16-bit value, and since Unicode 3.1 / Java 1.5 there are more than 65536 characters supported by Unicode (and Java). Hence *charAt* can return "something" that is not a Unicode character. The newer *codePointAt* returns a 32-bit value and can hence represent all valid Unicode characters.

Webinator 2010-09-22 14:02:09

@Web, ok. That makes sense. I thought 16 bits was enough... I will leave my answer the way it is though. Adding that unfamiliar method would not be helpful to people new to the language. And your comment right below serves to point out the flaw in the code.

jjnguy 2010-09-22 14:38:45
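
A minimal sketch of the code point variant discussed in the comments above. The class and method names are made up for illustration; `String.codePointAt` and `Character.charCount` are standard library methods (Java 1.5 and later). The sample string uses U+1F600, which lies outside the 16-bit range and is stored as two `char` values.

```java
public class CodePointCount {
    // Counts occurrences of a Unicode code point (not a 16-bit char) in s.
    static int countCodePoint(String s, int target) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == target) {
                count++;
            }
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // "a", U+1F600, "b", U+1F600, written as surrogate pairs.
        String s = "a\uD83D\uDE00b\uD83D\uDE00";
        System.out.println(s.length());                 // 6 chars
        System.out.println(countCodePoint(s, 0x1F600)); // 2 code points
        System.out.println(countCodePoint(s, 'a'));     // 1
    }
}
```

A `charAt` loop over the same string would see the two halves of each surrogate pair as separate values and could never match U+1F600 with a single `char` comparison.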

Answer 3

+3 A:

Try using Apache Commons' StringUtils:

int count = StringUtils.countMatches("aaaab", "a");
// count = 4

MikeG 2010-09-21 20:03:39

Note that StringUtils will find occurrences of a String within another String, so might not be as efficient as using a character-specific search.

dty 2010-09-21 20:05:36

+1 for brevity and readability

Mark Thomas 2010-09-22 01:05:46

Answer 4

+2 A:

int count = 0;
for (char c : string.toCharArray()) 
    if (c == 'a')
        count++;

Aillyn 2010-09-21 20:05:11

Nice and succinct! But generates unnecessary garbage.

dty 2010-09-21 20:06:56

What is "unnecessary garbage"?

Bart Kiers 2010-09-21 20:10:09

Converting the String to a char[] will allocate a new char[] which will be discarded as soon as the loop is finished.

dty 2010-09-21 20:11:31

@dty But the GC will take care of it. Unless your string is huge, I don't think this is a big deal.

Aillyn 2010-09-21 20:12:32

On the contrary. For my day job, I work on ultra-low latency systems, and we have to be completely anal about the amount of garbage we generate in order to get maximum performance.

dty 2010-09-21 20:17:06

@dty, ultra-low latency systems in Java, I see. Anyway, it's getting *waaaay* past my bedtime and now is a good time to leave, I guess :)

Bart Kiers 2010-09-21 20:20:58

I continue to be floored at the number of people who write on SO about using Java for low latency systems. It's like using Assembler for cross platform development - sometimes you are just using the wrong tool for the job.

Yishai 2010-09-21 20:23:58

Wrong tool how? I can't go into specifics, but we can get tens of thousands of messages from our border, through our proprietary reliable middleware and several server hops, and back out to the border with single millisecond latencies, consistently and without significant latency spikes, using commodity hardware and a SINGLE THREADED architecture which includes complete hot/hot failover and journaling. How exactly is that the wrong tool?

dty 2010-09-21 21:02:48

I continue to be floored by people who say Java can't be used to write high performance systems just because they aren't capable of writing high performance code! :-)

dty 2010-09-21 21:03:29

@dty, you're right: Java can surely be used for high performance systems. But in this case (counting the occurrences of characters in a string) talking about the fact that `toCharArray()` is inefficient makes little sense to me. If the OP had mentioned s/he was operating on extremely large strings, then I would understand it, but not now. It is obvious that this is just an exercise.

Bart Kiers 2010-09-22 06:47:39

Absolutely. That's why I made a positive comment about the answer, and didn't down-rank it! :-)

dty 2010-09-22 11:31:49
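
A small sketch of the allocation mentioned in the comments above. The class and method names are hypothetical. It only shows that `toCharArray()` hands back a newly allocated copy on each call; whether that matters for performance depends on the application, as the thread itself argues.

```java
public class ToCharArrayCopy {
    // True if two calls on the same String return distinct array objects.
    static boolean returnsFreshArray(String s) {
        char[] first = s.toCharArray();
        char[] second = s.toCharArray();
        return first != second;
    }

    // Mutates the copy and returns the original String, which is unaffected.
    static String afterMutatingCopy(String s) {
        char[] copy = s.toCharArray();
        if (copy.length > 0) {
            copy[0] = '#';
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(returnsFreshArray("aaaab")); // true
        System.out.println(afterMutatingCopy("aaaab")); // aaaab
    }
}
```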

Answer 5

A:

For your String s and character c, try this:

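int occurrences = 0;
int index = s.indexOf(c, 0);
while (index != -1) {
    occurrences++;
    // Resume one past the last hit; searching from index again would find the same match forever.
    index = s.indexOf(c, index + 1);
}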

eumiro 2010-09-21 20:06:43

This is an infinite loop if s contains c.

dty 2010-09-21 20:07:31

This solution is kinda neat.

jjnguy 2010-09-21 20:07:34

@dty You're right. Thank you. Fixed.

eumiro 2010-09-21 20:16:39

Answer 6

A:

Guava's CharMatcher API is quite powerful and concise:

CharMatcher.is('a').countIn("aaaab"); //returns 4

dogbane 2010-09-21 20:23:29

Answer 7

A:

Here is a really short solution without any extra libraries:

String input = "aaaab";

int i = -1, count = 0;
while( (i = input.indexOf( 'a', i + 1 ) ) != -1 ) count++;

System.out.println( count );

tangens 2010-09-21 21:15:31

Answer 8

+1 A:

Regular expressions aren't particularly good at counting simple things. Think ant+sledgehammer. They are good at busting complex strings up into pieces.

Anyway, here's one solution the OP is interested in - using a Regexp to count a's:

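import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reggie {
    // Compiled once and reused; compiling a Pattern is comparatively expensive.
    private static final Pattern PATTERN = Pattern.compile("[^a]*a");

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("aaabbbaaabbabababaaabbbbba");
        int count = 0;
        // Each successful find() consumes one chunk that ends in an 'a'.
        while (matcher.find()) {
            count++;
        }
        System.out.println(count + " matches");
    }
}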

This is a pretty slow way to do it, as pointed out by others. Worse, it isn't the easiest and certainly isn't the most likely to be bug-free. Be that as it may, if you wanted something a little more complex than 'a' then the regexp would become more appropriate as the requested string got more complex. For example, if you wanted to pick dollar amounts out of a long string then a regexp could be the best answer.

Now, about the regexp: [^a]*a

The [^a]* part means "match zero or more non-'a' characters". This allows us to devour non-a crud from the beginning of a string: if the input is 'bbba' then [^a]* will match 'bbb'. It doesn't match the 'a'. Not to worry, the trailing 'a' in the regexp says, "match exactly one 'a'". So our regexp says, "match zero or more non-a characters that are followed by an a."

Ok. Now you can read about Pattern and Matcher. In a nutshell, the Pattern is a compiled (read: efficient) regular expression. It is expensive to compile a regexp, so I make mine static so they only get compiled once. The Matcher is a class that applies a Pattern to a string to see if it matches. Matcher has state information that lets it crawl down a string applying a Pattern repeatedly.

The loop basically says, "matcher, crawl down the string finding me the next occurrence of the pattern. If we find it, increment the counter." Note that the character sequences being found by the Matcher aren't just 'a'. It is finding sequences like the following: a, bbba, bba, ba, etc. That is, strings that don't contain an 'a' except for their last character.

Tony Ennis 2010-09-21 21:34:05
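
To make the last point of the answer above concrete, here is a small sketch that collects what each `find()` actually matched instead of only counting. The class name, method name, and sample input are made up for illustration; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReggieTrace {
    private static final Pattern PATTERN = Pattern.compile("[^a]*a");

    // Returns the text of every match, in order of appearance.
    static List<String> matchedChunks(String input) {
        List<String> chunks = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(input);
        while (matcher.find()) {
            chunks.add(matcher.group());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // The trailing 'b' has no 'a' after it, so it belongs to no match.
        System.out.println(matchedChunks("bbbaabbab")); // [bbba, a, bba]
    }
}
```

The size of the returned list is the number of 'a's, since every chunk ends in exactly one 'a'.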
