views:

253

answers:

8

Hello, I would like to count the occurrences of a character in a string. Suppose I have the string "aaaab": how would I count the number of a's in it?

+2  A: 

A simple loop over the characters would do it.

public int countChars(char c, String s) {
  int result = 0;
  for (int i = 0, n = s.length(); i < n; i++) {
    if (s.charAt(i) == c) {
      result++;
    }
  }
  return result;
}
dty
FYI: any decent JRE's JIT will hoist the `s.length()` call out of the loop condition in `for (int i = 0; i < s.length(); i++)` for you, so there is often no need to make code harder to read with such "optimizations". Here's a nice article about "clever" programming tricks: [Write Dumb Code -- Advice From Four Leading Java Developers](http://java.sun.com/developer/technicalArticles/Interviews/devinsight_1/)
Bart Kiers
As a pattern, this saves you from having to think about whether the limiting expression is constant or something the compiler can optimise. For example, writing it this way saves me from having to think about whether the call in `for (int i = 0; i < expensiveCalculation(); i++)...` really is expensive and/or constant and/or can be hoisted out of the loop.
dty
Although I agree that in this simple case there is no need for it.
dty
+3  A: 

The code looks way easier to read if you don't use regular expressions.

int count = 0;
for (int i = 0; i < string.length(); i++)
    if(string.charAt(i) == 'a')
        count++;

`count` now contains the number of 'a's in your string. This runs in linear time, which is optimal, since every character has to be examined at least once.

Regular expressions are nice for pattern matching, but a plain loop will get the job done here.

jjnguy
@Justin 'jjnguy' Nelson: your (accepted) answer only works if you plan on counting Java `char` values. It doesn't work for all the Unicode characters that a Java String can contain. String's *codePointAt(...)* is the method you're looking for, not *charAt(...)*, which has been broken since Unicode 3.1 came out.
Webinator
@Web could you please point me to a reference? I'd be interested in learning more.
jjnguy
@Justin 'jjnguy' Nelson: I think the Javadoc is exhaustive on this. Basically, *charAt* returns a 16-bit value, and since Unicode 3.1 / Java 1.5 there are more than 65536 characters supported by Unicode (and Java). Hence *charAt* can return "something" that is not a complete Unicode character (one half of a surrogate pair). The newer *codePointAt* returns a 32-bit value and can hence hold any valid Unicode character.
Webinator
@Web, ok. That makes sense. I thought 16 bits was enough... I will leave my answer the way it is, though. Adding that unfamiliar method would not be helpful to people new to the language, and your comment right below serves to point out the flaw in the code.
jjnguy
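To make the *codePointAt* point from the comments concrete, here is a minimal sketch (not part of the original answer) that counts a Unicode code point rather than a 16-bit `char`. The class and method names are made up for illustration; `codePointAt` and `Character.charCount` are the standard JDK methods.

```java
class CodePointCount {
    // Counts how often the code point 'target' occurs in s.
    // Supplementary characters (outside the 16-bit range) occupy two chars,
    // so the index advances by Character.charCount(cp) instead of by 1.
    static int countCodePoint(String s, int target) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp == target) {
                count++;
            }
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is the surrogate pair for U+1F600.
        String s = "a\uD83D\uDE00b\uD83D\uDE00";
        System.out.println(countCodePoint(s, 0x1F600)); // prints 2
        System.out.println(countCodePoint(s, 'a'));     // prints 1
    }
}
```

For plain ASCII input such as "aaaab" this gives the same result as the *charAt* loop; the difference only shows up for characters outside the 16-bit range.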
+3  A: 

Try using Apache Commons' StringUtils:

int count = StringUtils.countMatches("aaaab", "a");
// count = 4 
MikeG
Note that StringUtils will find occurrences of a String within another String, so it might not be as efficient as a character-specific search.
dty
+1 for brevity and readability
Mark Thomas
+2  A: 
int count = 0;
for (char c : string.toCharArray()) 
    if (c == 'a')
        count++;
Aillyn
Nice and succinct! But generates unnecessary garbage.
dty
What is "unnecessary garbage"?
Bart Kiers
Converting the String to a char[] will allocate a new char[] which will be discarded as soon as the loop is finished.
dty
@dty But the GC will take care of it. Unless your string is huge, I don't think this is a big deal.
Aillyn
On the contrary. For my day job, I work on ultra-low latency systems, and we have to be completely anal about the amount of garbage we generate in order to get maximum performance.
dty
@dty, ultra-low latency systems in Java, I see. Anyway, it's getting *waaaay* past my bedtime and now is a good time to leave, I guess :)
Bart Kiers
I continue to be floored at the number of people who write on SO about using Java for low latency systems. It's like using Assembler for cross platform development - sometimes you are just using the wrong tool for the job.
Yishai
Wrong tool how? I can't go into specifics, but we can get tens of thousands of messages from our border, through our proprietary reliable middleware and several server hops, and back out to the border with single millisecond latencies, consistently and without significant latency spikes, using commodity hardware and a SINGLE THREADED architecture which includes complete hot/hot failover and journaling. How exactly is that the wrong tool?
dty
I continue to be floored by people who say Java can't be used to write high performance systems just because they aren't capable of writing high performance code! :-)
dty
@dty, you're right: Java can surely be used for high performance systems. But in this case (counting the occurrences of characters in a string) talking about the fact that `toCharArray()` is inefficient makes little sense to me. If the OP had mentioned s/he was operating on extremely large strings, then I would understand it, but not now. It is obvious that this is just an exercise.
Bart Kiers
Absolutely. That's why I made a positive comment about the answer, and didn't down-rank it! :-)
dty
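For comparison, here is a sketch (not from the thread) that iterates without the explicit char[] copy made by `toCharArray()`, using `String.chars()`, which is available from Java 8 onwards. The class and method names are hypothetical, and the stream machinery has overhead of its own, so treat this as a readability option rather than a performance claim.

```java
class CharsStreamCount {
    // Counts occurrences of 'target' by streaming over the string's char values.
    static long countChar(String s, char target) {
        return s.chars()                    // IntStream of char values
                .filter(ch -> ch == target) // keep only matching characters
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countChar("aaaab", 'a')); // prints 4
    }
}
```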
A: 

For your String s and character c, try this:

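int occurrences = 0;
int index = s.indexOf(c, 0);
while (index != -1) {
    occurrences++;
    // Resume one position past the last match; searching from index again
    // would find the same character forever.
    index = s.indexOf(c, index + 1);
}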
eumiro
This is an infinite loop if s contains c.
dty
This solution is kinda neat.
jjnguy
@dty You're right. Thank you. Fixed.
eumiro
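The infinite loop mentioned in the comments comes from searching again at the position of the previous match, which finds the same character every time. A minimal runnable sketch of the indexOf approach that resumes one position further on (the class and method names are made up for illustration):

```java
class IndexOfCount {
    // Counts occurrences of c in s by jumping from match to match.
    static int countChar(String s, char c) {
        int occurrences = 0;
        int index = s.indexOf(c);
        while (index != -1) {
            occurrences++;
            // index + 1 skips the character just counted; indexOf returns -1
            // once the start position runs past the end of the string.
            index = s.indexOf(c, index + 1);
        }
        return occurrences;
    }

    public static void main(String[] args) {
        System.out.println(countChar("aaaab", 'a')); // prints 4
    }
}
```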
A: 

Guava's CharMatcher API is quite powerful and concise:

CharMatcher.is('a').countIn("aaaab"); //returns 4
dogbane
A: 

Here is a really short solution without any extra libraries:

String input = "aaaab";

int i = -1, count = 0;
while( (i = input.indexOf( 'a', i + 1 ) ) != -1 ) count++;

System.out.println( count );
tangens
+1  A: 

Regular expressions aren't particularly good at counting simple things. Think ant+sledgehammer. They are good at busting complex strings up into pieces.

Anyway, here's one solution the OP is interested in - using a Regexp to count a's:

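import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Reggie {
    // Compiled once and reused; compiling a pattern is comparatively expensive.
    private static final Pattern PATTERN = Pattern.compile("[^a]*a");

    public static void main(String[] args) {
        Matcher matcher = PATTERN.matcher("aaabbbaaabbabababaaabbbbba");
        int count = 0;
        // Each successful find() consumes one 'a' plus any non-'a' characters before it.
        while (matcher.find()) {
            count++;
        }
        System.out.println(count + " matches");
    }
}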

This is a pretty slow way to do it, as pointed out by others. Worse, it isn't the easiest and certainly isn't the most likely to be bug-free. Be that as it may, if you wanted something a little more complex than 'a' then the regexp would become more appropriate as the requested string got more complex. For example, if you wanted to pick dollar amounts out of a long string then a regexp could be the best answer.
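To illustrate the dollar-amount case, here is a rough sketch. The pattern is my own assumption about what a "dollar amount" looks like (a dollar sign, digits, and an optional two-digit cents part); real input may need something more forgiving. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarAmounts {
    // Assumed format: '$', one or more digits, optionally '.' and two digits.
    private static final Pattern DOLLARS = Pattern.compile("\\$\\d+(?:\\.\\d{2})?");

    // Returns every substring of text that matches the assumed format, in order.
    static List<String> find(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher matcher = DOLLARS.matcher(text);
        while (matcher.find()) {
            amounts.add(matcher.group());
        }
        return amounts;
    }

    public static void main(String[] args) {
        String text = "Lunch was $12.50, coffee $3, and parking $20.00.";
        System.out.println(find(text)); // prints [$12.50, $3, $20.00]
    }
}
```

Doing the same with hand-written loops would take noticeably more code, which is where a regexp starts to pay for itself.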

Now, about the regexp: [^a]*a

The [^a]* part means "match zero or more non-'a' characters". This lets us devour non-a crud from the beginning of a string: if the input is 'bbba' then [^a]* will match 'bbb'. It doesn't match the 'a'. Not to worry, the trailing 'a' in the regexp says, "match exactly one a". So our regexp says, "match zero or more non-a characters that are followed by an a."

Ok. Now you can read about Pattern and Matcher. In a nutshell, a Pattern is a compiled (read: efficient) regular expression. Compiling a regexp is expensive, so I make mine static so that they only get compiled once. A Matcher applies a Pattern to a string to see whether it matches, and it keeps state that lets it crawl down the string applying the Pattern repeatedly.

The loop basically says, "matcher, crawl down the string finding me the next occurrence of the pattern. If we find it, increment the counter." Note that the character sequences found by the Matcher aren't just 'a'. It finds sequences like the following: a, bbba, bba, ba, etc. That is, strings that don't contain an 'a' except as their last character.

Tony Ennis
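To see which sequences the Matcher actually finds, here is a small sketch that collects each match instead of only counting. The input string and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReggieMatches {
    // Same pattern as in the answer: any run of non-'a' characters, then one 'a'.
    private static final Pattern A_RUN = Pattern.compile("[^a]*a");

    // Returns the text of every match, in the order the Matcher finds them.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = A_RUN.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = matches("bbbaabba");
        System.out.println(found);        // prints [bbba, a, bba]
        System.out.println(found.size()); // prints 3, one match per 'a'
    }
}
```

Each match ends in exactly one 'a', so the number of matches equals the number of a's in the input.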