+1  Q: 

Count words, java

Hi,

I want to count words. I use the methods hasNextChar and getChar. The sentence may contain all kinds of characters. Here's my code:

        boolean isWord = false;

        while(hasNextChar()){
            char current = getChar();   
            switch(current){
                case ' ' : case '.' : case ',' : case '-' :
                    isWord = false;
                default:
                    if(!isWord) wordCount++;
                    isWord = true;
            }
        }

It works so far, but when the sentence has a "." at the end, for example, it gives me 8 words instead of 7. Here are some example sentences:

„Schreiben Sie ein Praktikanten-Vermittlungs-Programm“ – words: 6

„Du magst ja recht haben – aber ich sehe das ganz anders.“ – words: 11

„Hallo Welt !!!!“ – words: 2

„Zwei Wörter !!!!“ – words: 2

„Eins,Zwei oder Drei“ – words: 4

A sentence does not have to end with a " . ".

Any ideas how to solve that?

+3  A: 

Since it's homework I won't solve it for you but point you in the right direction instead.

Take a look at the Character class and the helper methods it defines. (Hint: they are all called isXyz())

Reference:


For the heck of it: here's a one-liner method that counts the words using a regex. Don't use this solution; come up with your own. This is probably not what your teachers want to see anyway.

Method:

public static int countwords(final String phrase) {
    return phrase.replaceAll("[^\\p{L}]+", " ").trim().split(" ").length;
}

Test code:

System.out.println(countwords(
        "Schreiben Sie ein Praktikanten-Vermittlungs-Programm"));
System.out.println(countwords(
        "Du magst ja recht haben – aber ich sehe das ganz anders."));
System.out.println(countwords("Hallo Welt !!!!"));
System.out.println(countwords("Zwei Wörter !!!!"));
System.out.println(countwords("Eins,Zwei oder Drei"));

Output:

6
11
2
2
4

Explanation: To use a phrase coined by Henry Rollins: Let's milk it, shall we?

// replace each run of non-letter characters with a single space
// \p{L} matches any Unicode letter, so e.g. German umlauts count as letters
// (\p{Alpha} is ASCII-only by default in Java and would split "Wörter" in two)
phrase.replaceAll("[^\\p{L}]+", " ")

// trim space off beginning and end
.trim()

// split the string, using the spaces as delimiter
.split(" ")

// the length of the resulting array is the number of words
.length;
seanizer
+6  A: 

You forgot the break statement in the first case (after isWord = false).
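
For illustration, here is a minimal sketch of the loop from the question with the break added. The question does not show hasNextChar() and getChar(), so the class BreakFixDemo and its String-backed versions of those two methods are assumptions made only to get something runnable.

```java
// Hypothetical harness: hasNextChar()/getChar() are simulated over a String,
// because the question does not show how they are implemented.
class BreakFixDemo {
    private final String text;
    private int pos = 0;

    BreakFixDemo(String text) {
        this.text = text;
    }

    boolean hasNextChar() {
        return pos < text.length();
    }

    char getChar() {
        return text.charAt(pos++);
    }

    int countWords() {
        int wordCount = 0;
        boolean isWord = false;
        while (hasNextChar()) {
            char current = getChar();
            switch (current) {
                case ' ': case '.': case ',': case '-':
                    isWord = false;
                    break; // without this, control falls through into default
                default:
                    if (!isWord) wordCount++;
                    isWord = true;
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(new BreakFixDemo("I am.").countWords());               // 2
        System.out.println(new BreakFixDemo("Eins,Zwei oder Drei").countWords()); // 4
    }
}
```

Note that '!' is not in the case list, so "Hallo Welt !!!!" still comes out as 3 with this version; the delimiter list would have to grow, or the test would have to be based on letters instead.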

larsmans
A: 

Let's walk through a little example: "I am."

Iteration 1: current = 'I'; wordCount = 1; isWord = true;

Iteration 2: current = ' '; isWord = false; wordCount = 2; isWord = true;

Iteration 3: current = 'a'; isWord = true;

Iteration 4: current = 'm'; isWord = true;

Iteration 5: current = '.'; isWord = false; wordCount = 3; isWord = true;

Did you intentionally leave out the break in your switch? The logic you used seems a bit strange to me.

Michael McGowan
+1  A: 

Going off Michael McGowan's comment:

The logic seems backwards to me. Shouldn't the detection of a space or punctuation signify you found a word?

Also, are there any constraints on how your sentence is formed? If you had a sentence like "One,_Two,Three,Four,____Five", the algorithm would need additional logic to handle consecutive spaces and punctuation marks.
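
One way to cope with runs of delimiters is to count only the transitions from "not a letter" to "letter". The sketch below does that with Character.isLetter; the class name LetterRunCounter is made up for the example, and treating every non-letter (digits included) as a separator is an assumption, not something the question specifies.

```java
// Hypothetical helper: counts a word each time a letter follows a non-letter.
// Assumption: anything that is not a letter (spaces, punctuation, digits,
// underscores) separates words.
class LetterRunCounter {
    static int countWords(String text) {
        int wordCount = 0;
        boolean inWord = false;
        for (int i = 0; i < text.length(); i++) {
            boolean letter = Character.isLetter(text.charAt(i));
            if (letter && !inWord) {
                wordCount++; // first letter of a new word
            }
            inWord = letter;
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(countWords("One,_Two,Three,Four,____Five")); // 5
        System.out.println(countWords("Hallo Welt !!!!"));              // 2
        System.out.println(countWords("Zwei Wörter !!!!"));             // 2
    }
}
```

Because a hyphen is not a letter, this version counts a hyphenated word as several words, which matches the expected result of 6 for the first sentence in the question.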

Kin U.
hyphen makes your logic fail
Woot4Moo
@Woot4Moo, good catch. Why not just ignore hyphens? I am assuming the hyphen is used to join words, not to break one word across two lines.
Kin U.
@Kin they fall under the umbrella of punctuation, and I think that technically a hyphenated word counts as two distinct words. I am not an English major though.
Woot4Moo
@Woot4Moo, nor am I an English major :).
Kin U.
I'm a linguistics and NLP major, and I can tell you that the definition of a word varies so widely that the OP'd better do as his teacher says :)
larsmans
+1  A: 

You can use the StringTokenizer class from java.util, which makes this really easy. Pass the constructor your string and all the delimiters you want.

StringTokenizer s = new StringTokenizer(yourString, ",. :;/");
int cantWords = s.countTokens();
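
As a runnable sketch of the same idea: the class name TokenizerDemo and the extended delimiter set below (adding '!', '?', '-' and the en dash from the question's second sentence) are assumptions chosen to fit the example sentences, not part of the snippet above.

```java
import java.util.StringTokenizer;

// Hypothetical wrapper around StringTokenizer.countTokens().
class TokenizerDemo {
    // Assumed delimiter set: space, common punctuation, hyphen, en dash (\u2013).
    static final String DELIMITERS = " .,:;/!?-\u2013";

    static int countWords(String sentence) {
        // countTokens() skips runs of delimiters, so "!!!!" yields no tokens
        return new StringTokenizer(sentence, DELIMITERS).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("Schreiben Sie ein Praktikanten-Vermittlungs-Programm")); // 6
        System.out.println(countWords("Hallo Welt !!!!"));                                      // 2
        System.out.println(countWords("Eins,Zwei oder Drei"));                                  // 4
    }
}
```

Any character missing from the delimiter string is treated as part of a word, so the set has to cover every separator the input can contain.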
Roger