Q:

How do I split this text into sentences, without the punctuation?

Input:

    Hi. I am John.
    My name is John. Who are you ?

Output:

    Hi
    I am John
    My name is John
    Who are you
A:
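    public class SplitSentences {
        public static void main(String[] args) {
            String line = "Hi. My name is John. Who are you ?";
            // Split on whitespace preceded by . ! or ? (the zero-width lookbehind keeps the punctuation)
            String[] sentences = line.split("(?<=[.!?])\\s+");
            for (String sentence : sentences) {
                System.out.println("[" + sentence + "]");
            }
        }
    }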

This produces:

    [Hi.]
    [My name is John.]
    [Who are you ?]

If you're not comfortable using `split` (even though it's the recommended replacement for the "legacy" `java.util.StringTokenizer`), you can instead use `java.util.Scanner` on its own, which is more than adequate for the job.

Here's a solution that uses `Scanner`, which by the way implements `Iterator<String>`. For extra instructional value, I'm also showing an example of using `java.lang.Iterable<T>` so that you can use the for-each construct.

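    import java.util.Iterator;
    import java.util.Scanner;

    public class ScannerSentences {
        public static void main(String[] args) {
            final String text =
                "Hi. I am John.\n" +
                "My name is John. Who are you ?";

            // Each call to iterator() creates a fresh Scanner whose delimiter
            // swallows the punctuation and any surrounding whitespace
            Iterable<String> sentences = new Iterable<String>() {
                @Override public Iterator<String> iterator() {
                    return new Scanner(text).useDelimiter("\\s*[.!?]\\s*");
                }
            };

            for (String sentence : sentences) {
                System.out.println("[" + sentence + "]");
            }
        }
    }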

This prints:

    [Hi]
    [I am John]
    [My name is John]
    [Who are you]
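Since `Scanner` is itself an `Iterator<String>`, the `Iterable` wrapper is optional. Here is a minimal sketch that drives the same `Scanner` directly with `hasNext()`/`next()`; the names `ScannerLoop` and `sentences` are made up for illustration. Because `next()` consumes each token, the loop ends once the input is exhausted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerLoop {
    // Collects the sentences by using the Scanner directly as an Iterator<String>
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        // Same delimiter as above: terminal punctuation plus surrounding whitespace
        try (Scanner scanner = new Scanner(text).useDelimiter("\\s*[.!?]\\s*")) {
            // next() advances past each token, so hasNext() eventually returns false
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : sentences("Hi. I am John.\nMy name is John. Who are you ?")) {
            System.out.println("[" + s + "]");
        }
    }
}
```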

If this regex is still not what you want, then I recommend investing the time to learn regular expressions yourself so that you can take matters into your own hands.

Note: the anonymous class in the above snippet refers to the local variable `text`, so `text` must be `final` (before Java 8 the `final` modifier is required; from Java 8 on, being effectively final is enough). An anonymous class keeps an illustrative example concise, but in your actual code you should refactor it into its own named class that takes `text` in its constructor.
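A sketch of that refactoring, assuming a hypothetical class named `Sentences` (the name is illustrative, not from any library). It takes `text` in its constructor and hands out a fresh `Scanner` from each `iterator()` call, so the same object can be looped over more than once:

```java
import java.util.Iterator;
import java.util.Scanner;

// Hypothetical named replacement for the anonymous Iterable above
public class Sentences implements Iterable<String> {
    private final String text;

    public Sentences(String text) {
        this.text = text;
    }

    // Each call returns a new Scanner positioned at the start of text
    @Override
    public Iterator<String> iterator() {
        return new Scanner(text).useDelimiter("\\s*[.!?]\\s*");
    }

    public static void main(String[] args) {
        Sentences sentences = new Sentences("Hi. I am John.\nMy name is John. Who are you ?");
        for (String sentence : sentences) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```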

polygenelubricants
Hi... I don't want the delimiters to appear.
John
@John: Then `split("[.!?]\\s+")`. Maybe even `split("\\s*[.!?]\\s+")`. Maybe even `split("\\s*[.!?]+\\s+")`. Feel free to clarify your question by explaining in more detail what it is that you want: more input, expected output, etc.
polygenelubricants
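A quick way to see what the trailing `\\s+` in these patterns does, run against the question's input. The second pattern, `\\s*[.!?]\\s*`, is not one of the suggestions in the comment; it is the `Scanner` delimiter from the answer, reused here for comparison (the class name `SplitVariants` is hypothetical):

```java
import java.util.Arrays;

public class SplitVariants {
    static final String TEXT = "Hi. I am John.\nMy name is John. Who are you ?";

    public static void main(String[] args) {
        // Pattern from the comment: whitespace is required AFTER the punctuation,
        // so the final "?" (followed by end of input) is not a split point
        System.out.println(Arrays.toString(TEXT.split("\\s*[.!?]\\s+")));
        // prints [Hi, I am John, My name is John, Who are you ?]

        // Scanner delimiter from the answer: trailing whitespace is optional,
        // and split() drops the trailing empty string this produces
        System.out.println(Arrays.toString(TEXT.split("\\s*[.!?]\\s*")));
        // prints [Hi, I am John, My name is John, Who are you]
    }
}
```

With `\\s+` at the end, a final `?` at the end of the input never matches, which is why the last sentence keeps its ` ?`.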
Hi... I am trying to take a file that contains a few lines of text, split them up, and store them in an array. The problem I am facing is that when I use a tokenizer, it stops reading at the end of each line, and when I try to put it in a while loop, it goes into an infinite loop.
John
@John: I need to go to bed, but if you look up the API, it says that "`StringTokenizer` is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the `split` method of `String` or the `java.util.regex` package instead."
polygenelubricants
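John's file-reading code isn't shown, so this is only a sketch under an assumption: a common cause of such an infinite loop is a `while` condition that never advances the reader. The version below reads every line with `BufferedReader.readLine()`, joins the lines, and calls `split` once, storing the sentences in an array. `FileSentences`, `readSentences` and `input.txt` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileSentences {
    // Reads the whole file, then splits it into sentences with String.split
    static String[] readSentences(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // readLine() advances on every call and returns null at end of file,
            // so this loop cannot spin forever on the same line
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        // trim() first so leading whitespace in the file does not yield an empty first element;
        // the delimiter removes punctuation plus surrounding whitespace, including line breaks
        return text.toString().trim().split("\\s*[.!?]\\s*");
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder file name
        for (String sentence : readSentences(Paths.get("input.txt"))) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```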
I've been reading about split for the last couple of hours, but all it's done is confuse me further.
John
Thanks. I'm still a bit confused about how `line.split("(?<=[.!?])\\s+")` works. As in, how does `(?<=[.!?])\\s+` work?
John
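It's a positive lookbehind (http://www.regular-expressions.info/lookaround.html). `(?<=[.!?])` looks behind the current position and checks whether the preceding character matches `[.!?]`, without consuming it. So `\\s+` only matches whitespace that comes right after a `.`, `!` or `?`, and because the lookbehind is zero-width, the punctuation stays attached to the sentence. See, for example, http://stackoverflow.com/questions/2559759/how-do-i-convert-camelcase-into-human-readable-names-in-java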
polygenelubricants
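A small side-by-side sketch of what the lookbehind buys you, using the answer's sample line (`LookbehindDemo` is a made-up class name):

```java
import java.util.Arrays;

public class LookbehindDemo {
    static final String LINE = "Hi. My name is John. Who are you ?";

    public static void main(String[] args) {
        // Lookbehind: (?<=[.!?]) is zero-width, so only the whitespace is consumed
        // and each piece keeps its punctuation
        System.out.println(Arrays.toString(LINE.split("(?<=[.!?])\\s+")));
        // prints [Hi., My name is John., Who are you ?]

        // Without the lookbehind, [.!?] is part of the match and is removed along with the whitespace
        System.out.println(Arrays.toString(LINE.split("[.!?]\\s+")));
        // prints [Hi, My name is John, Who are you ?]
    }
}
```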
Thanks a ton! The problem wasn't with the tokenizer but with my loop that was reading from the file. Thanks for the Java lesson on split, though; I ended up using only that. Cheers!
John