
Hello all,

I have this regex, which is supposed to remove the sentence delimiter (. or ?) at the end of a sentence:

sentence = sentence.replaceAll("\\.|\\?$","");

It works fine; it converts

"I am Java developer." to "I am Java developer"

"Am I a Java developer?" to "Am I a Java developer"

But after deployment we found that it also removes every other period in the sentence, for example:

"Hi.Am I a Java developer?" becomes "HiAm I a Java developer"

Why is this happening?
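To reproduce the behaviour described above, here is a minimal, self-contained sketch that runs the original regex, unchanged, over the three example sentences. The class and method names (ReproduceBug, stripOriginal) are made up for illustration.

```java
// Hypothetical demo class: applies the question's regex exactly as written.
class ReproduceBug {
    // The original pattern from the question, unchanged.
    static String stripOriginal(String sentence) {
        return sentence.replaceAll("\\.|\\?$", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "I am Java developer.",
            "Am I a Java developer?",
            "Hi.Am I a Java developer?"
        };
        for (String s : inputs) {
            // Print each input next to its result.
            System.out.println("\"" + s + "\" -> \"" + stripOriginal(s) + "\"");
        }
    }
}
```

The first two lines come out as expected, while the third prints "HiAm I a Java developer", which is exactly the problem reported.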

+3  A: 

You forgot to wrap the sentence-ending characters in parentheses:

sentence = sentence.replaceAll("(\\.|\\?)$","");

A better approach is to use [.?]$, as @Mark Byers suggested:

sentence = sentence.replaceAll("[.?]$","");
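As a quick check of both fixes from this answer, the sketch below applies them to the problematic sentence. The class and method names (CompareFixes, stripGrouped, stripCharClass) are hypothetical.

```java
// Hypothetical demo class: applies both fixes suggested in this answer.
class CompareFixes {
    // Alternation wrapped in a group, so $ applies to both alternatives.
    static String stripGrouped(String sentence) {
        return sentence.replaceAll("(\\.|\\?)$", "");
    }

    // Character class: matches a single '.' or '?', anchored at the end.
    static String stripCharClass(String sentence) {
        return sentence.replaceAll("[.?]$", "");
    }

    public static void main(String[] args) {
        String s = "Hi.Am I a Java developer?";
        System.out.println(stripGrouped(s));   // Hi.Am I a Java developer
        System.out.println(stripCharClass(s)); // Hi.Am I a Java developer
    }
}
```

Both versions remove at most one trailing delimiter, so "Really??" becomes "Really?". If repeated delimiters should all be stripped, [.?]+$ would be the natural variant; that is an assumption about the desired behaviour, not something the question asks for.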
splash
+12  A: 

The pipe (|) has the lowest precedence of all regex operators. So your regex:

\\.|\\?$

is being treated as:

(\\.)|(\\?$)

which matches either a . anywhere in the string or a ? at the end of the string.
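To make the precedence visible, this sketch lists every match that Matcher.find() reports for the original regex and for the grouped version. ShowMatches and matches are hypothetical names used only for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: reports each match and its start index.
class ShowMatches {
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // Record what matched and where it starts.
            found.add("'" + m.group() + "' at index " + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Hi.Am I a Java developer?";
        // Original, parsed as (\.)|(\?$): the dot alternative is not anchored.
        System.out.println(matches("\\.|\\?$", input));   // ['.' at index 2, '?' at index 24]
        // Grouped: $ now applies to both alternatives.
        System.out.println(matches("(?:\\.|\\?)$", input)); // ['?' at index 24]
    }
}
```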

To fix this, you need to group the . and ? together:

(?:\\.|\\?)$

You could also use:

[.?]$

Within a character class, . and ? are treated literally, so you need not escape them.
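Both grouped forms give the same replacement result; (?:...) simply groups without creating a capture group that is never used. This small sketch (GroupAndClassDemo and groupCount are hypothetical names) checks that difference and shows that an unescaped . is literal inside [.?] but a wildcard outside it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: capturing vs non-capturing groups, and '.' in a class.
class GroupAndClassDemo {
    // Number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(\\.|\\?)$"));   // 1: (...) captures
        System.out.println(groupCount("(?:\\.|\\?)$")); // 0: (?:...) only groups

        // Outside a class, an unescaped '.' matches any character...
        System.out.println("x".matches("."));    // true
        // ...but inside [.?] it matches only a literal '.' (or '?').
        System.out.println("x".matches("[.?]")); // false
        System.out.println(".".matches("[.?]")); // true
    }
}
```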

codaddict
Thanks for the clear explanation. It works now.
+1 Nice explanation.
jensgram
When you thank someone, mark their answer as useful; that's the minimum ;-). +1 for the detailed explanation.
Aurélien Ribon
+7  A: 

Your problem is caused by the low precedence of the alternation operator |. Your regular expression matches either:

  • a . anywhere, or
  • a ? at the end of the string.

Use a character class instead:

"[.?]$"
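A related detail about the $ anchor: in Java's default mode it matches only at the end of the input (or just before a final line terminator), not at every line break. It matches at each line end only when MULTILINE mode is enabled, for example with the embedded (?m) flag. A sketch using a made-up two-line input; AnchorDemo and its methods are hypothetical names.

```java
// Hypothetical demo class: shows how $ behaves with and without MULTILINE mode.
class AnchorDemo {
    static String stripDefault(String text) {
        // Default mode: $ anchors at the end of the whole input.
        return text.replaceAll("[.?]$", "");
    }

    static String stripMultiline(String text) {
        // (?m) turns on MULTILINE, so $ also matches before each line terminator.
        return text.replaceAll("(?m)[.?]$", "");
    }

    public static void main(String[] args) {
        String text = "Hi.\nAm I a Java developer?";
        System.out.println(stripDefault(text));   // first line keeps "Hi."
        System.out.println(stripMultiline(text)); // first line becomes "Hi"
    }
}
```

For single-sentence strings like the ones in the question, the default mode is what you want.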
Mark Byers
+8  A: 

What you're saying with "\\.|\\?$" is either "a period anywhere" or "a question mark as the last character".

I would recommend "[.?]$" instead, to avoid the confusing escaping (and the undesirable result, of course).
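If this runs on many sentences, note that String.replaceAll is documented as equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so the regex is compiled on every call. Precompiling it once is a common alternative. A minimal sketch, where SentenceUtil and stripTrailingDelimiter are hypothetical names:

```java
import java.util.regex.Pattern;

// Hypothetical utility class; the names are illustrative only.
final class SentenceUtil {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern TRAILING_DELIMITER = Pattern.compile("[.?]$");

    private SentenceUtil() {}

    // Removes a single trailing '.' or '?', leaving inner periods alone.
    static String stripTrailingDelimiter(String sentence) {
        return TRAILING_DELIMITER.matcher(sentence).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDelimiter("Hi.Am I a Java developer?")); // Hi.Am I a Java developer
        System.out.println(stripTrailingDelimiter("I am Java developer."));      // I am Java developer
    }
}
```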

jensgram
Not the only one with this idea, it seems :)
jensgram
+1 to align scores with Mark Byers, since they are the same answer :p
Aurélien Ribon
A: 

The Java Scanner class can also help you do this.

Kamahire