
For the code below:

var str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";
var re = /[^\.\?!]*(?:[\.\?!]+|\s$)/g; // regex literal: in a plain string passed to new RegExp, "\s" would turn into just "s"
var myArray = str.match(re);

This is the result I am getting:

myArray[0] = "I left the United States with my eyes full of tears!"
myArray[1] = " I knew I would miss my American friends very much."

I want to add one more condition to the regex so that the text breaks only if there is a space after the punctuation mark (?, . or !).

I want to do that so that the result for the case above is:

myArray[0] = "I left the United States with my eyes full of tears!"
myArray[1] = " I knew I would miss my American friends very much.All the best to you"
myArray[2] = ""
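A note on building this kind of pattern with new RegExp: the pattern is first parsed as an ordinary JavaScript string literal, which silently drops the backslash in "\.", "\?" and "\s" before RegExp ever sees it, so \s$ turns into s$ (a literal letter "s" at the end of the string). The small check below compares the source property of three spellings of the same pattern; it uses only standard JavaScript.

```javascript
// Plain string: the string literal consumes the backslashes in "\.", "\?"
// and "\s", so RegExp actually receives [^.?!]*(?:[.?!]+|s$).
var fromString = new RegExp("[^\.\?!]*(?:[\.\?!]+|\s$)", "g");

// Doubling the backslashes, or using a regex literal, keeps \s intact.
var fromEscapedString = new RegExp("[^\\.\\?!]*(?:[\\.\\?!]+|\\s$)", "g");
var fromLiteral = /[^\.\?!]*(?:[\.\?!]+|\s$)/g;

console.log(fromString.source);        // [^.?!]*(?:[.?!]+|s$)
console.log(fromEscapedString.source); // [^\.\?!]*(?:[\.\?!]+|\s$)
console.log(fromLiteral.source);       // [^\.\?!]*(?:[\.\?!]+|\s$)
```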
A: 
.+?([!?.](?= |$)|$)

should work.

It matches any sequence of characters that is either

  • followed by a punctuation mark (!, ? or .) that is itself followed by a space or the end of the string, or
  • followed by the end of the string.
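To see the ending condition in isolation, the sketch below tests only the last group of the pattern, [!?.](?= |$), against a few made-up fragments. The lookahead (?= |$) checks the next character without consuming it, which is why the space stays at the start of the following match.

```javascript
// Just the ending group of the answer's pattern: one of ! ? . that is followed
// by a space or by the end of the string. (?= |$) is a lookahead: it looks at
// the next character but does not include it in the match.
var ending = /[!?.](?= |$)/;

console.log(ending.test("very much. All")); // true  (dot, then a space)
console.log(ending.test("very much.All"));  // false (dot, then a letter)
console.log(ending.test("all the best."));  // true  (dot at the end)

// Only the punctuation mark itself is matched, not the space after it.
console.log("much. All".match(ending)[0].length); // 1
```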

By using the reluctant (lazy) quantifier +?, it finds the shortest possible sequences, i.e. single sentences.
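A quick way to see why the reluctant quantifier matters is to run the greedy form .+ next to .+? on the question's string. This is a sketch; the only difference between the two patterns is the quantifier.

```javascript
var str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";

// Greedy: .+ first runs to the end of the string, and the "|$" branch
// accepts that position, so the whole text comes back as one match.
var greedy = str.match(/.+([!?.](?= |$)|$)/g);

// Reluctant: .+? grows one character at a time and stops at the first
// position where the ending group can match, giving one sentence per match.
var lazy = str.match(/.+?([!?.](?= |$)|$)/g);

console.log(greedy.length); // 1
console.log(lazy.length);   // 2
```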

In JavaScript:

result = subject.match(/.+?([!?.](?= |$)|$)/g);
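A minimal, runnable wrapper around that call. The name splitSentences is purely illustrative and not part of the answer; the || [] guard is an added assumption for inputs with no match, because String.prototype.match returns null in that case.

```javascript
// Illustrative wrapper (hypothetical name) around the regex from the answer.
function splitSentences(subject) {
  // With the g flag, match() returns every match, or null if there is none.
  return subject.match(/.+?([!?.](?= |$)|$)/g) || [];
}

var str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";

console.log(splitSentences(str));
// → ["I left the United States with my eyes full of tears!",
//    " I knew I would miss my American friends very much.All the best to you"]

// A dot that is not followed by a space (as in "2.0") does not end a sentence.
console.log(splitSentences("Version 2.0 is out? Yes."));
// → ["Version 2.0 is out?", " Yes."]

console.log(splitSentences("")); // → [] instead of null
```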

EDIT:

To keep the regex from splitting after "space, single letter or multi-digit number, dot" (as in " S." or " 23."), you can use:

result = subject.match(/( \d+\.| [^\W\d_]\.|.)+?([!?.](?= |$)|$)/g);

This will split

I left the United States with my eyes full of tears! 23. I knew I would miss my American friends very much. I. All the best to you.

into

I left the United States with my eyes full of tears!
 23. I knew I would miss my American friends very much.
 I. All the best to you.
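The split above can be reproduced with the following sketch, which prints each match with JSON.stringify so the leading spaces are visible. Note that without the u flag, \W is ASCII-only, so [^\W\d_] matches a single ASCII letter (a word character that is neither a digit nor an underscore).

```javascript
var subject = "I left the United States with my eyes full of tears! 23. I knew I would miss my American friends very much. I. All the best to you.";

// At each step the alternatives are tried left to right:
//   " \d+\."       a space, one or more digits, a dot  (e.g. " 23.")
//   " [^\W\d_]\."  a space, a single letter, a dot     (e.g. " I.")
//   "."            any other single character
var result = subject.match(/( \d+\.| [^\W\d_]\.|.)+?([!?.](?= |$)|$)/g);

result.forEach(function (sentence) {
  console.log(JSON.stringify(sentence));
});
// "I left the United States with my eyes full of tears!"
// " 23. I knew I would miss my American friends very much."
// " I. All the best to you."
```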

Instead of simply matching any character until it finds a dot, it does the following at each step:

  • First, try to match a space, one or more digits, and a dot.
  • If that fails, try to match a space, a single letter, and a dot.
  • If that fails, match any single character.

That way, a dot that follows such a number or letter has already been consumed by the first group, so it cannot be matched as the sentence-ending punctuation by the group that comes next in the regex.
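To add more cases like these (as asked in the comments), one hedged approach is to keep the protected alternatives in a list and always append the catch-all . last, so the specific patterns get the first chance to consume their dot. The helper buildSentenceRegex and the " Mr." / " Dr." entries are hypothetical examples, not part of the answer; backslashes must be doubled inside JavaScript strings.

```javascript
// Hypothetical helper: each entry is regex source for something whose dot
// must NOT end a sentence. The catch-all "." is appended last so the
// specific patterns are tried first at every step.
function buildSentenceRegex(protectedPatterns) {
  var body = protectedPatterns.concat(["."]).join("|");
  return new RegExp("(" + body + ")+?([!?.](?= |$)|$)", "g");
}

var re = buildSentenceRegex([
  " \\d+\\.",       // " 23."  (from the answer)
  " [^\\W\\d_]\\.", // " I."   (from the answer)
  " Mr\\.",         // " Mr."  (hypothetical extra case)
  " Dr\\."          // " Dr."  (hypothetical extra case)
]);

var text = "I met Mr. Smith today. He said hi.";
console.log(text.match(re)); // → ["I met Mr. Smith today.", " He said hi."]
```

With only the first two entries, the helper rebuilds exactly the pattern from the answer.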

Tim Pietzcker
Is it possible to prevent the string from being split into sentences when the text looks like " S.", i.e. a space, followed by a single character or a multi-digit number, followed by a "."? I would also like to understand how this works so that I can add more such cases.
Sourabh
Thanks a lot; I am trying to understand the expression.
Sourabh
A:

var str = "I left the United States with my eyes full of tears! I knew I would miss my American friends very much.All the best to you";

var re =/[^\.\?!]+[\.?!]( +|[^\.\?!]+)/g;
var myArray = str.match(re);
console.log(myArray.join('\n'));

/*  output:
I left the United States with my eyes full of tears! 
I knew I would miss my American friends very much.All the best to you
*/
kennebec
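One behavior worth knowing about this pattern: every match must contain a ., ? or !, so a final fragment that has no closing punctuation and follows a space is dropped. In the question's string, "All the best to you" survives only because it is glued to "much." with no space. A comparison on a made-up input, using the lazy pattern from the first answer as a reference:

```javascript
// kennebec's pattern: text without . ? !, one of . ? !, then either a run of
// spaces or a run of text without . ? !
var kennebecRe = /[^\.\?!]+[\.?!]( +|[^\.\?!]+)/g;
// Lazy pattern from the first answer, for comparison.
var lazyRe = /.+?([!?.](?= |$)|$)/g;

var text = "See you soon. Bye";

console.log(text.match(kennebecRe)); // → ["See you soon. "]  ("Bye" is lost)
console.log(text.match(lazyRe));     // → ["See you soon.", " Bye"]
```

The first match also keeps the space after the period, because ( +|...) consumes it; calling trim() on each match removes it if that matters.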