I need to match a string like "one. two. three. four. five. six. seven. eight. nine. ten. eleven" into groups of four sentences. I need a regular expression to break the string into a group after every fourth period. Something like:

  string regex = @"(.*.\s){4}";

  System.Text.RegularExpressions.Regex exp = new System.Text.RegularExpressions.Regex(regex);

  string result = exp.Replace(toTest, ".\n");

doesn't work because it will replace the text before the periods, not just the periods themselves. How can I count just the periods and replace them with a period and new line character?

+1  A: 

Try defining the method

private string AppendNewLineToMatch(Match match) {
    return match.Value + Environment.NewLine;
}

and using

string result = exp.Replace(toTest, AppendNewLineToMatch);

This should call the method for each match, and replace it with that method's result. The method's result would be the matching text and a newline.


EDIT: Also, I agree with Oliver. The correct regex definition should be:

  string regex = @"([^.]*[.]\s*){4}";

Another edit: Fixed the regex, hopefully I got it right this time.
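
For reference, here is a minimal sketch that puts the evaluator and the pattern above together. The class and method names (EvaluatorSketch, GroupSentences) are made up for illustration. It also makes two assumptions that are not in the answer: it trims the trailing whitespace that the pattern includes in each match, and it appends "\n" rather than Environment.NewLine so the result is the same on every platform.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorSketch
{
    // Four runs of "non-period characters, one period, optional whitespace".
    private static readonly Regex FourSentences = new Regex(@"([^.]*[.]\s*){4}");

    // Hypothetical helper: adds a newline after every fourth sentence.
    public static string GroupSentences(string input)
    {
        // The lambda is converted to a MatchEvaluator and runs once per match.
        // TrimEnd() drops the whitespace that \s* pulled into the match.
        return FourSentences.Replace(input, m => m.Value.TrimEnd() + "\n");
    }

    public static void Main()
    {
        string toTest = "one. two. three. four. five. six. seven. eight. nine. ten. eleven";
        Console.WriteLine(GroupSentences(toTest));
    }
}
```

The last three sentences stay on one line because they never add up to a full group of four periods, so the pattern does not match them.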

configurator
The @"[^.]*[.]\s*{4}" expression gives a nested quantifier error. The expression @"([^.]*[.]){4}\s*" (from James Curran) results in: one. two. three. four.one. two. three. four.nine. ten. eleven
Tai Squared
+2  A: 

. in a regex means "any character"

so in your regex, you have used .*. which matches any run of one or more characters, periods included (it is equivalent to .+)

You were probably looking for [^.]*[.] - a series of characters that are not "."s followed by a ".".
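
A quick way to see the difference is to run both patterns against the same input. This is only a sketch; DotDemo and FirstMatch are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDemo
{
    // Hypothetical helper: returns the text of the first match of pattern in input.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string input = "one. two. three";

        // An unescaped . matches any character, so .*. swallows the whole line.
        Console.WriteLine(FirstMatch(@".*.", input));       // one. two. three

        // [^.]* stops at the first period, and [.] then matches that period.
        Console.WriteLine(FirstMatch(@"[^.]*[.]", input));  // one.
    }
}
```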

Oliver Hallam
A: 

Search expression: @"(?:([^\.]+?).\s)(?:([^\.]+?).\s)(?:([^\.]+?).\s)(?:([^\.]+?).\s)"
Replace expression: "$1 $2 $3 $4.\n"

I ran this expression in RegexBuddy with the .NET regex flavor selected, and the output is:

one two three four.
five six seven eight.
nine. ten. eleven

I also tried a pattern of the form @"(?:([^.]+?).\s){4}", but a repeated capturing group only keeps its last occurrence (i.e. the last word) for the replacement, so you lose three words out of four. Someone please correct me if I am wrong.
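
On the repeated-group point: in a replacement string, $1 does refer only to the last capture, but in .NET the Match object keeps every capture of a repeated group in Group.Captures, which can be read from code. The sketch below illustrates this; CaptureDemo and its method names are hypothetical, and the period in the pattern is escaped here, which is a change from the pattern as written above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Same shape as the repeated pattern above, with the period escaped (an assumption).
    private const string Pattern = @"(?:([^.]+?)\.\s){4}";

    // The last value held by group 1; this is what $1 would insert.
    public static string LastCapture(string input)
    {
        return Regex.Match(input, Pattern).Groups[1].Value;
    }

    // Every value group 1 held during the match, in order.
    public static string[] AllCaptures(string input)
    {
        return Regex.Match(input, Pattern).Groups[1].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToArray();
    }

    public static void Main()
    {
        string input = "one. two. three. four. five.";
        Console.WriteLine(LastCapture(input));                    // four
        Console.WriteLine(string.Join(" ", AllCaptures(input)));  // one two three four
    }
}
```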

The original string resulted in (brackets mark each line): [one two three.][four five six seven.][eight. nine. ten. eleven]. Running this on a string like " one. two . three . four. five . six. seven. eight . nine. ten. eleven" resulted in [ one two thre.][. four fiv six.]
Tai Squared
A: 

Are you forced to do this via regex? Wouldn't it be easier to just split the string then process the array?

EBGreen
A: 

In this case it would seem that regex is a bit of overkill. I would recommend using String.split and then breaking up the resulting array of strings. It should be far simpler and far more reliable than trying to make a regex do what you're trying to do.

Something like this might be a bit easier to read and debug.

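import java.util.ArrayList;
import java.util.List;

String s = "one. two. three. four. five. six. seven. eight. nine. ten. eleven";
// split() takes a regex, so the period has to be escaped
String[] splitString = s.split("\\.");
List<String> li = new ArrayList<>();
for (int i = 0; i < splitString.length; i += 4) {
    StringBuilder st = new StringBuilder();
    // Stop at the end of the array so a short final group is not read out of bounds
    for (int j = i; j < Math.min(i + 4, splitString.length); j++) {
        // Every piece gets a period back, including a last piece that had none
        st.append(splitString[j].trim()).append(". ");
    }
    li.add(st.toString().trim());
}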
Matthew Brubaker
A: 

I'm not sure whether configurator's answer got mangled by the editor, but it doesn't work as posted. The correct pattern is:

string regex = @"([^.]*[.]){4}\s*";
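
One possible way to use this pattern in a replacement, sketched under a couple of assumptions: the four sentences are wrapped in an extra capturing group so that "$1\n" can put them back, and the trailing \s* is left outside that group so the whitespace is dropped. That extra group is my addition, not part of the pattern above, and SubstitutionSketch and GroupSentences are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionSketch
{
    // Group 1 holds four sentences; the whitespace after them is matched but not captured.
    private const string Pattern = @"((?:[^.]*[.]){4})\s*";

    // Hypothetical helper: rewrites each match as "the four sentences, then a newline".
    public static string GroupSentences(string input)
    {
        return Regex.Replace(input, Pattern, "$1\n");
    }

    public static void Main()
    {
        string toTest = "one. two. three. four. five. six. seven. eight. nine. ten. eleven";
        Console.WriteLine(GroupSentences(toTest));
    }
}
```

Replacing the match with a plain ".\n" instead would discard the matched sentences, which is why the captured text has to be put back with $1.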
James Curran