I am trying to break a large string of text into several smaller strings, with a different maximum length for each of the smaller strings. For example:

"The quick brown fox jumped over the red fence.
       The blue dog dug under the fence."

I would like code that splits this into smaller lines, where the first line has a maximum of 5 characters, the second line a maximum of 11, and the rest a maximum of 20, resulting in this:

Line 1: The 
Line 2: quick brown
Line 3: fox jumped over the 
Line 4: red fence.
Line 5:        The blue dog 
Line 6: dug under the fence.

Is this possible in C# or MSSQL?

A: 

Everything is possible :P

What about something like this:

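using System.Collections.Generic;
using System.Text;

public List<string> SplitString(string text, int[] lengths)
{
    List<string> output = new List<string>();

    // Tokens are words and single whitespace characters, in original order.
    List<string> words = Split(text);

    int i = 0;
    int lineNum = 0;
    string s = string.Empty;
    while (i < words.Count)
    {
        // An empty line always accepts the next token, so a word longer
        // than the limit cannot stall the loop.
        if (s.Length == 0 || s.Length + words[i].Length <= lengths[lineNum])
        {
            s += words[i];
            i++;
        }
        else
        {
            // Line is full: store it and move on to the next limit.
            // The last limit is reused for all remaining lines.
            output.Add(s);
            s = string.Empty;
            if (lineNum < lengths.Length - 1)
                lineNum++;
        }
    }

    // Store the final, partially filled line.
    if (s.Length > 0)
        output.Add(s);

    return output;
}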


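   public static List<string> Split(string text)
    {
        List<string> result = new List<string>();
        StringBuilder sb = new StringBuilder();

        foreach (var letter in text)
        {
            if (letter != ' ' && letter != '\t' && letter != '\n')
            {
                sb.Append(letter);
            }
            else
            {
                // Whitespace ends the current word, if any.
                if (sb.Length > 0)
                {
                    result.Add(sb.ToString());
                }

                // Keep the whitespace character as its own entry so it is preserved.
                result.Add(letter.ToString());
                sb = new StringBuilder();
            }
        }

        // Add the last word when the text does not end in whitespace.
        if (sb.Length > 0)
        {
            result.Add(sb.ToString());
        }

        return result;
    }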

Something like that should work. I think it's quite simple and easy to understand. I just wrote it on the fly here on SO, so it may not compile as is, but you get the idea; just toy around with it.

I think you should also use a StringBuilder instead, but I didn't remember exactly how to use it off the top of my head.
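For reference, a StringBuilder-based variant of the same greedy idea could look roughly like the sketch below. It is not the code from this answer: the class name `LineWrapper`, the method name `Wrap`, and the tokenizing pattern `\S+|\s` are assumptions made for illustration, and the last limit is assumed to apply to all remaining lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class LineWrapper
{
    // Hypothetical helper: greedy wrap with a separate limit per line.
    // The last entry of limits is reused for all remaining lines.
    public static List<string> Wrap(string text, int[] limits)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        int lineNum = 0;

        // Each token is either a run of non-whitespace or a single whitespace
        // character, so every input character lands in exactly one line.
        foreach (Match token in Regex.Matches(text, @"\S+|\s"))
        {
            // Start a new line when the token would push a non-empty line past its limit.
            if (current.Length > 0 && current.Length + token.Length > limits[lineNum])
            {
                lines.Add(current.ToString());
                current.Clear();
                if (lineNum < limits.Length - 1) lineNum++;
            }
            current.Append(token.Value);
        }

        // Store the final, partially filled line.
        if (current.Length > 0) lines.Add(current.ToString());
        return lines;
    }
}
```

Calling `LineWrapper.Wrap(text, new[] { 5, 11, 20 })` keeps all whitespace, so concatenating the returned lines gives back the original text. A word longer than its limit is placed on a line by itself rather than being cut, and whitespace that does not fit is carried over to the start of the next line.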

Francisco Noriega
What about cases where there is extra whitespace that needs to be preserved? If I just split on whitespace, those areas will be removed, and that is not acceptable.
Frank
@Frank it would work if there were extra spaces together. For a string like "1 2  3   4   ", the array would be {"1","2","","3","","","4","","",""}. The problem would actually be with \n and \t not separating words. That, however, can easily be solved by rolling your own little split method. I will add it to my solution as an example.
Francisco Noriega
A: 
\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z

will capture up to five characters in group 1, up to 11 in group 2, and chunks of up to 20 in group 3. Matches will be split along word delimiters in order to avoid splitting in the middle of a word. Whitespace, line breaks, etc. count as characters and will be preserved.

The trick is to get at the individual matches in the repeated group, something that can only be done in .NET and Perl 6:

Match matchResults = null;
Regex paragraphs = new Regex(@"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z", RegexOptions.Singleline);
matchResults = paragraphs.Match(subjectString);
if (matchResults.Success) {
    String line1 = matchResults.Groups[1].Value;
    String line2 = matchResults.Groups[2].Value;
    CaptureCollection line3andup = matchResults.Groups[3].Captures;
    // you now need to iterate over line3andup, extracting the lines.
} else {
    // Match attempt failed
} 

I don't know C# at all and have tried to construct this from RegexBuddy's templates and the VB code here, so please feel free to point out my coding errors.

Note that the whitespace at the beginning of line two is captured at the end of the previous match.
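The iteration over the repeated group that the code comment leaves open could be sketched as follows. The class name `RegexLineSplitter` and the method name `Split` are hypothetical; the pattern is the one from this answer. Empty captures are skipped on the assumption that a zero-length final iteration of the repeated group is not wanted as a line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexLineSplitter
{
    // Hypothetical helper around the pattern from this answer.
    // Returns an empty list when the pattern does not match.
    public static List<string> Split(string subject)
    {
        var lines = new List<string>();
        Match m = Regex.Match(
            subject,
            @"\A(.{0,5}\b)(.{0,11}\b)(.{0,20}\b)+\Z",
            RegexOptions.Singleline);
        if (!m.Success) return lines;

        lines.Add(m.Groups[1].Value);
        lines.Add(m.Groups[2].Value);

        // Group.Captures holds every iteration of the repeated group,
        // not only the last one.
        foreach (Capture c in m.Groups[3].Captures)
        {
            if (c.Length > 0) lines.Add(c.Value);
        }
        return lines;
    }
}
```

One caveat with the pattern as written: every chunk has to end at a word boundary, including the last one, so a text that ends in punctuation (such as "fence.") does not match and the sketch returns an empty list. Relaxing the final `\b` would be one way around that, but that is a change to the original pattern.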

Tim Pietzcker
@CD: Thanks for the semicolons :)
Tim Pietzcker