I have a block of text and I want to split it into lines without losing the \r\n at the end of each one. Right now, I have the following (suboptimal) code:

string[] lines = tbIn.Text.Split('\n')
                     .Select(t => t.Replace("\r", "\r\n")).ToArray();

So I'm wondering - is there a better way to do it?

Accepted answer

string[] lines =  Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");
A: 

If you are just going to replace the newline (\n) then do something like this:

string[] lines = tbIn.Text.Split('\n')
                     .Select(t => t + "\r\n").ToArray();

Edit: Regex.Split allows you to split on a string.

string[] lines = Regex.Split(tbIn.Text, "\r\n")
       .Select(t => t + "\r\n").ToArray();
Andrew Hare
This would leave me with \r\r\n at the end of each line :(
Dmitri Nesteruk
I don't get what you mean - what does your original string look like?
Andrew Hare
Ah - I think I see the problem now - I thought you were converting a Linux-style newline (\n) to Windows (\r\n). I will research and edit my post.
Andrew Hare
A: 

Something along the lines of using this regular expression: [^\n\r]*\r\n

Then use Regex.Matches(). The catch is that you need the Value of each match, and you have to build your string list from those. In Python you'd just use the map() function. I'm not sure of the best way to do it in .NET; you take it from there ;-)
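For what it's worth, in .NET the rough equivalent of map() is LINQ's Select(). The following is only a sketch of that idea, not a tested production solution: the sample input stands in for tbIn.Text, the pattern is a variation on the one above (it also accepts a final line with no terminator), and it assumes the text uses \r\n exclusively. It is written as top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input standing in for tbIn.Text (an assumption for this sketch).
string text = "first\r\nsecond\r\n\r\nlast";

// [^\r\n]* grabs the line body; (?:\r\n|\z) keeps the terminator, or accepts
// end of input so a final unterminated line is not dropped. (?!\z) stops
// the pattern from producing one extra empty match at the very end.
string[] lines = Regex.Matches(text, @"(?!\z)[^\r\n]*(?:\r\n|\z)")
                      .Cast<Match>()          // MatchCollection -> IEnumerable<Match>
                      .Select(m => m.Value)   // the "map" step: one string per match
                      .ToArray();

// Show the terminators explicitly so they are visible in the console.
foreach (string line in lines)
    Console.WriteLine(line.Replace("\r", "\\r").Replace("\n", "\\n"));
```

With this sample input the array has four elements, and the blank line survives as an element containing only "\r\n".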

mhenry1384
A: 

Dmitri, your solution is actually pretty compact and straightforward. The only thing more efficient would be to keep the string-splitting characters in the generated array, but the APIs simply don't allow for that. As a result, every solution will require iterating over the array and performing some kind of modification (which in C# means allocating new strings every time). I think the best you can hope for is to not re-create the array:

string[] lines = tbIn.Text.Split('\n');
for (int i = 0; i < lines.Length; ++i)
{
    lines[i] = lines[i].Replace("\r", "\r\n");
}

... but as you can see that looks a lot more cumbersome! If performance matters, this may be a bit better. If it really matters, you should consider manually parsing the string by using IndexOf() to find the '\r's one at a time, and then create the array yourself. This is significantly more code, though, and probably not necessary.
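A minimal sketch of that manual approach might look like the following. The helper name SplitKeepingNewlines is made up for illustration, and it searches for '\n' rather than '\r' so that the cut lands after the full "\r\n" (a lone "\n" is kept too). It is written as top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the thread): cuts the text after each '\n',
// so the "\r\n" (or a lone "\n") stays attached to its line.
static string[] SplitKeepingNewlines(string text)
{
    var lines = new List<string>();
    int start = 0;
    while (start < text.Length)
    {
        int end = text.IndexOf('\n', start);
        if (end < 0)
        {
            // No terminator left: the remainder is a final, unterminated line.
            lines.Add(text.Substring(start));
            break;
        }
        // Include the '\n' itself in the extracted line.
        lines.Add(text.Substring(start, end - start + 1));
        start = end + 1;
    }
    return lines.ToArray();
}

string[] result = SplitKeepingNewlines("one\r\n\r\ntwo");
Console.WriteLine(result.Length);
```

Each line costs one Substring() allocation and nothing else, which is about as cheap as this can get without working on spans.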

One of the side effects of both your solution and this one is that you won't get a terminating "\r\n" on the last line if there wasn't one already there in the TextBox. Is this what you expect? What about blank lines... do you expect them to show up in 'lines'?

JaredReisinger
A: 

You can achieve this with a regular expression. Here's an extension method with it:

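    // Requires: using System.Text.RegularExpressions; declare inside a static class.
    public static string[] SplitAndKeepDelimiter(this string input, string delimiter)
    {
        // Escape the delimiter so it is matched literally, not as a character class or pattern.
        string d = Regex.Escape(delimiter);
        // Either text up to and including the next delimiter, or a final piece with no delimiter.
        MatchCollection matches = Regex.Matches(input, ".*?" + d + "|.+\\z", RegexOptions.Singleline);
        string[] result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }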

I'm not sure if this is a better solution. Yours is very compact and simple.

bruno conde
A: 

As always, extension method goodies :)

// Requires: using System; using System.Collections.Generic;
public static class StringExtensions
{
    public static IEnumerable<string> SplitAndKeep(this string s, string separator)
    {
        string[] parts = s.Split(new string[] { separator }, StringSplitOptions.None);

        for (int i = 0; i < parts.Length; i++)
        {
            // Re-append the separator to every piece except the last one.
            string result = i == parts.Length - 1 ? parts[i] : parts[i] + separator;
            yield return result;
        }
    }
}

Usage:

        string text = "One,Two,Three,Four";
        foreach (var s in text.SplitAndKeep(","))
        {
            Console.WriteLine(s);
        }

Output:

One,
Two,
Three,
Four

BFree
A: 

The following seems to do the job:

string[] lines = Regex.Split(tbIn.Text, @"(?<=\r\n)(?!$)");

(?<=\r\n) uses 'positive lookbehind' to match after \r\n without consuming it.

(?!$) uses negative lookahead to prevent matching at the end of the input and so avoids a final line that is just an empty string.
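Here is a small self-contained sketch showing the effect. The sample string is an assumption standing in for tbIn.Text, and the code is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input standing in for tbIn.Text (an assumption for this sketch).
string text = "alpha\r\nbeta\r\n\r\ngamma\r\n";

// Split at every position preceded by \r\n, except at the end of the input.
// Both assertions are zero-width, so no characters are removed by the split.
string[] parts = Regex.Split(text, @"(?<=\r\n)(?!$)");

// Show the terminators explicitly so they are visible in the console.
foreach (string part in parts)
    Console.WriteLine(part.Replace("\r", "\\r").Replace("\n", "\\n"));
```

This gives four elements, each still ending in \r\n, including the blank line. One caveat: without RegexOptions.Multiline, $ also matches just before a final \n at the very end of the input, so a text ending in "\r\n\n" is not split before that last \n; using \z instead of $ would avoid that if it matters.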

it depends
Oh... my... god. This is excellent. Accepted! Thank you! (I wish I was this smart.)
Dmitri Nesteruk