I'm splitting a string by three different characters but I want the output to include the characters I split by. Is there any easy way to do this?

+1  A: 
result = originalString.Split(separator);
for(int i = 0; i < result.Length - 1; i++)
    result[i] += separator;

(EDIT - this is a bad answer - I misread his question and didn't see that he was splitting by multiple characters.)

(EDIT - a correct LINQ version is awkward, since the separator shouldn't get concatenated onto the final string in the split array.)

mquander
This only works if there is one single separator. You may need to employ some regex magic.
Øyvind Skaar
That's true. I'm sorry -- I didn't read the question well.
mquander
A: 

Recently I wrote an extension method to do this:

public static class StringExtensions
{
    public static IEnumerable<string> SplitAndKeep(this string s, string separator)
    {
        string[] obj = s.Split(new string[] { separator }, StringSplitOptions.None);

        for (int i = 0; i < obj.Length; i++)
        {
            // re-attach the separator to every piece except the last one
            string result = i == obj.Length - 1 ? obj[i] : obj[i] + separator;
            yield return result;
        }
    }
}
BFree
A: 

Regex.Split looks like it might be able to do what you want.
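
As a minimal sketch of how Regex.Split could be used here: when the pattern contains a capturing group, the text matched by that group is included in the returned array, so each separator comes back as its own element. The class and method names below are made up for illustration, and the separators , . and ; are an assumption.

```csharp
using System.Text.RegularExpressions;

public static class RegexSplitSketch
{
    // Hypothetical helper: splits on , . or ; and keeps each separator
    // as its own array element, because the character class sits inside
    // a capturing group.
    public static string[] SplitKeepingSeparators(string input)
    {
        return Regex.Split(input, @"([.,;])");
    }
}
```

For example, RegexSplitSketch.SplitKeepingSeparators("a,b;c") returns the five elements a , b ; c in that order.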

Garry Shutler
+13  A: 

I'd try:

string[] parts = Regex.Split(originalString, @"(?<=[.,;])");

(if the split chars were , . and ;)

(?<=PATTERN) is a positive lookbehind. It should match at any place where the preceding text fits PATTERN, so there should be a match (and a split) after each occurrence of any of the characters.
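
A small sketch of the lookbehind approach, wrapped in a helper so it can be tried in isolation. The class and method names are hypothetical, and the separators are the same , . and ; assumed above.

```csharp
using System.Text.RegularExpressions;

public static class LookbehindSplitSketch
{
    // Hypothetical helper wrapping the one-liner above.
    public static string[] SplitAfterSeparators(string input)
    {
        // (?<=[.,;]) matches the empty position right after a separator,
        // so nothing is consumed and each separator stays attached to the
        // piece that precedes it.
        return Regex.Split(input, @"(?<=[.,;])");
    }
}
```

With the input "a,b;c.d" this gives "a,", "b;", "c." and "d". If the input ends with a separator, expect a trailing empty string, since the lookbehind also matches at the very end of the input.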

it depends
A: 
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ConsoleApplication9
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = @"This;is:a.test";
            char sep0 = ';', sep1 = ':', sep2 = '.';
            string pattern = string.Format("[{0}{1}{2}]|[^{0}{1}{2}]+", sep0, sep1, sep2);
            Regex regex = new Regex(pattern);
            MatchCollection matches = regex.Matches(input);
            List<string> parts=new List<string>();
            foreach (Match match in matches)
            {
                parts.Add(match.ToString());
            }
        }
    }
}
Øyvind Skaar
A: 

Iterate through the string character by character (which is what regex does anyway). When you find a splitter, spin off a substring.

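pseudo code

int hold = 0;
List<string> afterSplit = new List<string>();
string toSplit = /* string to split */;
char[] splitChars = /* split characters */;

for (int counter = 0; counter < toSplit.Length; counter++)
{
   if (Array.IndexOf(splitChars, toSplit[counter]) >= 0)
   {
      // Substring takes (start, length), so include the splitter itself
      afterSplit.Add(toSplit.Substring(hold, counter - hold + 1));
      hold = counter + 1;
   }
}

// add whatever follows the last splitter
if (hold < toSplit.Length)
   afterSplit.Add(toSplit.Substring(hold));

That's sort of C# but not really. Obviously, choose the appropriate function names. Note that Substring takes a start index and a length rather than an end index, which is an easy place for an off-by-one error.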

But that will do what you're asking.

A: 

This seems to work, but it's not been tested much.

public static string[] SplitAndKeepSeparators(string value, char[] separators, StringSplitOptions splitOptions)
{
    List<string> splitValues = new List<string>();
    int itemStart = 0;
    for (int pos = 0; pos < value.Length; pos++)
    {
        for (int sepIndex = 0; sepIndex < separators.Length; sepIndex++)
        {
            if (separators[sepIndex] == value[pos])
            {
                // add the section of the string before the separator
                // (unless it's empty and we are discarding empty sections)
                if (itemStart != pos || splitOptions == StringSplitOptions.None)
                {
                    splitValues.Add(value.Substring(itemStart, pos - itemStart));
                }
                itemStart = pos + 1;

                // add the separator
                splitValues.Add(separators[sepIndex].ToString());
                break;
            }
        }
    }

    // add anything after the final separator
    // (unless it's empty and we are discarding empty sections)
    if (itemStart != value.Length || splitOptions == StringSplitOptions.None)
    {
        splitValues.Add(value.Substring(itemStart, value.Length - itemStart));
    }

    return splitValues.ToArray();
}
Sprotty
+2  A: 

Building on BFree's answer: I had the same goal, but I wanted to split on an array of characters, similar to the original Split method, and I also have multiple splits per string (it seems that BFree only has 1 split per string?). Here is the code I came up with:

    public static IEnumerable<string> SplitAndKeep(this string s, char[] delims)
    {
        int start = 0;
        int index = 0;

        while ((index = s.IndexOfAny(delims, start)) != -1)
        {
            index++;
            index = Interlocked.Exchange(ref start, index);

            yield return s.Substring(index, start-index-1);
            yield return s.Substring(start-1, 1);
        }

        if (start < s.Length)
        {
            yield return s.Substring(start);
        }
    }
esac
+1 It includes the delimiter in an array index as it sounded like the OP wanted.
p.campbell
Why are you using interlocked?
Dykam
No reason particularly, I just didn't see a simple 'Swap' operation available. It could be replaced by many of the alternative swap methods.
esac
if (start < s.Length) yield return s.Substring(start); This prevents empty strings in the result when the last char is a separator.
Marko
@Marko: thanks, I have updated the code with your suggestion.
esac
A: 
public static class String_Ext
{
    public static string[] SplitOnGroups(this string str, string pattern)
    {
        var matches = Regex.Matches(str, pattern);
        var partsList = new List<string>();
        for (var i = 0; i < matches.Count; i++)
        {
            var groups = matches[i].Groups;
            for (var j = 0; j < groups.Count; j++)
            {
                var group = groups[j];
                partsList.Add(group.Value);
            }
        }
        return partsList.ToArray();
    }
}

var parts = "abcde  \tfgh\tikj\r\nlmno".SplitOnGroups(@"\s+|\S+");

for (var i = 0; i < parts.Length; i++)
    Print(i + "|" + Translate(parts[i]) + "|");

result:
0|abcde|
1|  \t|
2|fgh|
3|\t|
4|ikj|
5|\r\n|
6|lmno|
vladb