Say I have an array of strings:

string[] strArray = {"aa", "bb", "xx", "cc", "xx", "dd", "ee", "ff", "xx","xx","gg","xx"};

How do I use LINQ to extract the strings between the "xx" markers as groups?

For example, by writing them to the console as:

cc
dd,ee,ff
gg
+16  A: 

A pure-functional solution (mutation-free):

string[] strArray = { "aa", "bb", "xx", "cc", "xx", "dd", 
                      "ee", "ff", "xx", "xx", "gg", "xx" };

var result = 
 strArray.Aggregate((IEnumerable<IEnumerable<string>>)new IEnumerable<string>[0],
   (a, s) => s == "xx" ? a.Concat(new[] { new string[0] })
      : a.Any() ? a.Except(new[] { a.Last() })
                   .Concat(new[] { a.Last().Concat(new[] { s }) }) : a)
         .Where(l => l.Any());

// Test
foreach (var i in result)
  Console.WriteLine(String.Join(",", i.ToArray()));

If you want to filter out the results past the last marker:

string[] strArray = { "aa", "bb", "xx", "cc", "xx", "dd", 
                      "ee", "ff", "xx", "xx", "gg", "xx"};

var result = 
  strArray.Aggregate(
    new { C = (IEnumerable<string>)null, 
          L = (IEnumerable<IEnumerable<string>>)new IEnumerable<string>[0] },
    (a, s) => s == "xx" ? a.C == null
        ? new { C = new string[0].AsEnumerable(), a.L }
        : new { C = new string[0].AsEnumerable(), L = a.L.Concat(new[] { a.C }) } 
        : a.C == null ? a : new { C = a.C.Concat(new[] { s }), a.L }).L
          .Where(l => l.Any());

// Test
foreach (var i in result)
  Console.WriteLine(String.Join(",", i.ToArray()));
Mehrdad Afshari
How in the world did you figure that out?.. that's impressive! +1
Jose Basilio
@Jose: Thanks. It wasn't that hard, though. It's kind of a hack. When you think of Aggregate as some kind of loop, it'll make sense.
Mehrdad Afshari
+1 for wicked lambda kung fu. For practical purposes this probably isn't the most readable (though it isn't TOO bad), but it's still very impressive nonetheless!
Adam Robinson
+1 Holy LINQ batman!
Chad Grant
@Deviant: Just as a point of clarification, there is no LINQ in that statement, only a usage of the LINQ classes and extension methods with lambda expressions. :)
Adam Robinson
Interesting solution. I still think it's slightly convoluted to use LINQ in this case (though it's not *pure* LINQ), but it's probably what the asker wants for some reason or another, so good job anyway.
Noldorin
Actually, this style has just one benefit over the loop. Pure functional languages wouldn't let you mutate the list in the loop. This style can easily work in those languages. This is probably the *only* benefit. No reason to do that in C#, except for fun.
Mehrdad Afshari
+1 for taking the time :)
mhenrixon
Now, I think, it's purely functional, no mutation.
Mehrdad Afshari
This answer filters out items before the first marker, but it includes any items after the last marker... Which is correct, should they be in or out?
Guffa
@Guffa: I added a solution to that case
Mehrdad Afshari
@Adam , Requires System.Linq = It's Linq ... Chicken or Egg. I do understand they are just extension methods being called and not a linq query per se.
Chad Grant
A: 

You can select only the ones that are not "xx", but if you need to break a line every time you find one, then you would have to use a loop (for/foreach), not a query.

The query to exclude the "xx" entries would be:

from s in array
where s != "xx"
select s
Oakcool
The problem is that this query will also return aa, bb which shouldn't be in the results, based on his requirements.
Jose Basilio
A: 

Partitioning lists/arrays isn't something LINQ is particularly well suited to. I recommend you write your own extension method(s) that return an IEnumerable<IEnumerable<T>> using iterators (the yield keyword) if you want to make it compatible with LINQ (i.e. fully lazy sequences). If you don't care about lazy evaluation, then the easiest thing would probably be to write a method that generates a list by iterating over the array and finally returns a jagged array (string[][]).
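
A minimal sketch of the non-lazy variant described above, under the assumption that only items enclosed by markers on both sides are wanted (as in the question). The class and method names (`MarkerSplitter`, `GroupsBetween`) are hypothetical and chosen only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class MarkerSplitter
{
    // Hypothetical helper: name and signature are illustrative only.
    // Eagerly collects the runs of items that sit between two markers.
    public static string[][] GroupsBetween(string[] items, string marker)
    {
        if (items == null)
            throw new ArgumentNullException("items");

        var groups = new List<string[]>();
        List<string> current = null;   // stays null until the first marker is seen

        foreach (string item in items)
        {
            if (item == marker)
            {
                // A marker closes the open group (if it has items) and opens a new one.
                if (current != null && current.Count > 0)
                    groups.Add(current.ToArray());
                current = new List<string>();
            }
            else if (current != null)
            {
                current.Add(item);
            }
            // Items before the first marker fall through and are ignored.
        }

        // Anything still in 'current' came after the last marker and is dropped.
        return groups.ToArray();
    }
}
```

Calling `MarkerSplitter.GroupsBetween(strArray, "xx")` on the array from the question should yield the three groups cc / dd,ee,ff / gg, which can be printed with `String.Join(",", group)`.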

Noldorin
A: 

You can assign a group number to the items by using a group counter that you increase each time that you encounter an "xx" string. Then you filter out the "xx" strings, group on the group number, and filter out the empty groups:

int group = 0;
var lines =
   strArray
   .Select(s => new { Group = (s == "xx" ? ++group : group), Value = s })
   .Where(n => n.Value != "xx")
   .GroupBy(n => n.Group)
   .Where(g => g.Count() > 0);

foreach (var line in lines) {
   Console.WriteLine(string.Join(",", line.Select(s => s.Value).ToArray()));
}

Edit:
This solution will also remove the items before the first marker and after the last marker:

int group = 0;
var lines =
   strArray
   .Select(s => new { Group = s == "xx" ? group++ : group, Value = s })
   .GroupBy(n => n.Group)
   .Skip(1)
   .Where(g => g.Last().Value == "xx" && g.Count() > 1);

foreach (var line in lines) {
   Console.WriteLine(string.Join(",", line.Take(line.Count() - 1).Select(s => s.Value).ToArray()));
}
Guffa
This would still return "aa" and "bb", which the OP didn't want. Is there a way to filter out group 0 and group n (where n is the total number of groups - 1)?
John Rasch
Yes, by leaving the marker as the last item in each group, skipping the first group, and filtering out any group that doesn't end with a marker. I added that to the answer.
Guffa
+3  A: 

A better approach may be to write a generic IEnumerable<T> split extension method and then pick and choose which parts of the results you want.

public static class IEnumerableExtensions
{
  public static IEnumerable<IEnumerable<TSource>> Split<TSource>(
                     this IEnumerable<TSource> source, TSource splitter)
  {
    if (source == null)
      throw new ArgumentNullException("source");
    if (splitter == null)
      throw new ArgumentNullException("splitter");

    return source.SplitImpl(splitter);
  }

  private static IEnumerable<IEnumerable<TSource>> SplitImpl<TSource>(
                     this IEnumerable<TSource> source, TSource splitter)
  {
    var list = new List<TSource>();

    foreach (TSource item in source)
    {
      if (!splitter.Equals(item))
      {
        list.Add(item);
      }
      else if (list.Count > 0)
      {
        yield return list.ToList();
        list.Clear();
      }
    }
  }
}
And use it like so:
static void Main(string[] args)
{
  string[] strArray = { "aa", "bb", "xx", "cc", "xx", "dd",
                        "ee", "ff", "xx", "xx", "gg", "xx" };

  var result = strArray.Split("xx");
  foreach (var group in result.Skip(1).Take(3))
  {
    Console.WriteLine(String.Join(",", group.ToArray()));
  }

  Console.ReadKey(true);
}
And you get the desired output:
cc
dd,ee,ff
gg
Samuel
+1  A: 

Add the following extension method:

public static class SplitExtensions {
    public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> src, T separator) {
        var group = new List<T>();
        foreach (var elem in src){
            if (Equals(elem, separator)){
                yield return group;
                group = new List<T>();
            } else{
                group.Add(elem);
            }
        }
        yield return group;
    }
}

Here is how to use it:

string[] strArray = { "aa", "bb", "xx", "cc", "xx", "dd", "ee", "ff", "xx", "xx", "gg", "xx" };
var groups = from g in strArray.SplitBy("xx")
             where g.Any()
             select g;
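
Note that the query above keeps every non-empty group, including the items before the first marker ("aa", "bb"). One possible way to keep only the groups enclosed by markers on both sides is to drop the first and last groups before filtering. The sketch below repeats `SplitBy` so that it is self-contained; `SplitByDemo` and `BetweenMarkers` are hypothetical names introduced only for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitExtensions
{
    // Same method as in the answer, repeated so this sketch compiles on its own.
    public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> src, T separator)
    {
        var group = new List<T>();
        foreach (var elem in src)
        {
            if (Equals(elem, separator))
            {
                yield return group;
                group = new List<T>();
            }
            else
            {
                group.Add(elem);
            }
        }
        yield return group;
    }
}

public static class SplitByDemo
{
    // Hypothetical helper: returns each marker-enclosed group as a comma-joined line.
    public static List<string> BetweenMarkers(string[] items, string marker)
    {
        // Materialize once so the group count is known.
        var all = items.SplitBy(marker).ToList();

        return all.Skip(1)                            // drop items before the first marker
                  .Take(Math.Max(0, all.Count - 2))   // drop items after the last marker
                  .Where(g => g.Any())                // drop empty groups (adjacent markers)
                  .Select(g => String.Join(",", g.ToArray()))
                  .ToList();
    }
}
```

For the array in the question this should produce the lines cc / dd,ee,ff / gg. The trade-off is that the outer sequence is no longer lazy, since the total number of groups has to be known before the last one can be discarded.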
Lloyd