I'm hoping for a concise way to perform the following transformation. I want to transform song lyrics. The input will look something like this:

Verse 1 lyrics line 1
Verse 1 lyrics line 2
Verse 1 lyrics line 3
Verse 1 lyrics line 4

Verse 2 lyrics line 1
Verse 2 lyrics line 2
Verse 2 lyrics line 3
Verse 2 lyrics line 4

And I want to transform them so that lines at the same position in each verse are grouped together (all the first lines, then all the second lines, and so on), as in:

Verse 1 lyrics line 1
Verse 2 lyrics line 1

Verse 1 lyrics line 2
Verse 2 lyrics line 2

Verse 1 lyrics line 3
Verse 2 lyrics line 3

Verse 1 lyrics line 4
Verse 2 lyrics line 4

Lyrics will obviously be unknown, but the blank line marks a division between verses in the input.

A: 

Take your input as one large string. Then determine the number of lines in a verse.

Use .Split to get an array of strings, where each item is a line. Then loop over the lines of the first verse and use a StringBuilder to append SplitStrArray(i) and SplitStrArray(i + lines in a verse + 1); the extra 1 skips the blank separator line, which is still in the array.
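
A minimal sketch of this procedural approach, assuming every verse has the same number of lines and verses are separated by exactly one empty line. The class and method names (LyricsInterleaver, Interleave) are made up for illustration, and the stride counts the blank separator line as well as the verse lines:

```csharp
using System;
using System.Text;

public static class LyricsInterleaver
{
    // Hypothetical helper: regroups lines so that line k of every verse ends up together.
    // Assumes equal-length verses separated by a single empty line.
    public static string Interleave(string input)
    {
        // Normalize line endings so the split does not depend on the platform.
        string[] lines = input.Replace("\r\n", "\n").Split('\n');

        // The first blank line ends verse 1, so its index is the verse length.
        int linesPerVerse = Array.IndexOf(lines, "");
        if (linesPerVerse < 0) linesPerVerse = lines.Length; // only one verse

        int stride = linesPerVerse + 1; // +1 skips the blank separator line
        var sb = new StringBuilder();
        for (int i = 0; i < linesPerVerse; i++)
        {
            // Collect line i of verse 1, verse 2, and so on.
            for (int j = i; j < lines.Length; j += stride)
            {
                sb.Append(lines[j]).Append('\n');
            }
            if (i < linesPerVerse - 1) sb.Append('\n'); // blank line between groups
        }
        return sb.ToString();
    }
}
```

Calling LyricsInterleaver.Interleave on the two-verse sample from the question should give the grouped layout shown there, with "\n" as the line ending.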

I think that will be the best approach. I'm not saying LINQ isn't awesome, but it seems silly to say, 'I have a problem and I want to use this tool to solve it'.

"I have to get a screw into the wall - but I want to use a hammer". If you are determined, you'll probably find a way to use the hammer; but IMHO, that's not the best course of action. Maybe someone else will have a really awesome LINQ example that makes it super easy and I'll feel silly for posting this....

Rob P.
Yes, doing this procedurally would be easy. Since this is non-critical "weekend code", I was curious whether there would be a way to do this in a LINQ one-liner.
Larsenal
It's not that Linq isn't a good tool for this, it's just that the particular transformations you need aren't part of the standard Linq library. You need a `Split` method and a `Zip` method, neither of which is standard, but both of which are easy to write.
Aaronaught
Zip is being added to .NET 4 (http://msdn.microsoft.com/en-us/library/dd267698%28VS.100%29.aspx).
Matthew Flaschen
@Matthew Flaschen: Unfortunately, the Zip extension in .NET 4 can only zip two sequences, not an arbitrary number of sequences (i.e. an `IEnumerable<IEnumerable<T>>`). But it's easy to write one.
Aaronaught
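
To illustrate the two-sequence limit mentioned in the comments above, here is a small sketch of the built-in Enumerable.Zip from .NET 4 applied to exactly two verses. The TwoVerseZip class and its Pair method are hypothetical names used only for this example; with three or more verses this overload is not enough on its own.

```csharp
using System;
using System.Linq;

public static class TwoVerseZip
{
    // Hypothetical helper: pairs line k of verse1 with line k of verse2.
    public static string[] Pair(string[] verse1, string[] verse2)
    {
        // Enumerable.Zip walks both sequences in lockstep
        // and stops as soon as the shorter one runs out.
        return verse1
            .Zip(verse2, (a, b) => a + Environment.NewLine + b)
            .ToArray();
    }
}
```

Each element of the result holds one group (line k of both verses), so joining the elements with a blank line in between would produce the layout asked for in the question.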
+1  A: 

There is probably a more concise way to do this, but here's one solution that works given valid input:

var output = String.Join("\r\n\r\n", // join it all in the end
    Regex.Split(input, "\r\n\r\n") // split on blank lines
        .Select(v => Regex.Split(v, "\r\n")) // now split lines in each verse
        .SelectMany(vl => vl.Select((lyrics, i) => new { Line = i, Lyrics = lyrics })) // flatten things out, but attach line number
        .GroupBy(b => b.Line).Select(c => new { Key = c.Key, Value = c }) // group by line number
        .Select(e => String.Join("\r\n", e.Value.Select(f => f.Lyrics).ToArray())).ToArray());

Obviously this is pretty ugly. Not at all a suggestion for production code.

Larsenal
A: 

Give this a try. I originally used Regex.Split to avoid extra blank entries, but String.Split works as well. After splitting the input into lines, Array.FindIndex locates the first blank line. Its index is the number of lines in each verse (given the format is consistent, of course). Next, we filter the blank lines out, attach a running index to each remaining line, and group the lines by that index modulo the number of lines per verse.

string input = @"Verse 1 lyrics line 1
Verse 1 lyrics line 2
Verse 1 lyrics line 3
Verse 1 lyrics line 4
Verse 1 lyrics line 5

Verse 2 lyrics line 1
Verse 2 lyrics line 2
Verse 2 lyrics line 3
Verse 2 lyrics line 4
Verse 2 lyrics line 5

Verse 3 lyrics line 1
Verse 3 lyrics line 2
Verse 3 lyrics line 3
Verse 3 lyrics line 4
Verse 3 lyrics line 5
";

// commented original Regex.Split approach
//var split = Regex.Split(input, Environment.NewLine);
var split = input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
// find first blank line to determine # of lines per verse
int index = Array.FindIndex(split, s => s == "");
var result = split.Where(s => s != "")
                  .Select((s, i) => new { Value = s, Index = i })
                  .GroupBy(item => item.Index % index);

foreach (var group in result)
{
    foreach (var item in group)
    {
        Console.WriteLine(item.Value);
    }        
    Console.WriteLine();
}
Ahmad Mageed
They don't really need to be trimmed. I trimmed them because I lined up all of the lyrics in my example. If you slide them to the edge as in yours, the trim is no longer needed. This would depend on the input. If you use a line reader from a text file it again wouldn't be an issue. I typically use .Trim() anyway to make sure my strings are "clean".
Matthew Whited
@Matthew thanks for the feedback. I was initially trying to avoid the `Regex.Split` and seemed to get blank lines when using the regular `Split` w/o trimming them. I'll have to retrace my steps to reproduce that and figure out what happened.
Ahmad Mageed
Would it be possible that your empty line had a space or a tab in it by accident? That is why I typically use .Trim() before checking for empty. Helps get around those annoying bugs you can't "see".
Matthew Whited
@Matthew no spaces/tabs. Oddly I just tried again and it works fine now w/o trimming. No repro heh :)
Ahmad Mageed
+1  A: 

LINQ is so sweet... I just love it.

static void Main(string[] args)
{
    var lyrics = @"Verse 1 lyrics line 1 
                   Verse 1 lyrics line 2 
                   Verse 1 lyrics line 3 
                   Verse 1 lyrics line 4 

                   Verse 2 lyrics line 1 
                   Verse 2 lyrics line 2 
                   Verse 2 lyrics line 3 
                   Verse 2 lyrics line 4";
    var x = 0;
    var indexed = from lyric in lyrics.Split(new[] { Environment.NewLine },
                                             StringSplitOptions.None)
                  let line = lyric.Trim()
                  // side effect: reset the counter on a blank line, otherwise count up
                  let indx = line == string.Empty ? x = 0 : ++x
                  where line != string.Empty
                  group line by indx;

    foreach (var trans in indexed)
    {
        foreach (var item in trans)
            Console.WriteLine(item);
        Console.WriteLine();
    }
    /*
        Verse 1 lyrics line 1
        Verse 2 lyrics line 1

        Verse 1 lyrics line 2
        Verse 2 lyrics line 2

        Verse 1 lyrics line 3
        Verse 2 lyrics line 3

        Verse 1 lyrics line 4
        Verse 2 lyrics line 4
     */
}
Matthew Whited
Mutating state (`++x`) inside a LINQ expression is not good style because it assumes a certain order of processing. It may work here, but it may not work if you put a `.AsParallel()` after the Split, for instance.
Gabe
There are a lot of things that "shouldn't" be done, but are in fact done anyway because they are the easiest way to do them. All of the examples are going to require a known order of processing, so they will all have issues with the "magic" versions of multi-threading. There are things we as programmers and engineers must understand and expect. Sometimes sacrifices must be made. Feel free to create your own example if you have an issue with mine.
Matthew Whited
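
For comparison with the point raised in these comments, here is one possible way to get the same grouping without a captured counter that is mutated inside the query. It is only a sketch: the PureGrouping class and GroupByPosition method are hypothetical names, and it assumes the separator between verses is a truly empty line (no stray spaces or tabs). The position of each line comes from the indexed Select overload, computed per verse.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PureGrouping
{
    // Hypothetical helper: groups line k of every verse together
    // without mutating a variable inside the query.
    public static List<List<string>> GroupByPosition(string lyrics)
    {
        return lyrics
            .Replace("\r\n", "\n")
            // One string per verse; assumes the separator line is completely empty.
            .Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries)
            .SelectMany(verse => verse
                .Split('\n')
                .Select(l => l.Trim())
                .Where(l => l.Length > 0)
                // position = index of the line within its own verse
                .Select((line, position) => new { line, position }))
            .GroupBy(x => x.position, x => x.line)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Each inner list holds the lines that share a position, in verse order, so printing the lists with a blank line between them reproduces the output shown in the answer above.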
+2  A: 

I have a few extension methods I always keep around that make this type of processing very simple. The solution in its entirety is going to be longer than the others, but these are useful methods to have around, and once the extension methods are in place the answer is very short and easy to read.

First, there's a Zip method that takes an arbitrary number of sequences:

public static class EnumerableExtensions
{
    public static IEnumerable<T> Zip<T>(
        this IEnumerable<IEnumerable<T>> sequences,
        Func<IEnumerable<T>, T> aggregate)
    {
        var enumerators = sequences.Select(s => s.GetEnumerator()).ToArray();
        try
        {
            while (enumerators.All(e => e.MoveNext()))
            {

                var items = enumerators.Select(e => e.Current);
                yield return aggregate(items);
            }
        }
        finally
        {
            foreach (var enumerator in enumerators)
            {
                enumerator.Dispose();
            }
        }
    }
}

Then there's a Split method (it goes in the same static class as Zip, since extension methods must live in a static class), which does roughly the same thing to an IEnumerable<T> that string.Split does to a string:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items,
    Predicate<T> splitCondition)
{
    using (IEnumerator<T> enumerator = items.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            yield return GetNextItems(enumerator, splitCondition).ToArray();
        }
    }
}

private static IEnumerable<T> GetNextItems<T>(IEnumerator<T> enumerator,
    Predicate<T> stopCondition)
{
    do
    {
        T item = enumerator.Current;
        if (stopCondition(item))
        {
            yield break;
        }
        yield return item;
    } while (enumerator.MoveNext());
}

Once you have these extensions in place, solving the song-lyric problem is a piece of cake:

string lyrics = ...
var verseGroups = lyrics
    .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
    .Select(s => s.Trim())  // Optional, if there might be whitespace
    .Split(s => string.IsNullOrEmpty(s))
    .Zip(seq => string.Join(Environment.NewLine, seq.ToArray()))
    .Select(s => s + Environment.NewLine);  // Optional, add space between groups
Aaronaught
Very handy ZIP method!
Larsenal