views:

1514

answers:

20

I wanted to bring this challenege to the attention of the stackoverflow community. The original problem and answers are here. BTW, if you did not follow it before, you should try to read Eric's blog, it is pure wisdom.

Summary:

Write a function that takes a non-null IEnumerable and returns a string with the following characteristics:

  1. If the sequence is empty the resulting string is "{}".
  2. If the sequence is a single item "ABC" then the resulting string is "{ABC}".
  3. If the sequence is the two item sequence "ABC", "DEF" then the resulting string is "{ABC and DEF}".
  4. If the sequence has more than two items, say, "ABC", "DEF", "G", "H" then the resulting string is "{ABC, DEF, G and H}". (Note: no Oxford comma!)

As you can see even our very own Jon Skeet (yes, it is well known that he can be in two places at the same time) has posted a solution but his (IMHO) is not the most elegant although probably you can not beat its performance.

What do you think? There are pretty good options there. I really like one of the solutions that involves the select and aggregate methods (from Fernando Nicolet). Linq is very powerful and dedicating some time to challenges like this make you learn a lot. I twisted it a bit so it is a bit more performant and clear (by using Count and avoiding Reverse):

    public static string CommaQuibbling(IEnumerable<string> items)
    {

        int last = items.Count() - 1;
        Func<int, string> getSeparator = (i) => i == 0 ? string.Empty : (i == last ? " and " : ", ");
        string answer = string.Empty;

        return "{" + items.Select((s, i) => new { Index = i, Value = s })
                          .Aggregate(answer, (s, a) => s + getSeparator(a.Index) + a.Value) + "}";

    }
+18  A: 

How about this approach? Purely cumulative - no back-tracking, and only iterates once. For raw performance, I'm not sure you'll do better with LINQ etc, regardless of how "pretty" a LINQ answer might be.

using System;
using System.Collections.Generic;
using System.Text;

static class Program
{
    public static string CommaQuibbling(IEnumerable<string> items)
    {
        StringBuilder sb = new StringBuilder('{');
        using (var iter = items.GetEnumerator())
        {
            if (iter.MoveNext())
            { // first item can be appended directly
                sb.Append(iter.Current);
                if (iter.MoveNext())
                { // more than one; only add each
                  // term when we know there is another
                    string lastItem = iter.Current;
                    while (iter.MoveNext())
                    { // middle term; use ", "
                        sb.Append(", ").Append(lastItem);
                        lastItem = iter.Current;
                    }
                    // add the final term; since we are on at least the
                    // second term, always use " and "
                    sb.Append(" and ").Append(lastItem);
                }
            }
        }
        return sb.Append('}').ToString();
    }
    static void Main()
    {
        Console.WriteLine(CommaQuibbling(new string[] { }));
        Console.WriteLine(CommaQuibbling(new string[] { "ABC" }));
        Console.WriteLine(CommaQuibbling(new string[] { "ABC", "DEF" }));
        Console.WriteLine(CommaQuibbling(new string[] {
             "ABC", "DEF", "G", "H" }));
    }
}
Marc Gravell
Did you have anything special in mind when you wrote "new StringBuilder().Append('{')", which was changed by Peter Wone?
VVS
No; the change is fine.
Marc Gravell
Whenever I do this sort of thing it ends up looking so much like Mark's solution that one little code tweak was all there was left for me to say.
Peter Wone
Is sb.Append(", ").Append(lastItem) cheaper than sb.AppendFormat(", {0}", lastItem) ?
Peter Wone
@Peter: yes.... it doesn't have to parse the format string ", {0}" - it is just two virtual calls. Internally, the parsing will do a lot more work, and probably still call the exact same 2 methods.
Marc Gravell
The question is not about performance, question is to understand power of LINQ to write complicated algorithms simple way, high performance code equals more lines of code, more lines of code means more time to code and manage logic.
Akash Kava
@Akash - tell me where it mentions LINQ as a *requirement*?
Marc Gravell
A: 
public static string CommaQuibbling(IEnumerable<string> items)
{
  int count = items.Count();
  string answer = string.Empty;
  return "{" + 
      (count==0)  ?  ""  :  
         (  items[0] + 
             (count == 1 ? "" :  
                 items.Range(1,count-1).
                     Aggregate(answer, (s,a)=> s += ", " + a) +
                 items.Range(count-1,1).
                     Aggregate(answer, (s,a)=> s += " AND " + a) ))+ "}";
}

It is implemented as,

if count == 0 , then return empty,
if count == 1 , then return only element,
if count > 1 , then take two ranges, 
   first 2nd element to 2nd last element
   last element
Akash Kava
+2  A: 

Here's a simple F# solution, that only does one forward iteration:

let CommaQuibble items =
    let sb = System.Text.StringBuilder("{")
    // pp is 2 previous, p is previous
    let pp,p = items |> Seq.fold (fun (pp:string option,p) s -> 
        if pp <> None then
            sb.Append(pp.Value).Append(", ") |> ignore
        (p, Some(s))) (None,None)
    if pp <> None then
        sb.Append(pp.Value).Append(" and ") |> ignore
    if p <> None then
        sb.Append(p.Value) |> ignore
    sb.Append("}").ToString()

(EDIT: Turns out this is very similar to Skeet's.)

The test code:

let Test l =
    printfn "%s" (CommaQuibble l)

Test []
Test ["ABC"]        
Test ["ABC";"DEF"]        
Test ["ABC";"DEF";"G"]        
Test ["ABC";"DEF";"G";"H"]        
Test ["ABC";null;"G";"H"]
Brian
A: 

How about skipping complicated aggregation code and just cleaning up the string after you build it?

public static string CommaQuibbling(IEnumerable<string> items)    
{
    var aggregate = items.Aggregate<string, StringBuilder>(
        new StringBuilder(), 
        (b,s) => b.AppendFormat(", {0}", s));
    var trimmed = Regex.Replace(aggregate.ToString(), "^, ", string.Empty);
    return string.Format(
               "{{{0}}}", 
               Regex.Replace(trimmed, 
                   ", (?<last>[^,]*)$", @" and ${last}"));
}

UPDATED: This won't work with strings with commas, as pointed out in the comments. I tried some other variations, but without definite rules about what the strings can contain, I'm going to have real problems matching any possible last item with a regular expression, which makes this a nice lesson for me on their limitations.

MichaC
Won't work if you aggregate strings with commas in them. A legit use for that would be a list of quoted phrases, e.g. {'Me and you', 'They' and 'He, she, and I'}
Pontus Gagge
Thanks, updated with a version that supports your example.
MichaC
Now, if the last string contains regexp-special characters (like '^' or '$') does it still work? I think no, but I haven't tried it.
Brian
Sigh... I just thought of that myself. I need to give up on regular expressions for matching without definite rules about what the strings can contain.
MichaC
+4  A: 

If I was doing a lot with streams which required first/last information, I'd have thid extension:

[Flags]
public enum StreamPosition
{
   First = 1, Last = 2
}

public static IEnumerable<R> MapWithPositions<T, R> (this IEnumerable<T> stream, 
    Func<StreamPosition, T, R> map)
{
    using (var enumerator = stream.GetEnumerator ())
    {
        if (!enumerator.MoveNext ()) yield break ;

        var cur   = enumerator.Current   ;
        var flags = StreamPosition.First ;
        while (true)
        {
            if (!enumerator.MoveNext ()) flags |= StreamPosition.Last ;
            yield return map (flags, cur) ;
            if ((flags & StreamPosition.Last) != 0) yield break ;
            cur   = enumerator.Current ;
            flags = 0 ;
        }
    }
}

Then the simplest (not the quickest, that would need a couple more handy extension methods) solution will be:

public static string Quibble (IEnumerable<string> strings)
{
    return "{" + String.Join ("", strings.MapWithPositions ((pos, item) => (
       (pos &  StreamPosition.First) != 0      ? "" : 
        pos == StreamPosition.Last   ? " and " : ", ") + item)) + "}" ;
}
Anton Tykhyy
Cool idea. Makes the solution quite simple. I wonder what the performance would be like.
Daniel Fortunov
+1 Elegant, concise, clear. It's not a question of "pretty" or "clever", it's about clarity of purpose and extent of effect.
Peter Wone
Since JIT doesn't optimize methods returning value types until quite recent versions of .NET, the above change should have better performance.
Anton Tykhyy
+1  A: 

Disclaimer: I used this as an excuse to play around with new technologies, so my solutions don't really live up to the Eric's original demands for clarity and maintainability.

Naive Enumerator Solution

(I concede that the foreach variant of this is superior, as it doesn't require manually messing about with the enumerator.)

public static string NaiveConcatenate(IEnumerable<string> sequence)
{
    StringBuilder sb = new StringBuilder();
    sb.Append('{');

    IEnumerator<string> enumerator = sequence.GetEnumerator();

    if (enumerator.MoveNext())
    {
        string a = enumerator.Current;
        if (!enumerator.MoveNext())
        {
            sb.Append(a);
        }
        else
        {
            string b = enumerator.Current;
            while (enumerator.MoveNext())
            {
                sb.Append(a);
                sb.Append(", ");
                a = b;
                b = enumerator.Current;
            }
            sb.AppendFormat("{0} and {1}", a, b);
        }
    }

    sb.Append('}');
    return sb.ToString();
}

Solution Using LINQ

public static string ConcatenateWithLinq(IEnumerable<string> sequence)
{
    return (from item in sequence select item)
        .Aggregate(
        new {sb = new StringBuilder("{"), a = (string) null, b = (string) null},
        (s, x) =>
            {
                if (s.a != null)
                {
                    s.sb.Append(s.a);
                    s.sb.Append(", ");
                }
                return new {s.sb, a = s.b, b = x};
            },
        (s) =>
            {
                if (s.b != null)
                    if (s.a != null)
                        s.sb.AppendFormat("{0} and {1}", s.a, s.b);
                    else
                        s.sb.Append(s.b);
                s.sb.Append("}");
                return s.sb.ToString();
            });
}

Solution with TPL

This solution uses a producer-consumer queue to feed the input sequence to the processor, whilst keeping at least two elements buffered in the queue. Once the producer has reached the end of the input sequence, the last two elements can be processed with special treatment.

In hindsight there is no reason to have the consumer operate asynchronously, which would eliminate the need for a concurrent queue, but as I said previously, I was just using this as an excuse to play around with new technologies :-)

public static string ConcatenateWithTpl(IEnumerable<string> sequence)
{
    var queue = new ConcurrentQueue<string>();
    bool stop = false;

    var consumer = Future.Create(
        () =>
            {
                var sb = new StringBuilder("{");
                while (!stop || queue.Count > 2)
                {
                    string s;
                    if (queue.Count > 2 && queue.TryDequeue(out s))
                        sb.AppendFormat("{0}, ", s);
                }
                return sb;
            });

    // Producer
    foreach (var item in sequence)
        queue.Enqueue(item);

    stop = true;
    StringBuilder result = consumer.Value;

    string a;
    string b;

    if (queue.TryDequeue(out a))
        if (queue.TryDequeue(out b))
            result.AppendFormat("{0} and {1}", a, b);
        else
            result.Append(a);

    result.Append("}");
    return result.ToString();
}

Unit tests elided for brevity.

Daniel Fortunov
+1  A: 

Here's mine, but I realize it's pretty much like Marc's, some minor differences in the order of things, and I added unit-tests as well.

using System;
using NUnit.Framework;
using NUnit.Framework.Extensions;
using System.Collections.Generic;
using System.Text;
using NUnit.Framework.SyntaxHelpers;

namespace StringChallengeProject
{
    [TestFixture]
    public class StringChallenge
    {
        [RowTest]
        [Row(new String[] { }, "{}")]
        [Row(new[] { "ABC" }, "{ABC}")]
        [Row(new[] { "ABC", "DEF" }, "{ABC and DEF}")]
        [Row(new[] { "ABC", "DEF", "G", "H" }, "{ABC, DEF, G and H}")]
        public void Test(String[] input, String expectedOutput)
        {
            Assert.That(FormatString(input), Is.EqualTo(expectedOutput));
        }

        //codesnippet:93458590-3182-11de-8c30-0800200c9a66
        public static String FormatString(IEnumerable<String> input)
        {
            if (input == null)
                return "{}";

            using (var iterator = input.GetEnumerator())
            {
                // Guard-clause for empty source
                if (!iterator.MoveNext())
                    return "{}";

                // Take care of first value
                var output = new StringBuilder();
                output.Append('{').Append(iterator.Current);

                // Grab next
                if (iterator.MoveNext())
                {
                    // Grab the next value, but don't process it
                    // we don't know whether to use comma or "and"
                    // until we've grabbed the next after it as well
                    String nextValue = iterator.Current;
                    while (iterator.MoveNext())
                    {
                        output.Append(", ");
                        output.Append(nextValue);

                        nextValue = iterator.Current;
                    }

                    output.Append(" and ");
                    output.Append(nextValue);
                }


                output.Append('}');
                return output.ToString();
            }
        }
    }
}
Lasse V. Karlsen
+1  A: 

I quite liked Jon's answer, but that's because it's much like how I approached the problem. Rather than specifically coding in the two variables, I implemented them inside of a FIFO queue.

It's strange because I just assumed that there would be 15 posts that all did exactly the same thing, but it looks like we were the only two to do it that way. Oh, looking at these answers, Marc Gravell's answer is quite close to the approach we used as well, but he's using two 'loops', rather than holding on to values.

But all those answers with LINQ and regex and joining arrays just seem like crazy-talk! :-)

Travis
A: 

I don't think that using a good old array is a restriction. Here is my version using an array and an extension method:

public static string CommaQuibbling(IEnumerable<string> list)
{
    string[] array = list.ToArray();

    if (array.Length == 0) return string.Empty.PutCurlyBraces();
    if (array.Length == 1) return array[0].PutCurlyBraces();

    string allExceptLast = string.Join(", ", array, 0, array.Length - 1);
    string theLast = array[array.Length - 1];

    return string.Format("{0} and {1}", allExceptLast, theLast)
                 .PutCurlyBraces();
}

public static string PutCurlyBraces(this string str)
{
    return "{" + str + "}";
}

I am using an array because of the string.Join method and because if the possibility of accessing the last element via an index. The extension method is here because of DRY.

I think that the performance penalities come from the list.ToArray() and string.Join calls, but all in one I hope that piece of code is pleasent to read and maintain.

Theo Lenndorff
A: 

I think Linq provides fairly readable code. This version handles a million "ABC" in .89 seconds:

using System.Collections.Generic;
using System.Linq;

namespace CommaQuibbling
{
    internal class Translator
    {
        public string Translate(IEnumerable<string> items)
        {
            return "{" + Join(items) + "}";
        }

        private static string Join(IEnumerable<string> items)
        {
            var leadingItems = LeadingItemsFrom(items);
            var lastItem = LastItemFrom(items);

            return JoinLeading(leadingItems) + lastItem;
        }

        private static IEnumerable<string> LeadingItemsFrom(IEnumerable<string> items)
        {
            return items.Reverse().Skip(1).Reverse();
        }

        private static string LastItemFrom(IEnumerable<string> items)
        {
            return items.LastOrDefault();
        }

        private static string JoinLeading(IEnumerable<string> items)
        {
            if (items.Any() == false) return "";

            return string.Join(", ", items.ToArray()) + " and ";
        }
    }
}
Thomas Eyde
I am curious, what are the reasons for the downvotes? What is it you don't like?
Thomas Eyde
+12  A: 

Inefficient, but I think clear.

public static string CommaQuibbling(IEnumerable<string> items)
{
    List<String> list = new List<string>(items);
    if (list.Count == 0) { return "{}"; }
    if (list.Count == 1) { return "{" + list[0] + "}"; }
    String[] initial = list.GetRange(0, list.Count - 1).ToArray();
    return "{" + String.Join(", ", initial) + " and " + list[list.Count - 1] + "}";
}

If I was maintaining the code, I'd prefer this to more clever versions.

dbkk
This is so much better than all the answers above it. I guess not many SO users read Eric's blog - he put emphasis on a readable solution. That's why this one is the best.
Paul Batum
A: 

You can use a foreach, without LINQ, delegates, closures, lists or arrays, and still have understandable code. Use a bool and a string, like so:

public static string CommaQuibbling(IEnumerable items)
{
    StringBuilder sb = new StringBuilder("{");
    bool empty = true;
    string prev = null;
    foreach (string s in items)
    {
     if (prev!=null)
     {
      if (!empty) sb.Append(", ");
      else empty = false;
      sb.Append(prev);
     }
     prev = s;
    }
    if (prev!=null)
    {
     if (!empty) sb.Append(" and ");
     sb.Append(prev);
    }
    return sb.Append('}').ToString();
}
A: 
public static string CommaQuibbling(IEnumerable<string> items)
{
   var itemArray = items.ToArray();

   var commaSeparated = String.Join(", ", itemArray, 0, Math.Max(itemArray.Length - 1, 0));
   if (commaSeparated.Length > 0) commaSeparated += " and ";

   return "{" + commaSeparated + itemArray.LastOrDefault() + "}";
}
Richard
+1  A: 

I'm a fan of the serial comma: I eat, shoot, and leave.

I continually need a solution to this problem and have solved it in 3 languages (though not C#). I would adapt the following solution (in Lua, does not wrap answer in curly braces) by writing a concat method that works on any IEnumerable:

function commafy(t, andword)
  andword = andword or 'and'
  local n = #t -- number of elements in the numeration
  if n == 1 then
    return t[1]
  elseif n == 2 then
    return concat { t[1], ' ', andword, ' ', t[2] }
  else
    local last = t[n]
    t[n] = andword .. ' ' .. t[n]
    local answer = concat(t, ', ')
    t[n] = last
    return answer
  end
end
Norman Ramsey
A: 

This isn't brilliantly readable, but it scales well up to tens of millions of strings. I'm developing on an old Pentium 4 workstation and it does 1,000,000 strings of average length 8 in about 350ms.

public static string CreateLippertString(IEnumerable<string> strings)
{
    char[] combinedString;
    char[] commaSeparator = new char[] { ',', ' ' };
    char[] andSeparator = new char[] { ' ', 'A', 'N', 'D', ' ' };

    int totalLength = 2;  //'{' and '}'
    int numEntries = 0;
    int currentEntry = 0;
    int currentPosition = 0;
    int secondToLast;
    int last;
    int commaLength= commaSeparator.Length;
    int andLength = andSeparator.Length;
    int cbComma = commaLength * sizeof(char);
    int cbAnd = andLength * sizeof(char);

    //calculate the sum of the lengths of the strings
    foreach (string s in strings)
    {
        totalLength += s.Length;
        ++numEntries;
    }

    //add to the total length the length of the constant characters
    if (numEntries >= 2)
        totalLength += 5;  // " AND "

    if (numEntries > 2)
        totalLength += (2 * (numEntries - 2)); // ", " between items

    //setup some meta-variables to help later
    secondToLast = numEntries - 2;
    last = numEntries - 1;

    //allocate the memory for the combined string
    combinedString = new char[totalLength];
    //set the first character to {
    combinedString[0] = '{';
    currentPosition = 1;

    if (numEntries > 0)
    {
        //now copy each string into its place
        foreach (string s in strings)
        {
            Buffer.BlockCopy(s.ToCharArray(), 0, combinedString, currentPosition * sizeof(char), s.Length * sizeof(char));
            currentPosition += s.Length;

            if (currentEntry == secondToLast)
            {
                Buffer.BlockCopy(andSeparator, 0, combinedString, currentPosition * sizeof(char), cbAnd);
                currentPosition += andLength;
            }
            else if (currentEntry == last)
            {
                combinedString[currentPosition] = '}'; //set the last character to '}'
                break;  //don't bother making that last call to the enumerator
            }
            else if (currentEntry < secondToLast)
            {
                Buffer.BlockCopy(commaSeparator, 0, combinedString, currentPosition * sizeof(char), cbComma);
                currentPosition += commaLength;
            }

            ++currentEntry;
        }
    }
    else
    {
        //set the last character to '}'
        combinedString[1] = '}';
    }

    return new string(combinedString);
}
+2  A: 

Another variant - separating punctuation and iteration logic for the sake of code clarity. And still thinking about perfomrance.

Works as requested with pure IEnumerable/string/ and strings in the list cannot be null.

public static string Concat(IEnumerable<string> strings)
{
    return "{" + strings.reduce("", (acc, prev, cur, next) => 
               acc.Append(punctuation(prev, cur, next)).Append(cur)) + "}";
}
private static string punctuation(string prev, string cur, string next)
{
 if (null == prev || null == cur)
  return "";
 if (null == next)
  return " and ";
 return ", ";
}

private static string reduce(this IEnumerable<string> strings, 
    string acc, Func<StringBuilder, string, string, string, StringBuilder> func)
{
 if (null == strings) return "";

 var accumulatorBuilder = new StringBuilder(acc);
 string cur = null;
 string prev = null;
 foreach (var next in strings)
 {
  func(accumulatorBuilder, prev, cur, next);
  prev = cur;
  cur = next;
 }
 func(accumulatorBuilder, prev, cur, null);

 return accumulatorBuilder.ToString();
}

F# surely looks much better:

let rec reduce list =
    match list with
    | []          -> ""
    | head::curr::[]  -> head + " and " + curr
    | head::curr::tail  -> head + ", " + curr :: tail |> reduce
    | head::[] -> head

let concat list = "{" + (list |> reduce )  + "}"
darkiri
A: 

Here as a Python one liner


>>> f=lambda s:"{%s}"%", ".join(s)[::-1].replace(',','dna ',1)[::-1]
>>> f([])
'{}'
>>> f(["ABC"])
'{ABC}'
>>> f(["ABC","DEF"])
'{ABC and DEF}'
>>> f(["ABC","DEF","G","H"])
'{ABC, DEF, G and H}'

This version might be easier to understand


>>> f=lambda s:"{%s}"%" and ".join(s).replace(' and',',',len(s)-2)
>>> f([])
'{}'
>>> f(["ABC"])
'{ABC}'
>>> f(["ABC","DEF"])
'{ABC and DEF}'
>>> f(["ABC","DEF","G","H"])
'{ABC, DEF, G and H}'
gnibbler
+1  A: 

Late entry:

public static string CommaQuibbling(IEnumerable<string> items)
{
    string[] parts = items.ToArray();
    StringBuilder result = new StringBuilder('{');
    for (int i = 0; i < parts.Count; i++)
    {
        if (i > 0)
            result.Append(i == parts.Count - 1 ? " and " : ", ");
        result.Append(parts[i]);
    }
    return result.Append('}').ToString();
}
Foole
A: 

Here's my submission. Modified the signature a bit to make it more generic. Using .NET 4 features (String.Join() using IEnumerable<T>), otherwise works with .NET 3.5. Goal was to use LINQ with drastically simplified logic.

static string CommaQuibbling<T>(IEnumerable<T> items)
{
    int count = items.Count();
    var quibbled = items.Select((Item, index) => new { Item, Group = (count - index - 2) > 0})
                        .GroupBy(item => item.Group, item => item.Item)
                        .Select(g => g.Key
                            ? String.Join(", ", g)
                            : String.Join(" and ", g));
    return "{" + String.Join(", ", quibbled) + "}";
}
Jeff M
A: 

There's a couple non-C# answers, and the original post did ask for answers in any language, so I thought I'd show another way to do it that none of the C# programmers seems to have touched upon: a DSL!

(defun quibble-comma (words)
  (format nil "~{~#[~;~a~;~a and ~a~:;~@{~a~#[~; and ~:;, ~]~}~]~}" words))

The astute will note that Common Lisp doesn't really have an IEnumerable<T> built-in, and hence FORMAT here will only work on a proper list. But if you made an IEnumerable, you certainly could extend FORMAT to work on that, as well. (Does Clojure have this?)

Also, anyone reading this who has taste (including Lisp programmers!) will probably be offended by the literal "~{~#[~;~a~;~a and ~a~:;~@{~a~#[~; and ~:;, ~]~}~]~}" there. I won't claim that FORMAT implements a good DSL, but I do believe that it is tremendously useful to have some powerful DSL for putting strings together. Regex is a powerful DSL for tearing strings apart, and string.Format is a DSL (kind of) for putting strings together but it's stupidly weak.

I think everybody writes these kind of things all the time. Why the heck isn't there some built-in universal tasteful DSL for this yet? I think the closest we have is "Perl", maybe.

Ken