I think the best way to explain my question is with a short (generic) LINQ to Objects code sample:

IEnumerable<string> ReadLines(string filename)
{
    string line;
    using (var rdr = new StreamReader(filename))
        while ( (line = rdr.ReadLine()) != null)
           yield return line;
}

IEnumerable<int> XValuesFromFile(string filename)
{
    return ReadLines(filename)
               .Select(l => l.Substring(3,3))
               .Where(l => { int ignored; return int.TryParse(l, out ignored); })
               .Select(i => int.Parse(i));
}

Notice that this code parses each integer twice. I know I'm missing an obvious, simple way to safely eliminate one of those calls (mainly because I've done it before); I just can't find it right now. How can I do this?

+7  A: 

How about:

int? TryParse(string s)
{
    int i;
    return int.TryParse(s, out i) ? (int?)i : (int?)null;
}
IEnumerable<int> XValuesFromFile(string filename)
{
    return from line in ReadLines(filename)
           let start = line.Substring(3,3)
           let parsed = TryParse(start)
           where parsed != null
           select parsed.GetValueOrDefault();
}

You could probably combine the second/third lines if you like:

    return from line in ReadLines(filename)
           let parsed = TryParse(line.Substring(3,3))

The choice of GetValueOrDefault is deliberate: it skips the validation check that a cast to (int) or .Value performs, so it is (ever-so-slightly) faster, and it is safe here because we've already checked that the value isn't null.

Marc Gravell
I guess I'm looking more for the generic case of filtering an enumerable based on a complex transformation: keep the transformed version of every item that survived the transformation. This may be a case for writing a new "operator".
Joel Coehoorn
Is using `!= null` and `GetValueOrDefault()` really faster than using `where parsed.HasValue` and `select parsed.Value`? I guess I should go run some tests, because that seems counter-intuitive to me.
Joel Mueller
Another approach is to write a method that returns a `Tuple<bool, T>` result, if you've got .NET 4 or want to write your own Tuple class. This is how F# automatically handles TryParse and similar methods. Then the LINQ would be `where tuple.Item1 select tuple.Item2`
Joel Mueller
Joel Mueller: yeah, I was already working on something kinda like that :)
Joel Coehoorn
@Joel Mueller - `!=null` is **exactly** `HasValue`, so that is no different. The `GetValueOrDefault()` is a *tiny* bit faster by skipping the check - it simply returns the inner field directly.
Marc Gravell
+1  A: 

It's not exactly pretty, but you can do:

return ReadLines(filename)
    .Select(l =>
                {
                    string tmp = l.Substring(3, 3);
                    int result;
                    bool success = int.TryParse(tmp, out result);
                    return new
                               {
                                   Success = success,
                                   Value = result
                               };
                })
    .Where(i => i.Success)
    .Select(i => i.Value);

Granted, this is mostly just pushing the work into the lambda, but it does provide the correct answers with a single parse, at the cost of extra memory allocations.
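
If the allocations matter, the same single-parse shape can project to an `int?` instead of an anonymous type. This is only a rough sketch: the class name `NullableProjectionDemo` is made up, it takes the lines directly rather than a filename, and it assumes every line has at least six characters (otherwise `Substring(3, 3)` throws).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NullableProjectionDemo
{
    // Hypothetical variant of XValuesFromFile that works on in-memory lines.
    public static IEnumerable<int> XValues(IEnumerable<string> lines)
    {
        return lines
            .Select(l =>
            {
                int result;
                // int? is a struct, so the wrapper itself needs no heap allocation.
                return int.TryParse(l.Substring(3, 3), out result) ? (int?)result : null;
            })
            .Where(n => n.HasValue)   // drop the failed parses
            .Select(n => n.Value);    // unwrap the survivors
    }
}
```

For example, `NullableProjectionDemo.XValues(new[] { "abc123", "abcxyz", "abc007zz" })` should yield 123 and 7, skipping the middle line.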

Reed Copsey
Marc's option of using a Nullable<int> could be used here instead of the anonymous type as well, which would avoid the extra GC pressure...
Reed Copsey
+3  A: 

I think I'll go with something like this:

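// Must be declared in a static class to work as an extension method.
static IEnumerable<O> Reduce<I, O>(this IEnumerable<I> source, Func<I, Tuple<bool, O>> transform)
{
    foreach (var item in source)
    {
        Tuple<bool, O> r;
        try
        {
            r = transform(item);
        }
        catch
        {
            // Treat a throwing transform as a failed conversion.
            continue;
        }
        // The yield sits outside the try: C# forbids yield return
        // inside a try block that has a catch clause.
        if (r.Item1) yield return r.Item2;
    }
}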

ReadLines(filename).Reduce(l => { int i; return new Tuple<bool, int>(int.TryParse(l.Substring(3, 3), out i), i); });

I don't really like this, though, as I'm already on the record as not liking using tuples in this way. Unfortunately, I don't see many alternatives outside of abusing exceptions or restricting it to reference types (where null is defined as a failed conversion), neither of which is much better.
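
One other alternative, offered only as a sketch and not as what I actually ended up using: declare a delegate with the same shape as `int.TryParse` and build the operator around that, so no tuple is needed. The names `TryFunc` and `SelectWhere` are invented here for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate matching the shape of int.TryParse and similar methods.
public delegate bool TryFunc<TIn, TOut>(TIn input, out TOut result);

public static class TryExtensions
{
    // Hypothetical operator: yields the converted value of every item
    // whose conversion reports success, and skips the rest.
    public static IEnumerable<TOut> SelectWhere<TIn, TOut>(
        this IEnumerable<TIn> source, TryFunc<TIn, TOut> tryTransform)
    {
        foreach (var item in source)
        {
            TOut result;
            if (tryTransform(item, out result))
                yield return result;
        }
    }
}
```

Usage would look like `ReadLines(filename).Select(l => l.Substring(3, 3)).SelectWhere<string, int>(int.TryParse)`. The type arguments have to be spelled out because the compiler can't infer the output type from a method group, so the usability complaint doesn't go away entirely.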

Joel Coehoorn
I looked at this approach. I just didn't like the fact that the compiler can't infer the type (at least in C# 3), so the "Reduce" extension usability suffers...
Reed Copsey
My main complaints are **1)** that I can't express the conversion in a single statement (I still need a variable declaration inside the lambda), and **2)** that I have to express the result in the form of a tuple rather than as the converted item.
Joel Coehoorn