views:

168

answers:

9

I have an array string[] with values that are mostly convertible to integers.

var values = new [] {"", "1", "2", "a", "3"};

I need to convert values to an array of integers, discarding any items that aren't convertible. So I end up with

var numbers = new [] {1, 2, 3};

What would be the most efficient (quickest and clean code) way to do this?

+1  A: 

You could try the following

public static IEnumerable<int> Convert(this IEnumerable<string> enumerable) {
  Func<string,int?> convertFunc = x => {
    int value ;
    bool ret = Int32.TryParse(x, out value);
    return ret ? (int?)value : null;
  };
  return enumerable
    .Select(convertFunc)
    .Where(x => x.HasValue)
    .Select(x => x.Value);
}

This can easily then be converted to an array.

var numbers = values.Convert().ToArray();
JaredPar
Why not just perform the ToArray() conversion within the return statement and have the static function return int[]?
Joel Etherton
@Joel - It does add flexibility to the method.
ChaosPandion
+4  A: 
var numbers = values.Select(
    s => {
        int n;
        if (!int.TryParse((s ?? string.Empty), out n)) 
        {
            return (int?)null;
        }
        return (int?)n;
    }
)
.Where(n => n != null)
.Select(n => n.Value)
.ToArray();
ChaosPandion
That returns an Array of `int?` and not `int`
JaredPar
@JaredPar - Fixed.
ChaosPandion
@ChaosPandion, the return could be `return !int.TryParse((s ?? string.Empty).Trim(), out n) ? (int?)null : (int?)n;` instead... not necessarily cleaner, but one line. I like how this only parses once
Chad
@ChaosPandion: I don't think the `Trim` call is necessary, is it? I seem to recall that `int.TryParse` ignores whitespace on either side of the string.
Dan Tao
@Dan - You are right, this is straight from MSDN `[ws][sign]digits[ws]`.
ChaosPandion
+1  A: 

With some straightforward LINQ:

var numbers = values.Where(x => { int num = 0; return Int32.TryParse(x, out num); })
                    .Select(num => Int32.Parse(num));

Notably this converts every string twice. Doing this imperatively you'll lose some clarity but gain some speed (as an IEnumerable extension):

public static IEnumerable<int> TryCastToInt<T>(this IEnumerable<T> values)
  int num = 0;
  foreach (object item in values) {
    if (Int32.TryParse(item.ToString(), num)) {
      yield return num;
    }
  }
}
Ron Warholic
I wouldn't do this because you have to parse the integer twice.
ChaosPandion
@Ron Warholic, that works, just got to add `.ToArray()` to the end
Chad
I agree with @Chaos. Parsing the integer is the most expensive part of this operation and you're intentionally doing it twice.
JaredPar
+2  A: 

EDIT: Updated to not use try/catch, since StackOverflow users have pointed out that it's slow.

Try this.

var values = new[] { "", "1", "2", "a", "3" };
List<int> numeric_list = new List();
int num_try = 0;
foreach (string string_value in values)
{
    if (Int32.TryParse(string_value, out num_try) {
        numeric_list.Add(num_try);
    }

    /* BAD PRACTICE (as noted by other StackOverflow users)
    try
    {
        numeric_list.Add(Convert.ToInt32(string_value));
    }
    catch (Exception)
    {
        // Do nothing, since we want to skip.
    }
    */
}

return numeric_list.ToArray();
Helgi Hrafn Gunnarsson
This is not the proper way to do this. Using exceptions as control flow is bad practice.
ChaosPandion
Try-catch in case of errors (and we are sure to have them) slows down performances too much.
digEmAll
Yeah, it has always bothered me too, frankly. Is there a proper way to ignore exceptions?
Helgi Hrafn Gunnarsson
+1 now it's my favourite solution, shorter (and maybe quicker) than any other LINQ solution
digEmAll
@digEmAll: This is very unlikely to be quicker than a solution such as Jared's or mine, since it requires the construction of a throwaway `List<int>` (at least I *think* that's what Helgi meant to write) in memory. If this code were abstracted into its own extension method, it would significantly hinder the performance of any lazily-evaluated code to which it were chained. Not trying to be a naysayer -- it's a fine solution -- just disputing your claim as to its quickness.
Dan Tao
Ok, i've tested it in release mode with 20.000.000 numbers to parse and that's the result (in ticks):[ LOOP: 11572190 | LINQ (DAN PAO): 13959216 | LINQ (JARED): 15543572 ].Also, I've tested roughly the memory usage of each method (both through GC.GetTotalMemory and brutely increasing the iteration number until OutOfMemoryException) and the memory usage is very similar for loop and LINQ methods (slightly less for loop). IMO LINQ internally instantiate a throwaway buffer as well.
digEmAll
@Dan Tao: How would you do this without Linq and without making the List<int>? I'm curious to know because I'd rather not create it, but I don't know how to do it otherwise, seeing that you have to know the size of an array beforehand.
Helgi Hrafn Gunnarsson
@digEmAll: I am skeptical! What did you test, the enumeration alone or the enumeration plus the `ToArray` call? In the latter case, it isn't a fair fight since yes of course the `List<T>` already knows its count and can allocate an array once, whereas `Enumerable.ToArray` needs to use an internal resizeable buffer. But for an `IEnumerable<T>` implementation to *assume* it's going to be used to populate an array is wasteful, in my view. I am going to compare the enumeration itself and will be shocked if I get the same results as you. (But if I do, of course: thanks for the lesson!)
Dan Tao
@Helgi: See my answer. Using the `yield` statement provides deferred evaluation over an `IEnumerable<T>` implementation.
Dan Tao
@Dan Tao: (1st of all sorry for the PAO in my prev comment) Obviously I've tested the ToArray version, because actually the questioner ask for an array. Otherwise, we could remove the ToArray() also from the loop version and return directly the (now nomore throwaway) list. Anyway, yes very likely for the enumeration alone you're right.
digEmAll
@digEmAll: Well I'll be damned -- looks like you're right! What I don't understand is how the memory usage could be the same. I can't even populate a `List<int>` with 20,000,000 values on my machine; I get an `OutOfMemoryException`. Yet I can do it just fine using `yield`. This is leaving me scratching my head...
Dan Tao
@Dan Tao: Maybe this could be disscussed more widely in another question...
digEmAll
A: 
int n;
var values = new[] { "", "1", "2", "a", "3" };
var intsonly = values.Where (v=> Int32.TryParse(v, out n)).Select(x => Int32.Parse(x));
Chris McCall
+3  A: 

This can be done in a single linq statement without having to do the parse twice:

var numbers = values
    .Select(c => { int i; return int.TryParse(c, out i) ? i : (int?)null; })
    .Where(c => c.HasValue)
    .Select(c => c.Value)
    .ToArray();
Daniel Renshaw
+1 Answered the OP's question elegantly.
Roy Tinker
+3  A: 

I personally use an extension method that's a little different from what others have posted so far. It specifically uses a custom WeakConverter<TSource, TResult> delegate to avoid the issue in Ron's answer of calling ToString() on every object, while maintaining genericity unlike Jared's answer (though I will concede that sometimes trying to make everything generic is overdoing it -- but in this case, the added effort is really not much for what I consider a significant benefit in terms of reusability).

public delegate bool WeakConverter<TSource, TResult>(TSource source, out TResult result);

public static IEnumerable<TResult> TryConvertAll<TSource, TResult>(this IEnumerable<TSource> source, WeakConverter<TSource, TResult> converter)
{
    foreach (TSource original in source)
    {
        TResult converted;
        if (converter(original, out converted))
        {
            yield return converted;
        }
    }
}

With this in place, you can convert a string[] to an int[] quite simply and robustly (no double-parsing necessary):

string[] strings = new[] { "1", "2", "abc", "3", "", "123" };

int[] ints = strings.TryConvertAll<string, int>(int.TryParse).ToArray();

foreach (int x in ints)
{
    Console.WriteLine(x);
}

Output:

1
2
3
123
Dan Tao
I like this but the name might need to be reworked. It is true that this attempts to convert everything but it also filters out the failed attempts.
ChaosPandion
@ChaosPandion: Yeah, I suppose the "Try" part of the name could be interpreted in more than one way. I considered something like `ConvertWherePossible`, but I must admit I have a bias against long method names.
Dan Tao
+4  A: 

Here is my take on this. It uses a separate method but cleanly expresses the conditional parse without using nullable values:

private static IEnumerable<int> ParseInt32s(IEnumerable<string> value)
{
    foreach(var value in values)
    {
        int n;

        if(Int32.TryParse(value, out n))
        {
            yield return n;
        }
    }
}

Usage:

string[] values;

var parsedValues = ParseInt32s(values).ToArray();
Bryan Watts
I like this answer best. You have to really squint to not see what it's doing.
Jeffrey L Whitledge
+1 This is a problem where I don't think LINQ is the best answer.
Greg
A: 
var numbers = values
    .Where(x => !String.IsNullOrEmpty(x))
    .Where(x => x.All(Char.IsDigit))
    .Select(x => Convert.ToInt32(x))
    .ToArray();
Handcraftsman
Elegant expression, but unlike the accepted answer, this doesn't work if there is a sign character. It also doesn't ignore whitespace.
Roy Tinker