views:

347

answers:

12

.NET Framework 3.5.
I'm trying to calculate the average of some pretty large numbers.
For instance:

using System;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var items = new long[]
                        {
                            long.MaxValue - 100, 
                            long.MaxValue - 200, 
                            long.MaxValue - 300
                        };
        try
        {
            var avg = items.Average();
            Console.WriteLine(avg);
        }
        catch (OverflowException ex)
        {
            Console.WriteLine("can't calculate that!");
        }
        Console.ReadLine();
    }
}

Obviously, the mathematical result is 9223372036854775607 (long.MaxValue - 200), but I get an exception there. This is because the implementation (on my machine) to the Average extension method, as inspected by .NET Reflector is:

public static double Average(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    long num = 0L;
    long num2 = 0L;
    foreach (long num3 in source)
    {
        num += num3;
        num2 += 1L;
    }
    if (num2 <= 0L)
    {
        throw Error.NoElements();
    }
    return (((double) num) / ((double) num2));
}

I know I can use a BigInt library (yes, I know that it is included in .NET Framework 4.0, but I'm tied to 3.5).

But I still wonder if there's a pretty straight forward implementation of calculating the average of integers without an external library. Do you happen to know about such implementation?

Thanks!!


UPDATE:

The previous example, of three large integers, was just an example to illustrate the overflow issue. The question is about calculating an average of any set of numbers which might sum to a large number that exceeds the type's max value. Sorry about this confusion. I also changed the question's title to avoid additional confusion.

Thanks all!!

A: 

How about BigInteger in Visual J#.

Darin Dimitrov
+1  A: 

If you know in advance that all your numbers are going to be 'big' (in the sense of 'much nearer long.MaxValue than zero), you can calculate the average of their distance from long.MaxValue, then the average of the numbers is long.MaxValue less that.

However, this approach will fail if (m)any of the numbers are far from long.MaxValue, so it's horses for courses...

AakashM
This is about the same as my approach, but yours will fail for any negative number.
Tomas Lycken
A: 

If you're willing to sacrifice precision, you could do something like:

long num2 = 0L;
foreach (long num3 in source)
{
    num2 += 1L;
}
if (num2 <= 0L)
{
    throw Error.NoElements();
}
double average = 0;
foreach (long num3 in source)
{
    average += (double)num3 / (double)num2;
}
return average;
Andreas Brinck
+2  A: 

If you know approximately what the average will be (or, at least, that all pairs of numbers will have a max difference < long.MaxValue), you can calculate the average difference from that value instead. I take an example with low numbers, but it works equally well with large ones.

// Let's say numbers cannot exceed 40.
List<int> numbers = new List<int>() { 31 28 24 32 36 29 }; // Average: 30

List<int> diffs = new List<int>();

// This can probably be done more effectively in linq, but to show the idea:
foreach(int number in numbers.Skip(1))
{
    diffs.Add(numbers.First()-number);
}
// diffs now contains { -3 -6 1 5 -2 }

var avgDiff = diffs.Sum() / diffs.Count(); // the average is -1

// To get the average value, just add the average diff to the first value:
var totalAverage = numbers.First()+avgDiff;

You can of course implement this in some way that makes it easier to reuse, for example as an extension method to IEnumerable<long>.

Tomas Lycken
If you're unlucky to have a list {long.MaxValue, long.MinValue+100, ... }, it still goes awry. But your idea seems nice.
ANeves
@ANeves - for this to work I explicitly assumed that no two numbers should be longer than long.MaxValue apart.
Tomas Lycken
A: 

Perhaps you can reduce every item by calculating average of adjusted values and then multiply it by the number of elements in collection. However, you'll find a bit different number of of operations on floating point.

var items = new long[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };
var avg = items.Average(i => i / items.Count()) * items.Count();
Andrei Taptunov
A: 

You could keep a rolling average which you update once for each large number.

filip-fku
+9  A: 

If you're just looking for an arithmetic mean, you can perform the calculation like this:

public static double Mean(this IEnumerable<long> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }

    double count = (double)source.Count();
    double mean = 0D;

    foreach(long x in source)
    {
        mean += (double)x/count;
    }

    return mean;
}

Edit:

In response to comments, there definitely is a loss of precision this way, due to performing numerous divisions and additions. For the values indicated by the question, this should not be a problem, but it should be a consideration.

Programming Hero
Excellent answer - minimal loss of precision, minimal chance of overflow, and gets the right answer! +1 from me... However: `IEnumerable` doesn't have a `.Count()`, so you should maybe correct your parameter type (or make explicit that you're using Linq). Oh, and nice avatar ;)
Dan Puzey
@Dan, `IEnumerable` *does* have a `.Count()`, given that you include a `using` statement for `System.Linq`.
Tomas Lycken
If `count` is very large, and the elements are small, the loss of precision might not be negligible. The more elements you have and the smaller they are, the worse this performs...
Aviad P.
@Tomas - fair point - I missed the `using` in the OP. He's already had my +1 anyway ;-)
Dan Puzey
A: 

Use the IntX library on CodePlex.

leppie
+2  A: 

Here is how I would do if given this problem. First let's define very simple RationalNumber class, which contains two properties - Dividend and Divisor and an operator for adding two complex numbers. Here is how it looks:

public sealed class RationalNumber
{
    public RationalNumber()
    {
        this.Divisor = 1;
    }


    public static RationalNumberoperator +( RationalNumberc1, RationalNumber c2 )
    {
        RationalNumber result = new RationalNumber();

        Int64 nDividend = ( c1.Dividend * c2.Divisor ) + ( c2.Dividend * c1.Divisor );
        Int64 nDivisor = c1.Divisor * c2.Divisor;
        Int64 nReminder = nDividend % nDivisor;

        if ( nReminder == 0 )
        {
            // The number is whole
            result.Dividend = nDividend / nDivisor;
        }
        else
        {
            Int64 nGreatestCommonDivisor = FindGreatestCommonDivisor( nDividend, nDivisor );

            if ( nGreatestCommonDivisor != 0 )
            {
                nDividend = nDividend / nGreatestCommonDivisor;
                nDivisor = nDivisor / nGreatestCommonDivisor;
            }

            result.Dividend = nDividend;
            result.Divisor = nDivisor;
        }

            return result;
    }


    private static Int64 FindGreatestCommonDivisor( Int64 a, Int64 b)
    {
        Int64 nRemainder;

        while ( b != 0 )
        {
            nRemainder = a% b;
            a = b;
            b = nRemainder;
        }

        return a;
    }


    // a / b = a is devidend, b is devisor
    public Int64 Dividend   { get; set; }
    public Int64 Divisor    { get; set; }
}

Second part is really easy. Let's say we have an array of numbers. Their average is estimated by Sum(Numbers)/Length(Numbers), which is the same as Number[ 0 ] / Length + Number[ 1 ] / Length + ... + Number[ n ] / Length. For to be able to calculate this we will represent each Number[ i ] / Length as a whole number and a rational part ( reminder ). Here is how it looks:

Int64[] aValues = new Int64[] { long.MaxValue - 100, long.MaxValue - 200, long.MaxValue - 300 };

List<RationalNumber> list = new List<RationalNumber>();
Int64 nAverage = 0;

for ( Int32 i = 0; i < aValues.Length; ++i )
{
    Int64 nReminder = aValues[ i ] % aValues.Length;
    Int64 nWhole = aValues[ i ] / aValues.Length;

    nAverage += nWhole;

    if ( nReminder != 0 )
    {
        list.Add( new RationalNumber() { Dividend = nReminder, Divisor = aValues.Length } );
    }
}

RationalNumber rationalTotal = new RationalNumber();

foreach ( var rational in list )
{
    rationalTotal += rational;
}

nAverage = nAverage + ( rationalTotal.Dividend / rationalTotal.Divisor );

At the end we have a list of rational numbers, and a whole number which we sum together and get the average of the sequence without an overflow. Same approach can be taken for any type without an overflow for it, and there is no lost of precision.

EDIT:

Why this works:

Define: A set of numbers.

if Average( A ) = SUM( A ) / LEN( A ) =>

Average( A ) = A[ 0 ] / LEN( A ) + A[ 1 ] / LEN( A ) + A[ 2 ] / LEN( A ) + ..... + A[ N ] / LEN( 2 ) =>

if we define An to be a number that satisfies this: An = X + ( Y / LEN( A ) ), which is essentially so because if you divide A by B we get X with a reminder a rational number ( Y / B ).

=> so

Average( A ) = A1 + A2 + A3 + ... + AN = X1 + X2 + X3 + X4 + ... + Reminder1 + Reminder2 + ...;

Sum the whole parts, and sum the reminders by keeping them in rational number form. In the end we get one whole number and one rational, which summed together gives Average( A ). Depending on what precision you'd like, you apply this only to the rational number at the end.

Ivan Zlatanov
You are using misleading names (`ComplexNumber`? where's the real and imaginary parts?! - you probably meant `RationalNumber` - `left` and `right` for a GCD function?!). You are using modulos, divisions and the GCD algorithm during addition so I don't understand how this is faster than @Programming Hero's solution. You're not exactly clear about how and why it works either. -1.
IVlad
I take your criticism and will update my answer. I rechecked my code for testing the speed. My mistake. I will correct my comment.
Ivan Zlatanov
+2  A: 

Simple answer with LINQ...

var data = new[] { int.MaxValue, int.MaxValue, int.MaxValue };
var mean = (int)data.Select(d => (double)d / data.Count()).Sum();

Depending on the size of the set fo data you may want to force data .ToList() or .ToArray() before your process this method so it can't requery count on each pass. (Or you can call it before the .Select(..).Sum().)

Matthew Whited
+3  A: 

You may try the following approach:

let number of elements is N, and numbers are arr[0], .., arr[N-1].

You need to define 2 variables:

mean and remainder.

initially mean = 0, remainder = 0.

at step i you need to change mean and remainder in the following way:

mean += arr[i] / N;
remainder += arr[i] % N;
mean += remainder / N;
remainder %= N;

after N steps you will get correct answer in mean variable and remainder / N will be fractional part of the answer (I am not sure you need it, but anyway)

Miollnyr
+12  A: 

You can compute this almost exactly, without resorting to a larger integer type or big ints. The trick is to keep the remainders separate from the quotients.

public static double Mean(this IEnumerable<Int64> values) {
    Int32 n = values.Count();
    if (n == 0) return 0;
    Int64 q = 0;
    Int64 r = 0;
    foreach (var v in values) {
        q += v / n;
        r += v % n;
    }
    q += r / n;
    r = r % n;
    return q + r / (double)n;
}

This should work as long as there are fewer than Int32.MaxValue items in the collection. Moving the 'q += r/n, r = r % n' lines inside the loop should make it work for even larger collections at the cost of some speed.

You can generalize this to work online (always knowing the average for the values seen so far), using the avg[n+1] = (avg[n]*n+v)/(n+1) formula, but it is much more complicated code.

Strilanc
+1 for more accuracy while handling any values within the Int64 range and concise code
DanK