views:

559

answers:

9

Say you have an array of int (in any language with fixed size ints). How would you calculate the int closest to their mean?

Edit: to be clear, the result does not have to be present in the array. That is, for the input array [3, 6, 7] the expected result is 5. Also I guess we need to specify a particular rounding direction, so say round down if you are equally close to two numbers.

Edit: This is not homework. I haven't had homework in five years. And this is my first time on stackoverflow, so please be nice!

Edit: The obvious approach of summing up and dividing may overflow, so I'm trying to think of an approach that is overflow safe, for both large arrays and large ints. I think handling overflow correctly (without cheating and using a different type) is by far the hardest part of this problem.

+2  A: 

Calculate the sum by adding the numbers up, and dividing by the number of them, with rounding:

mean = (int)((sum + length/2) / length;

If you are worried about overflow, you can do something like:

int mean = 0, remainder = 0
foreach n in number
   mean += n / length
   remainder += n % length
   if remainder > length
       mean += 1
       remainder -= length
if remainder > length/2
   mean += 1
print "mean is: " mean

note that this isn't very fast.

FryGuy
This is a good start, but that remainder += n%length still may overflow for large lengths. For example, if length is INT_MAX/2 + 2 and the first two entries in the array are INT_MAX/2 + 1, we'll overflow.
fish
true that could happen, but on what architecture could you have in memory an array that size :).. also, it doesn't handle negative numbers well.
FryGuy
+2  A: 

um... how about just calculating the mean and then rounding to an integer? round(mean(thearray)) Most languages have facilities that allow you to specify the rounding method.

EDIT: So it turns out that this question is really about avoiding overflow, not about rounding. Let me be clear that I agree with those that have said (in the comments) that it's not something to worry about in practice, since it so rarely happens, and when it does you can always get away with using a larger data type.

I see that several other people have given answers that basically consist of dividing each number in the array by the count of the array, then adding them up. That is also a good approach. But just for kicks, here's an alternative (in C-ish pseudocode):

int sum_offset = 0;
for (int i = 1; i < length(array); i++)
     sum_offset += array[i] - array[i-1];
// round by your method of choice
int mean_offset = round((float)sum_offset / length(array));
int mean = mean_offset + array[0];

Or another way to do the same thing:

int min = INT_MAX, max = INT_MIN;
for (int i = 0; i < length(array); i++) {
     if (array[i] < min) min = array[i];
     if (array[i] > max) max = array[i];
}
int sum_offset = max - min;
// round by your method of choice
int mean_offset = round((float)sum_offset / length(array));
int mean = mean_offset + min;

Of course, you need to make sure sum_offset does not overflow, which can happen if the difference between the largest and smallest array elements is larger than INT_MAX. In that case, replace the last four lines with something like this:

// round by your method of choice
int mean_offset = round((float)max / length(array) - (float)min / length(array));
int mean = mean_offset + min;

Trivia: this method, or something like it, also works quite well for mentally computing the mean of an array whose elements are clustered close together.

David Zaslavsky
How would you calculate the mean?
fish
@fish - I'm not David but personally, I'd calculate the mean by adding all the numbers and then dividing the result by their count, i.e., by how many numbers there are. This is usually a quite effective way to calculate the mean.
Daniel Daranas
;-) I'm with Daniel
David Zaslavsky
A: 

Welcome. fish, hope your stay is a pleasant one.

The following pseudo-code shows how to do this in the case where the sum will fit within an integer type, and round rounds to the nearest integer.

In your sample, the numbers add sum to 16, dividing by 3 gives you 5 1/3, which rounds to 5.

sum = 0
for i = 1 to array.size
    sum = sum + array[i]
sum = sum / array.size
sum = round (sum)
paxdiablo
A: 

Pseudocode for getting the average:

double mean = 0
int count = 0
foreach int number in numbers
    count++
    mean += number - mean / count

round(mean) // rounds up
floor(mean + 0.5) // rounds up
ceil(mean - 0.5) // rounds down

Rounding generally involves adding 0.5, then truncating (floor), which is why 3.5 rounds up to 4. If you want 3.5 to round down to 3, do the rounding code yourself, but in reverse: subtract 0.5, then find the ceiling.

Edit: Updated requirements (no overflow)

Nikhil Chelliah
+1  A: 

Here is LINQ/C# solution:

ar.Aggregate((a, b) => a + b) / ar.Length
ssg
A: 

This pseudocode finds the average and covers the problem of overflow:

double avg = 0
int count = 0
for x in array:
    count += 1
    avg = avg * (count - 1) / count   // readjust old average
    avg += x / count                  // add in new number

After that, you can apply your rounding code. If there is no easy way to round in your language, then something like this will work (rounds up when over .5):

int temp = avg - int(avg)   // finds decimal portion
if temp <= 0.5
    avg = int(avg)          // round down
else
    avg = int(avg) + 1      // round up
MahlerFive
+2  A: 

Here's a way that's fast, reasonably overflow-safe and can work when the number of elements isn't known in advance.

// The length of someListOfNumbers doesn't need to be known in advance.
int mean(SomeType someListOfNumbers) {  
    double mean = 0, count = 0;
    foreach(element; someListOfNumbers) {
        count++;
        mean += (element - mean) / count;
    }
    if(count == 0) {
        throw new UserIsAnIdiotException(
                  "Problem exists between keyboard and chair.");
    }
    return cast(int) floor(mean);
}
dsimcha
+1  A: 

Guaranteed not to overflow:

length ← length of list
average ← 0
for each result in the list do:
    average ← average + ( result / length )
end for

This has significant problems with accuracy if you're using ints due to truncation (the average of six 4's comes out as 0)

eplawless
Does not overflow because you're using floating-point, or you're giving inaccurate answers (most of the time).
strager
A: 

ARM assembly. =] Untested. Won't overflow. Ever. (I hope.)

Can probably be optimized a bit. (Maybe use FP/LR?) =S Maybe THUMB will work better here.

.arm

; r0 = pointer to array of integers
; r1 = number of integers in array
; returns mean in r0
mean:
stmfd sp!, {r4,r5}
mov r5, r1

mov r2, 0          ; sum_lo
mov r3, 0          ; sum_hi

cmp r1, 0          ; Check for empty array
bz .end

.loop:
ldr r4, [r0], #4
add r2, r2, r4
adc r3, r3, #0     ; Handle overflow
sub r1, r1, #1     ; Next
bnz .loop

.end:
div r0, r2, r3, r5 ; Your own 64-bit/32-bit divide: r0 = (r3r2) / r5
bx lr
strager