views:

780

answers:

11

You have an ascending list of numbers, what is the most efficient algorithm you can think of to get the ascending list of sums of every two numbers in that list. Duplicates in the resulting list are irrelevant, you can remove them or avoid them if you like.

To be clear, I'm interested in the algorithm. Feel free to post code in any language and paradigm that you like.

A: 

If you are looking for a truly language agnostic solution then you will be sorely disappointed in my opinion because you'll be stuck with a for loop and some conditionals. However if you opened it up to functional languages or functional language features (I'm looking at you LINQ) then my colleagues here can fill this page with elegant examples in Ruby, Lisp, Erlang, and others.

John Downey
A: 

The best I could come up with is to produce a matrix of sums of each pair, and then merge the rows together, a-la merge sort. I feel like I'm missing some simple insight that will reveal a much more efficient solution.

My algorithm, in Haskell:

matrixOfSums list = [[a+b | b <- list, b >= a] | a <- list]

sortedSums = foldl merge [] matrixOfSums

--A normal merge, save that we remove duplicates
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) = case compare x y of
LT -> x:(merge xs (y:ys))
EQ -> x:(merge xs (dropWhile (==x) ys))
GT -> y:(merge (x:xs) ys)

I found a minor improvement, one that's more amenable to lazy stream-based coding. Instead of merging the columns pair-wise, merge all of them at once. The advantage being that you start getting elements of the list immediately.

-- wide-merge does a standard merge (ala merge-sort) across an arbitrary number of lists
-- wideNubMerge does this while eliminating duplicates
wideNubMerge :: Ord a => [[a]] -> [a]
wideNubMerge ls = wideNubMerge1 $ filter (/= []) ls
wideNubMerge1 [] = []
wideNubMerge1 ls = mini:(wideNubMerge rest)
where mini = minimum $ map head ls
rest = map (dropWhile (== mini)) ls

betterSortedSums = wideNubMerge matrixOfSums

However, if you know you're going to use all of the sums, and there's no advantage to getting some of them earlier, go with 'foldl merge []', as it's faster.

Peter Burns
I think (my haskell is rusty) this is O(N^3) since there are O(n) comparisons done for each thing in the result.
Paul Hankin
+2  A: 

Rather than coding this out, I figure I'll pseudo-code it in steps and explain my logic, so that better programmers can poke holes in my logic if necessary.

On the first step we start out with a list of numbers length n. For each number we need to create a list of lenght n-1 becuase we aren't adding a number to itself. By the end we have a list of about n sorted lists that was generated in O(n^2) time.

step1(startinglist) 
for each number num1 in startinglist
   for each number num2 in startinglist
      add num1 plus num2 into templist
   add templist to sumlist
return sumlist

In step 2 because the lists were sorted by design (add a number to each element in a sorted list and the list will still be sorted) we can simply do a mergesort by merging each list together rather than mergesorting the whole lot. In the end this should take O(n^2) time.

step2(sumlist) 
create an empty list mergedlist
for each list templist in sumlist
   set mergelist equal to: merge(mergedlist,templist)
return mergedlist

The merge method would be then the normal merge step with a check to make sure that there are no duplicate sums. I won't write this out because anyone can look up mergesort.

So there's my solution. The entire algorithm is O(n^2) time. Feel free to point out any mistakes or improvements.

deuseldorf
I think this is O(N^3) since there are n comparisons at each stage in step 2.
Paul Hankin
A: 

@deuseldorf

I'm skeptical that your algorithm runs in n^2 + n!, since your algorithm looks more or less identical to mine, which I think is O(n^2). Creating sumlist (matrixOfSums) takes n^2, and then there's n-1 merges, each of which takes linear time over the length of the resulting list. The merges produce lists of lengths: 2n + 3n + ... + nn which is in O(n^2) right?

Also, I ran my program over a list of ~28 thousand elements, and it terminated in ~25 seconds, which is a while, but not as long as I'd expect if it was in O(n!)

Perhaps I'm mistaken, complexity analysis was never one of my stronger suits.

Peter Burns
A: 

@rictic

You're right. My math was wrong. (2n+3n+4n+...nn)=(2+3+4+..n)n= (.5n(n+1))n=.5n^3 +.5n^2 simplified to n^3. Serves me right for doing calculus in my head.

deuseldorf
A: 

In SQL:

create table numbers(n int not null)
insert into numbers(n) values(1),(1), (2), (2), (3), (4)


select distinct num1.n+num2.n sum2n
from numbers num1
inner join numbers num2 
    on num1.n<>num2.n
order by sum2n

C# LINQ:

List<int> num = new List<int>{ 1, 1, 2, 2, 3, 4};
var uNum = num.Distinct().ToList();
var sums=(from num1 in uNum
        from num2 in uNum 
        where num1!=num2
        select num1+num2).Distinct();
foreach (var s in sums)
{
    Console.WriteLine(s);
}
vzczc
+2  A: 

You can do this in two lines in python with

allSums = set(a+b for a in X for b in X)
allSums = sorted(allSums)

The cost of this is n^2 (maybe an extra log factor for the set?) for the iteration and s * log(s) for the sorting where s is the size of the set.

The size of the set could be as big as n*(n-1)/2 for example if X = [1,2,4,...,2^n]. So if you want to generate this list it will take at least n^2/2 in the worst case since this is the size of the output.

However if you want to select the first k elements of the result you can do this in O(kn) using a selection algorithm for sorted X+Y matrices by Frederickson and Johnson (see here for gory details). Although this can probably be modified to generate them online by reusing computation and get an efficient generator for this set.

@deuseldorf, Peter There is some confusion about (n!) I seriously doubt deuseldorf meant "n factorial" but simply "n, (very excited)!"

This has better complexity than all of the other solutions I think! O(n^2.log(n)). It's also the most readable and the shortest.
Paul Hankin
A: 

Pall, vzczc, your solutions 7842, 6458 – while elegant and straightforward – both take first tact that I tried for this problem, and it appeared too slow in practice for the problem I was working on.

In particular, it doesn't exploit any of the specific information you have at your disposal: it creates many duplicate entries by repeatedly summing the same pairs of numbers together, and it doesn't take advantage of the fact that the intermediate list is made up of n sorted lists. You might as well be summing the elements of two unsorted lists.

vzczc, are you missing a final sort in your C# code?

Peter Burns
A: 

This question has been wracking my brain for about a day now. Awesome.

Anyways, you can't get away from the n^2 nature of it easily, but you can do slightly better with the merge since you can bound the range to insert each element in.

If you look at all the lists you generate, they have the following form:

(a[i], a[j]) | j>=i

If you flip it 90 degrees, you get:

(a[i], a[j]) | i<=j

Now, the merge process should be taking two lists i and i+1 (which correspond to lists where the first member is always a[i] and a[i+1]), you can bound the range to insert element (a[i + 1], a[j]) into list i by the location of (a[i], a[j]) and the location of (a[i + 1], a[j + 1]).

This means that you should merge in reverse in terms of j. I don't know (yet) if you can leverage this across j as well, but it seems possible.

MSN

Mat Noguchi
+3  A: 

If you write out the sums like this:

1 4  5  6  8  9
---------------
2 5  6  7  9 10
  8  9 10 12 13
    10 11 13 14
       12 14 15
          16 17
             18

You'll notice that since M[i,j] <= M[i,j+1] and M[i,j] <= M[i+1,j], then you only need to examine the top left "corners" and choose the lowest one.

e.g.

  • only 1 top left corner, pick 2
  • only 1, pick 5
  • 6 or 8, pick 6
  • 7 or 8, pick 7
  • 9 or 8, pick 8
  • 9 or 9, pick both :)
  • 10 or 10 or 10, pick all
  • 12 or 11, pick 11
  • 12 or 12, pick both
  • 13 or 13, pick both
  • 14 or 14, pick both
  • 15 or 16, pick 15
  • only 1, pick 16
  • only 1, pick 17
  • only 1, pick 18

Of course, when you have lots of top left corners then this solution devolves.

I'm pretty sure this problem is Ω(n²), because you have to calculate the sums for each M[i,j] -- unless someone has a better algorithm for the summation :)

Porges
I think this is O(n^3) since there are n potential 'top left corners' at each stage.
Paul Hankin
You can implement this algorithm in O(n^2 log n) time by storing the first unpicked entry in each row in a priority queue, but asymptotically this is no better than just generating all sums and sorting.
Reid Barton
+1  A: 

No matter what you do, without additional constraints on the input values, you cannot do better than O(n^2), simply because you have to iterate through all pairs of numbers. The iteration will dominate sorting (which you can do in O(n log n) or faster).

florin
Yes, but to sort n^2 things takes O(n^2 log n) so sorting never dominates.
Paul Hankin