views: 216
answers: 8

The goal here is to merge multiple arrays, which are already sorted, into a single resultant array.

I've written the following solution and am wondering if there is a way to improve it.

#include <iostream>
#include <vector>
using namespace std;

/*
    Goal is to merge all sorted arrays
*/
void mergeAll(const vector< vector<int> >& listOfIntegers, vector<int>& result)
{

    int totalNumbers = listOfIntegers.size();
    vector<int> curpos;
    int currow = 0 , minElement , foundMinAt = 0;
    curpos.reserve(totalNumbers);

    // Set the traversal position to 0 for each of the arrays
    for ( int i = 0; i < totalNumbers; ++i)
    {
        curpos.push_back(0);
    }

    for ( ; ; )
    {
        /*  Seed the minimum with the first element of the
            first array that hasn't been fully traversed
        */

        for ( currow = 0 ; currow < totalNumbers ; ++currow)
        {
            if ( curpos[currow] < listOfIntegers[currow].size() )
            {
                minElement = listOfIntegers[currow][curpos[currow] ];
                foundMinAt = currow;
                break;
            }
        }
        /* If all the elements were traversed in all the arrays, then no further work needs to be done */
        if ( !(currow < totalNumbers ) )
            break;
        /*
            Traverse the remaining arrays and find the smallest available value
        */
        for ( ; currow < totalNumbers; ++currow)
        {
            /* Skip any array that has already been fully traversed */
            if ( curpos[currow] < listOfIntegers[currow].size() &&
                 listOfIntegers[currow][curpos[currow]] < minElement )
            {
                minElement = listOfIntegers[currow][curpos[currow]];
                foundMinAt = currow;
            }
        }
        /*
            Store the minimum in the resultant array
            and advance the position in the array it came from
        */
        result.push_back(minElement);
        ++curpos[foundMinAt];
    }
}

The corresponding main goes like this.

int main()
{
    vector< vector<int> > myInt;
    vector<int> result;

    myInt.push_back(vector<int>() );
    myInt.push_back(vector<int>() );
    myInt.push_back(vector<int>() );

    myInt[0].push_back(10);
    myInt[0].push_back(12);
    myInt[0].push_back(15);


    myInt[1].push_back(20);
    myInt[1].push_back(21);
    myInt[1].push_back(22);

    myInt[2].push_back(14);
    myInt[2].push_back(17);
    myInt[2].push_back(30);

    mergeAll(myInt,result);

    for ( int i = 0; i < result.size() ; ++i)
    {
        cout << result[i] << endl;
    }
}
A: 

All you need is two pointers (or just int index counters), checking for the minimum between arrays A and B, copying the value over to the resultant list, and incrementing the pointer of the array the minimum came from. If you run out of elements in one source array, copy the remainder of the other to the resultant and you're done.
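For the two-array case, a minimal sketch of that idea (mergeTwo is a hypothetical helper, not the OP's function):

#include <vector>

// Merge two already-sorted vectors: advance two index counters, always
// taking the smaller current element, then append whatever is left over.
std::vector<int> mergeTwo(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);

    // At most one of these inserts copies anything; the other range is empty.
    out.insert(out.end(), a.begin() + i, a.end());
    out.insert(out.end(), b.begin() + j, b.end());
    return out;
}

std::merge from <algorithm> implements the same two-sequence logic if you'd rather not hand-roll it.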

Edit: You can trivially expand this to N arrays.

Edit: Don't trivially expand this to N arrays :-). Do two at a time. Silly me.

Gary
He has more than two arrays.
IVlad
Like IVlad said, he's merging an arbitrary number of arrays.
Odrade
The method is trivially expanded to any number of arrays. "All you need is n pointers, checking for minimum between all n arrays, ..."
Mark Ransom
That makes the complexity O(NM²), where N is the length of the longest array and M is the number of arrays. Goz's solution runs in only O(NM log M).
Nabb
Nabb, you're the man, but can you elaborate how you saw that?
Gary
A: 

You could just stick them all into a multiset. That will handle the sorting for you.
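A minimal sketch of that idea (the mergeAllMultiset name and the signature mirroring the OP's are assumptions); a std::multiset keeps its elements ordered, so reading it back out yields the merged, sorted result:

#include <set>
#include <vector>

void mergeAllMultiset(const std::vector< std::vector<int> >& lists, std::vector<int>& result)
{
    std::multiset<int> bag;
    for (size_t i = 0; i < lists.size(); ++i)
        bag.insert(lists[i].begin(), lists[i].end());   // each element inserted in O(log total)
    result.assign(bag.begin(), bag.end());              // in-order traversal gives the sorted output
}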

frankc
This is a very bad idea. First of all multisets are slow, so you might as well copy all the lists in another vector and std::sort that vector, which will be much faster. Second, this is only better than merging if `log totalElements < numberOfLists`.
IVlad
Who cares if it's "slow"? Why would you assume it needs to be blazing fast? Nothing in the question said anything about this being for high-frequency trading, or the input vectors being many billions of elements. That is blatant premature optimization.
frankc
@user275455 I believe the point is that simply concatenating all the vectors into a single vector and then sorting that would be just as simple and faster than the multiset.
Mark B
And that would be just as good? So what?
frankc
@userN: The user asks for an efficient implementation, we try to give him one. If he wanted a naive implementation he wouldn't have to ask us.
Carl Smotricz
A: 

If you are merging very many vectors together, then you could speed up performance by using a sort of tree to determine which vector contains the smallest element. This is probably not necessary for your application, but comment if it is and I'll try to work it out.

Thom Smith
Actually, a heap is what you're looking for.
Nick Johnson
Explained much better in kunigami's answer.
Carl Smotricz
A: 

Perhaps I'm misunderstanding the question...and I feel like I'm misunderstanding your solution.

That said, maybe this answer is totally off-base and not helpful.

But, especially with the number of vectors and push_back's you're already using, why do you not just use std::sort?

#include <algorithm>
#include <vector>
using namespace std;

void mergeAll(const vector< vector<int> > &origList, vector<int> &resultList)
{
    for (size_t i = 0; i < origList.size(); ++i)
    {
        resultList.insert(resultList.end(), origList[i].begin(), origList[i].end());
    }
    std::sort(resultList.begin(), resultList.end());
}

I apologize if this is totally off from what you're looking for. But it's how I understood the problem and the solution.

std::sort runs in O(N log N): http://www.cppreference.com/wiki/stl/algorithm/sort

KevenK
I'm not keen on sorting; merging is what I'm keen on. Wondering if there is a faster way to merge the elements.
Merge Sort runs an average `O(N log N)` as well: http://en.wikipedia.org/wiki/Merge_sort#Analysis I don't see either solution being, performance-wise, much different. However, complexity, stability and maintenance may be affected. I would simplify this step by using the `std` functions and continue looking elsewhere for my performance gains.
KevenK
A: 

If you want to take advantage of multi-threading then a fairly good solution would be to just merge 2 lists at a time.

i.e. suppose you have 9 lists.

Merge list 0 with 1.
Merge list 2 with 3.
Merge list 4 with 5.
Merge list 6 with 7.

These can be performed concurrently.

Then:

Merge list 0&1 with 2&3.
Merge list 4&5 with 6&7.

Again these can be performed concurrently.

Then merge list 0,1,2&3 with list 4,5,6&7.

Finally, merge list 0,1,2,3,4,5,6&7 with list 8.

Job done.

I'm not sure of the complexity of that, but it seems the obvious solution and DOES have the bonus of being multi-threadable to some extent.
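A minimal (sequential) sketch of that scheme, using std::merge for each pairwise step; the merges within one round are independent of each other, so they are the natural pieces to hand off to separate threads:

#include <algorithm>
#include <iterator>
#include <vector>

std::vector<int> mergePairwise(std::vector< std::vector<int> > lists)
{
    if (lists.empty()) return std::vector<int>();
    while (lists.size() > 1)
    {
        std::vector< std::vector<int> > next;
        for (size_t i = 0; i + 1 < lists.size(); i += 2)
        {
            // Merge lists[i] and lists[i + 1] into one sorted list.
            std::vector<int> merged;
            std::merge(lists[i].begin(), lists[i].end(),
                       lists[i + 1].begin(), lists[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(merged);
        }
        if (lists.size() % 2 == 1)          // the odd list out moves up to the next round
            next.push_back(lists.back());
        lists.swap(next);
    }
    return lists[0];
}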

Goz
+2  A: 

You can generalize the merge step of Merge Sort and work with multiple pointers. Initially, all of them point to the beginning of each array. You maintain these pointers, sorted by the values they point to, in a priority queue. In each step, you remove the smallest element from the heap in O(log n) (n is the number of arrays) and output the element pointed to by the extracted pointer. You then advance this pointer one position and, if you haven't reached the end of its array, reinsert it into the priority queue in O(log n). Proceed this way until the heap is empty. If there are a total of m elements, the complexity is O(m log n), and the elements are output in sorted order.
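A minimal sketch of that approach (the mergeAllHeap name and the signature mirroring the OP's mergeAll are assumptions):

#include <functional>
#include <queue>
#include <utility>
#include <vector>

void mergeAllHeap(const std::vector< std::vector<int> >& lists, std::vector<int>& result)
{
    // Each heap entry is (current value, index of the list it came from);
    // std::greater turns the priority_queue into a min-heap.
    typedef std::pair<int, size_t> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    std::vector<size_t> pos(lists.size(), 0);
    for (size_t i = 0; i < lists.size(); ++i)
        if (!lists[i].empty())
            heap.push(Entry(lists[i][0], i));

    while (!heap.empty())
    {
        Entry smallest = heap.top();
        heap.pop();
        result.push_back(smallest.first);

        size_t list = smallest.second;
        if (++pos[list] < lists[list].size())                // advance this list's pointer
            heap.push(Entry(lists[list][pos[list]], list));  // and reinsert it in O(log n)
    }
}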

kunigami
Sounds very good to me! I just verified that the performance of priority queues backed by heaps is really just log n.
Carl Smotricz
Good one. Thanks
A: 

Consider the priority-queue implementation in this answer linked in a comment above: http://stackoverflow.com/questions/886178/merging-8-sorted-lists-in-c-which-algorithm-should-i-use/886203#886203

It's O(n lg m) time (where n = total number of items and m = number of lists).

Mark B
A: 

It's not entirely related to your problem, but I would suggest that instead of loading the vector size into a separate variable

int totalNumbers = listOfIntegers.size();

you just access it directly through the .size() function - from my understanding of the STL, it just returns a member anyway.
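For example, the initialization loop from the question could then read (just a sketch):

for ( size_t i = 0; i < listOfIntegers.size(); ++i)
{
    curpos.push_back(0);
}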

Tomas Cokis