views:

196

answers:

3

Greetings,

Any input on a way to divide a std::vector into two parts whose sums are as equal as possible? I need to find the split with the smallest possible |sum(part1) - sum(part2)|.

This is how I'm doing it now, but as you can probably tell it will yield a non-optimal split in some cases.

// prim and ultim are the begin/end iterators of the sorted vector,
// sum starts as the total of all elements and tempsum starts at 0.
auto mid = std::find_if(prim, ultim, [&](double temp) -> bool
{
    // stop once the running front sum has caught up with the remaining sum
    if(tempsum >= sum)
        return true;

    tempsum += temp;
    sum -= temp;
    return false;
});

The vector is sorted from highest to lowest, and values can indeed appear twice. I'm not expecting part1 and part2 to have the same number of elements, but sum(part1) should be as close as possible to sum(part2).

For example, if we had { 2.4, 0.12, 1.26, 0.51, 0.70 }, the best split would be { 2.4, 0.12 } and { 1.26, 0.51, 0.70 }.
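To make the goal concrete, here is a rough brute-force sketch of the kind of split I'm after (bestPrefixSplit is just an illustrative name, not code I'm actually using): it tries every split point of the sorted vector and keeps the one with the smallest |sum(part1) - sum(part2)|.

#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative helper only: try every split point of the (already sorted)
// vector and return the index where the left part ends, minimising
// |sum(left) - sum(right)|. Assumes v has at least two elements.
std::size_t bestPrefixSplit(const std::vector<double>& v)
{
    double total = std::accumulate(v.begin(), v.end(), 0.0);
    double left = 0.0;
    double bestDiff = total;
    std::size_t bestIndex = 1;

    // try every split point that leaves both parts non-empty
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
    {
        left += v[i];
        double diff = std::fabs(left - (total - left));
        if (diff < bestDiff)
        {
            bestDiff = diff;
            bestIndex = i + 1;   // left part is v[0], ..., v[i]
        }
    }
    return bestIndex;            // split into v[0..bestIndex) and v[bestIndex..end)
}

For the example above this picks the split after the second element, i.e. { 2.4, 0.12 } and { 1.26, 0.51, 0.70 }.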

If it helps, I'm trying to implement the splitting step of Shannon–Fano encoding.

Maybe this will help you guys understand my question better: http://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding#Example

Any input is appreciated, thanks!

+2  A: 
andand
This seems incorrect if one looks at the Wikipedia entry: there the list is sorted in decreasing order and the partitioning consists of finding a split point within that list, putting the highest elements in one set and the lowest in the other... though the Wikipedia example seems suboptimal in this regard.
Matthieu M.
This is the greedy algorithm that they mention and briefly describe.
andand
@Matthieu: The split isn't a pivot split but rather a partitioning of the data; elements 1 and 3 can go to sublist 1 while elements 2 and 4 can go to sublist 2.
Jamie Cook
+3  A: 

This is the partition problem, which is known to be NP-complete, so no polynomial-time exact algorithm is known in general. However, the problem becomes easier when the sizes of the elements in the set are bounded. The Wikipedia article on it (http://en.wikipedia.org/wiki/Partition_problem) has quite a nice section on approximation algorithms, for when you need a "good-enough" solution.
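For illustration, a minimal sketch of the simplest greedy heuristic described there (the function name is mine, just for this example): go through the numbers largest first and always add the next one to the subset with the smaller running sum.

#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

// Greedy heuristic for the two-way partition problem (approximate only):
// process the values largest-first and add each one to the lighter subset.
std::pair<std::vector<double>, std::vector<double>>
greedyPartition(std::vector<double> values)
{
    std::sort(values.begin(), values.end(), std::greater<double>());

    std::vector<double> a, b;
    double sumA = 0.0, sumB = 0.0;

    for (double v : values)
    {
        if (sumA <= sumB) { a.push_back(v); sumA += v; }
        else              { b.push_back(v); sumB += v; }
    }
    return std::make_pair(a, b);
}

It will not always find the optimal split, but it runs in O(n log n) and is usually close.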

Krystian
A: 

If you wanted to use std algorithms and lambdas you could do the following

#include <algorithm>
#include <cmath>
#include <vector>

// Greedily routes each element to whichever side currently leaves the
// smaller imbalance, then moves the right-hand side into its own vector.
void splitProbabilityVector(std::vector<double>& data, std::vector<double>& rightHandSplit)
{
    double s1 = 0.0, s2 = 0.0;
    auto bound = std::stable_partition(data.begin(), data.end(), [&](double e) -> bool
    {
        // std::abs from <cmath>; plain abs would truncate the doubles to int
        if (std::abs(e + s1 - s2) < std::abs(e + s2 - s1))
        { s1 += e; return true; }
        else
        { s2 += e; return false; }
    });

    rightHandSplit.assign(bound, data.end());
    data.resize(bound - data.begin());
}
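
For example, a quick way to call it (just a sketch, assuming the function above is visible to main, and reusing the sample values from the question) would be:

#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    // sample values from the question, assumed to already be in the order you want
    std::vector<double> data = { 2.4, 0.12, 1.26, 0.51, 0.70 };
    std::vector<double> right;

    splitProbabilityVector(data, right);

    std::cout << "left sum:  " << std::accumulate(data.begin(), data.end(), 0.0) << '\n'
              << "right sum: " << std::accumulate(right.begin(), right.end(), 0.0) << '\n';
    return 0;
}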

splitProbabilityVector should be quite performant. Just out of curiosity, why are you using this algorithm when the wiki page you linked states:

For this reason, Shannon–Fano is almost never used; Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length.

Jamie Cook
Well, the assignment was to analyze a text and compare the results yielded by both methods, i.e. to compute the average code length for each one.
Cosmin
Ah! Comparison... great stuff - I loved your use of C++0x in your question's example code, btw.
Jamie Cook