views:

78

answers:

2

Sitation:

overview:

I have something like this:

std::vector<SomeType> values;
std::vector<int> indexes;

struct Range{
    int firstElement;//first element to be used in indexes array
    int numElements;//number of element to be used from indexed array
    int minIndex;/*minimum index encountered between firstElement 
        and firstElements+numElements*/
    int maxIndex;/*maximum index encountered between firstElement 
        and firstElements+numElements*/
    Range()
        :firstElement(0), numElements(0), minIndex(0), maxIndex(0){
    }
}

std::vector<Range> ranges;

I need to sort values, remap indexes, and recalculate ranges to minimize maxValueIndex-minValueIndex for each range.

details:

values is an array(okay, "vector") of some type (irrelevant which one). elements in values may be unique, but this is not guaranteed.

indexes is an vector of ints. each element in "indexes" is an indexes that correspond to some element in values. Elements in indexes are not unique, one value may repeat multiple types. And indexes.size() >= values.size().

Now, ranges correspond to a "chunk" of data from indexes. firstElement is an index of element to be used from indexes (i.e. used like this: indexes[range.firstElement]), numElements is (obviously) number of elements to be used, minIndex is mininum in (indexes[firstElement]...indexes[firstElement+numElements-1]) a,d maxIndex is maximum in (indexes[firstElement]...indexes[firstElement+numElements-1]). Ranges never overlap. I.e. for every two ranges a, b

((a.firstElement >= b.firstElement) && (a.firstElement < (b.firstElement+b.numElements)) == false

Obviously, when I do any operation on values (swap to elements, etc), I need to update indexes (so they keep pointing on the same value), and recalculate corresponding range, so range's minIndex and maxIndex are correct.

Now, I need to rearrange values in the way that will minimize Range.maxIndex - Range.minIndex. I do not need the "best" result after packing, having "probably the best" or "good" packing will be enough.

problem:
Remapping indexes and recalculating ranges is easy. The problem is that I'm not sure how to sort elements in values, because same index may be encountered in multiple ranges.

Any ideas about how to proceed?

restrictions:

Changing container type is not allowed. Containers should be array-like. No maps, not lists. But you're free to use whatever container you want during the sorting. Also, no boost or external libraries - pure C++/STL, I really neeed only an algorithm.

additional info:

There is no greater/less comparison defined for SomeType - only equality/non-equality. But there should be no need to ever compare two values, only indexes.

The goal of algorithm is to make sure that output of

for (int i = 0; i < indexes.size; i++){ 
    print(values[indexes[i]]); //hypothetical print function
}

Will be identical before and after sorting, while also making sure that for each range Range.maxIndex-Range.minIndex (after sorting) is as small as possible to achieve with reasonable effort. I'm not looking for a "perfect" or "most optimal" solution, having a "probably perfect" or "probably most optimal" solution should be enough.

P.S. This is NOT a homework.

+1  A: 

This is not an algorithm, just some thinking aloud. It will probably break if there are too many duplicates.

If there was no duplicates, you'd simply rearrange the values so the indexes are 0,1,2, and so on. So for the starting point, let's exclude the values that are double-referenced and arrange the rest

Since there are duplicates, you need to figure out where to stick them. Suppose the duplicate is referred to by ranges r1, r2, r3. Now, as long as you insert the duplicate between min([r1,r2,r3].minIndex)-1 and max([r1,r2,r3].maxIndex)+1, the sum of maxIndex-minIndex will be the same no matter where you insert it. Moving the insertion point to the left will reduce max-min for all ranges to the left, but increment it for all ranges to the right. So, I think the sensible thing to do is to insert the duplicate at the left edge (minindex) of the rightmost range (one with largest minIndex) of r1,r2,r3. Repeat with all duplicates.

Arkadiy
A: 

Okay, it looks like there is only one way to reliably solve this problem:

Make sure that no index is ever used by two ranges at once by duplicating values. I.e scan entire array of indexes, and when you find index (of value) that is being used in more than one range, you add copy of that value for each range - each with unique index. After that problem becomes trivial - you simply sort values in the way that will make sure that values array first contains values used only by first range, then values for 2nd range, and so on. I.e. this will get maximum packing.

Because in my app it is more important to minimize sum(ranges[i].maxIndex-ranges[i].minIndex) that to minimize number of values, this approach works for me.

I do not think that there is other reliable way to solve the problem - it is quite easy to get situation when there are indexes used by every range, and in this case it will not be possible to "pack" data no matter what you do. Even allowing index to be used by two ranges at once will lead to problems - you can get ranges a, b and c where a and b, b and c, a and c will have common indexes. In this case it also won't be possible to pack the data.

SigTerm