views:

126

answers:

3

Suppose I have some data stored in a container of unique_ptrs:

struct MyData {
    int id;  // a unique id for this particular instance
    data some_data; // arbitrary additional data
};

// ...

std::vector<std::unique_ptr<MyData>> my_data_vec;

The ordering of my_data_vec is important. Suppose now I have another vector of IDs of MyDatas:

std::vector<int> my_data_ids;

I now want to rearrange my_data_vec such that the elements are in the sequence specified by my_data_ids. (Don't forget moving a unique_ptr requires move-semantics with std::move().)

What's the most algorithmically efficient way to achieve this, and do any of the STL algorithms lend themselves well to achieving this? I can't see that std::sort would be any help.

Edit: I can use O(n) memory space (not too worried about memory), but the IDs are arbitrary (in my specific case they are actually randomly generated).

A: 

Why don't you try moving the data into a STL Set ? you need only to implement the comparison function, and you will end up with a perfectly ordered set of data very fast.

John Paul
Good idea - but the ordering of my_data_vec is significant and arbitrary. The ordering information is lost by a set.
AshleysBrain
"Why don't you try moving the data into a STL Set?" Because then they wouldn't be in th order specified by `my_data_ids`.
sbi
A: 

Why don't you just use a map<int, unique_ptr<MyData>> (or multimap)?

rlbond
+4  A: 
  1. Create a map that maps ids to their index in my_data_ids.
  2. Create a function object that compares std::unique_ptr<MyData> based on their ID's index in that map.
  3. Use std::sort to sort the my_data_vec using that function object.

Here's a sketch of this:

// Beware, brain-compiled code ahead!
typedef std::vector<int> my_data_ids_type;
typedef std::map<int,my_data_ids_type::size_type> my_data_ids_map_type;

class my_id_comparator : public std::binary_function< bool
                                                    , std::unique_ptr<MyData>
                                                    , std::unique_ptr<MyData> > {
public:
  my_id_comparator(const my_data_ids_map_type& my_data_ids_map)
    : my_data_ids_map_(my_data_ids_map) {}

  bool operator()( const std::unique_ptr<MyData>& lhs
                 , const std::unique_ptr<MyData>& rhs ) const
  {
     my_data_ids_map_type::const_iterator it_lhs = my_data_ids_map_.find(lhs.id);
     my_data_ids_map_type::const_iterator it_rhs = my_data_ids_map_.find(rhs.id);
     if( it_lhs == my_data_ids_map_.end() || it_rhs == my_data_ids_map_.end() )
       throw "dammit!"; // whatever
     return it_lhs->second < it_rhs->second;
  }
private
  my_data_ids_map_type& my_data_ids_map_;
};

//...

my_data_ids_map_type my_data_ids_map;
// ...
// populate my_data_ids_map with the IDs and their indexes from my_data_ids
// ...
std::sort( my_data_vec.begin(), my_data_vec.end(), my_id_comparator(my_data_ids_map) );

If memory is scarce, but time doesn't matter, you could do away with the map and search the IDs in the my_data_ids vector for each comparison. However, you would have to be really desperate for memory to do that, since two linearly complex operations per comparison are going to be quite expensive.

sbi
This is a very nice solution, thanks! Didn't think `sort` would be applicable but you proved me wrong :) (btw I'm using C++0x so I used a lambda in the sort - keeps things a bit tidier)
AshleysBrain
@James: I'm not sure throwing a `std::exception` is right at this place. The way the problem was presented, it seems like there __must__ be an entry in `my_data_ids` for each ID. If so, then that should probably be an assertion instead, since failure would indicate that a pre-condition doesn't hold. Since I know too little about the problem, I did something obviously wrong and accompanied it by that comment, hoping it would be obvious that this needs more consideration. But maybe you're right. _"Do as I say, not as I do"_ usually doesn't very work well. What do you suggest?
sbi
@sbi: I just meant that as a joke :-) (Then I decided it wasn't funny so I deleted it) As for my opinion, in this scenario, I tend to throw a `logic_error` or something derived therefrom.
James McNellis
@James: No offense taken, but I am always wary of posting code that's actually wrong or does things one should not do. As I said _"Do as I say, not as I do"_ doesn't work, I know that from teaching.
sbi
I think that the time complexity of this solution could be higher than you might like. Each comparision is O(log n) rather than O(1), so the total time is O(n*(log n)^2) instead of the usual O(n*log(n)). This might not matter, but if it does then using a std::unordered_map rather than a std::map would be better.
Richard Wolf
@Richard: That's a good point. I'd be very interested in a better solution should someone find one.
sbi
How about 1) Create a map of id to pointer; 2) clear the original list; 3) for each id in id list, push back the pointer by looking up its id in the map. Isn't that O(n*log(n))?
AshleysBrain
@sbi: My suggestion was to use std::unordered_map which gives constant time access to elements in the best case. Just make sure that the hash function in use is appropriate.
Richard Wolf
@Richard: I understood that you suggested using a hash map. `:)` I was thinking of reducing memory usage and the first n in O(n*log n^2). Can't we get away without creating that map? For example, assuming we could change `my_data_ids` into a heap, we would at least reduce the memory footprint. And better data locality in a `std::vector` (compared with a `std::map`) practically might improve the algorithm despite all the theoretical O-ness. I was fishing for improvements like this.
sbi