If you don't need B to be in random order, then random_shuffle does more work than you need.
If by "STL" you mean SGI's STL, then use random_sample.
If by "STL" you mean the C++ standard libraries, then you don't have random_sample. You might want to copy the implementation of random_shuffle, except stop after the first n steps. This saves the time that would otherwise be spent shuffling the rest of the sequence.
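As a rough sketch of that idea, assuming C++11's <random> is available and that stopping a Fisher-Yates shuffle after n steps is what you want: partial_shuffle is a made-up name, not a standard algorithm, and a real library's random_shuffle may be written differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper, not part of any standard library.
// Runs only the first n steps of a Fisher-Yates shuffle: afterwards
// v[0..n) is a uniformly random sample of the original elements and
// v[n..) holds the remaining elements in unspecified order.
template <typename T, typename Rng>
void partial_shuffle(std::vector<T>& v, std::size_t n, Rng& rng) {
    assert(n <= v.size());
    for (std::size_t i = 0; i < n; ++i) {
        // Choose j uniformly from the not-yet-fixed positions [i, size - 1].
        std::uniform_int_distribution<std::size_t> pick(i, v.size() - 1);
        const std::size_t j = pick(rng);
        if (j != i) {
            using std::swap;  // lets ADL find an optimised swap for T
            swap(v[i], v[j]);
        }
    }
}
```

After the call, the first n elements would become A and everything from position n onwards would become B.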
Note that these both modify a sequence in place. Depending on where you actually want A and B to end up, and who owns the original, this might mean that you end up making two copies of each row: once to get it into a mutable container for the shuffle, then again to get it into its final destination. That is more memory and processing time than is required. To fix this you could swap rows out of the temporary container and into A and B. Or copy the algorithm, but adapt it to:
- Make a list of the indexes of the first vector
- Partially shuffle the list of indexes
- Copy the rows corresponding to the first n indexes to A, and the rest to B.
I'm not certain this is faster or uses less memory, but I suspect so.
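Those three steps might look something like this. It is only a sketch under assumptions: the Row typedef, the split_rows name and the use of std::mt19937 are all mine for illustration, since the question doesn't pin down the row type or the random source.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Row = std::vector<double>;  // assumed row type, for illustration only

// Hypothetical helper: copies n randomly chosen rows of src into a and the
// remaining rows into b. src itself is never modified, and each row is
// copied exactly once.
void split_rows(const std::vector<Row>& src, std::size_t n,
                std::vector<Row>& a, std::vector<Row>& b,
                std::mt19937& rng) {
    assert(n <= src.size());

    // 1. Make a list of the indexes of the source vector.
    std::vector<std::size_t> idx(src.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // 2. Partially shuffle the indexes (first n Fisher-Yates steps only).
    for (std::size_t i = 0; i < n; ++i) {
        std::uniform_int_distribution<std::size_t> pick(i, idx.size() - 1);
        std::swap(idx[i], idx[pick(rng)]);
    }

    // 3. Copy the rows for the first n indexes to a, and the rest to b.
    a.clear();
    b.clear();
    a.reserve(n);
    b.reserve(src.size() - n);
    for (std::size_t i = 0; i < n; ++i) a.push_back(src[idx[i]]);
    for (std::size_t i = n; i < idx.size(); ++i) b.push_back(src[idx[i]]);
}
```

The extra memory here is one index per row rather than a second copy of every row, which is where I'd expect any saving to come from.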
The standard for random_shuffle says that it performs "swaps". I hope that means it's efficient for vectors, but you might want to check that it is actually using an optimised swap, not doing any copying. I think it should mean that, especially since the natural implementation is as Fisher-Yates, but I'm not sure whether the language in the standard should be taken to guarantee it. If it is copying, then your second approach is going to be very slow. If it's using swap then they're roughly comparable. swap on a vector is going to be slightly slower than swap on an index, but there's not a whole lot in it. Swapping either a vector or an index is very quick compared with copying a row, and there are M of each operation, so I doubt it will make a huge difference to total run time.
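If you want to convince yourself that swapping two rows doesn't copy them, a small check along these lines should do. It assumes rows are std::vector<double> with the default allocator; the function name is made up. It shows what swap does on two vectors, not what your library's random_shuffle actually calls internally, which you'd still need to check separately.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical check: returns true if swapping two rows exchanged their
// heap buffers rather than copying any elements.
bool swap_exchanges_buffers() {
    std::vector<double> row_a(1000, 1.0);
    std::vector<double> row_b(1000, 2.0);
    const double* buf_a = row_a.data();
    const double* buf_b = row_b.data();

    using std::swap;
    swap(row_a, row_b);  // resolves to the vector overload of swap

    // The contents have changed places...
    assert(row_a.front() == 2.0 && row_b.front() == 1.0);
    // ...and so have the underlying buffers, so no element was copied.
    return row_a.data() == buf_b && row_b.data() == buf_a;
}
```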
[Edit: Alex Martelli was complaining recently about misuse of the term "STL" to mean the C++ standard libraries. In this case it does make a difference :-)]