views:

295

answers:

7

I'm planning to write an interactive C++ geometry processing plug-in that will frequently sort large amounts of data. Although preliminary indications are that the sort will take only a second or two, I'd prefer to show progress during that time - i.e I'd like to update a progress indicator a few times per second. This would be preferable to turning on a wait cursor and leaving the user with a program the freezes for an indeterminate length of time (even if that's just a few seconds).

If I were to use something like std::sort, I could use the comparison function to update the progress indicator every now and then, but I'd have no idea of 'percentage complete'. I could also break the sort down into sub-sorts, updating progress between sub-sorts, and then merging. My best bet may be to write own sort method, although I don't know how much effort it would require to get performance as good as std::sort (and ensure correctness). In any case, that sort method would occasionally send a "percentage complete" to a callback method.

I was wondering whether other people have encountered and solved this problem - I'm hoping that maybe there's a sort method in a standard library that does what I want, or some other technique that I haven't thought of.

Update: Thanks for the great answers so far. There have been a few very good suggestions, and I'm going to hold off on choosing the accepted answer until I've had a chance to test out the ideas in my upcoming project.

Update 2: I completed my project, and this turned out to be a non-issue (at least for the customer. Since they will be selling the software, they may still get feedback from their customers that will change their minds about this). Choosing an accepted answer was difficult because there were many good answers, but in the end the one I chose pointed to a wiki article on Merge Sort that had a very evocative animation. So that's the first strategy I would have pursued had I needed to go ahead with this).

+2  A: 

I would recommend your second option: use std::sort or another standard sorting function like qsort, and have the comparator report its progress. But don't update in every comparison--that would be unbearably slow--instead update every (say) 100ms.

JSBangs
This doesn't answer the OPs big question though. How do you actually figure out how far done the sort is using this method?
Omnifarious
I think if you give the comparator the size of the array in its constructor and then used Omifarious's approximation above (that there will be about (n lg n) comparisons). The comparator could then keep track of how many times it was called.I'm not sure and have not completely thought through it, but I think merge sort might be amenable to keeping good track of the progress. But of course merge sort is not introsort. Still merge sort is (n lg n) and might be acceptable.
Craig W. Wright
@Craig W. Wright: That would be difficult because STL comparison functors are not allowed to have state.
Billy ONeal
@Billy: How so? The comparison function is required to produce consistent (time-invariant) results, but AFAICT there's no ban on side effects that don't change the return value.
Ben Voigt
@Ben: It's not banned, per se, but algorithms are allowed to pass the functor around *by value*, which will leave you with a bunch of copies, each with separate counts. You'd have to store the information outside of the functor for consistent results (you can have the functor store a pointer, I guess...)
Billy ONeal
A: 

Use the observer pattern to signal back to the parent when each part completes. Using that and the total number of elements that need sorting you can update your progressbar in real time.

Konrad
+9  A: 

I think, even if you wrote your own sort, that you would have to do a lot of careful measurement if you wanted the progress indicator to be accurate. If you only want an approximate progress indicator, then you can use some metric like 'average distance between compared elements' or 'number of comparisons as compared to average expected number for quicksort' as your metric and implement the comparison idea you already mentioned.

And yes, I assume that you aren't a complete idiot and do not plan on updating the progress indicator at each and every comparison. If you did that you'd spend much more time indicating progress than sorting.

As an example, you would generally expect about n log2 n operations for quicksort. The analysis of how many comparisons are involved is more detailed and can be more accurate than that general measure, but for the purposes of this example, lets just assume. So you could count comparisons and report number_of_comparisons / (n log2 n) as your estimate of progress.

Since that's just an average indicator I would run a few experiments and see how far your estimate is off, and throw in some fudge factors to make it line up with the average expected case. You could also have a progress bar that indicated the uncertainty by having sort of "This is where I think I'll be done." indicator and some space after the indicator.

Even if you used your own sort and came up with a more seemingly precise measure, the progress bar still wouldn't update smoothly and the effect would be similar. The only way you know for sure how long your sort is going to take is if you use a somewhat slower, but really predictable sort, in which case you can predict how long it will take from the number of elements, or use a really fast sort that has less predictable behavior in specific cases, in which case there is no real way to have a perfectly accurate progress bar.

Predictability of subtasks and predictability of total number of comparisons are strongly linked. So I really don't think that subtasks make a better measure than total number of comparisons.

If you want to use your own sort and predictability is your highest goal, go for heapsort. It's still an O(n log2 n) sort, and it's close to being a minimum comparison sort (or so I remember from reading Knuth). It also takes a very predictable amount of time to complete regardless of the dataset its fed. It's one of the slower O(n log2 n) sorts, but still.

As one of your commenters mentioned though, you might be solving a problem that doesn't actually exist. Run some experiments first. The problem is a fun intellectual challenge regardless of its usefulness though. :-)

Omnifarious
+1 for actually thinking ahead about how to measure progress. If I were going to write my own, I'd still have to figure this one out. I guess the real question is how much of an advantage I have knowing the internal state of the algorithm, as opposed to just the number of comparisons so far. And thanks for assuming I'm not a complete idiot about updating the progress indicator every comparison, although you can safely assume I'm a complete idiot about sorting.
brainjam
@brainjam: I'm no algorithms expert, but from what I know, knowing the internal state doesn't buy you as much useful data as you might think. Quicksort, for example, might take a very tiny amount of time for one side and a very long amount of time for the other after the list is split in half. And if you pick a predictable sort, you can predict the number of comparisons behavior just as easily as how long various sub-tasks take to complete.
Omnifarious
Accuracy in the progress indicator is not so important as keeping the user entertained while time passes, setting their expectations, and allowing them to cancel. So I think I'd just double the estimate to `2*n*log2(n)`, and if the sort finishes faster than expectation, so much the better.
brainjam
@brainjam: How about this suggestion -- Log the actual and estimated number of comparisons at the conclusion of each sort. That way you keep some statistics over time as you're running the program. Eventually you can take the logging out, but your statistics should help you refine the accuracy a bit.
Omnifarious
+1  A: 

I see your problem as the following:

  1. You want discrete events to be fired during a single continuous process.
  2. This sub-division is just to tell the user things are in progress.

My suggestions are:

  1. Use a loading icon from something like http://ajaxload.info/ , or if it not a gui based environment, just spell out loading. Since the event is under 2 seconds, this will not be a issue. Hang ups are expected if the wait time exceeds 10 seconds.

  2. Writing your own sort method does bring in a lot of issues of thread safety, which might cause problems if your code is using multi-threading or is bound to do so in the future.

3.Another important information that you should consider how badly out of order will the data be everytime you want to sort, so in effect you will be measure the degree of randomness present, and the expected number of computations that you might need to do. you can use this information as an indicator as to how many swaps are required, which in turn can be counted on as you iterate thru the sort. Play around with the data.

Egon
+4  A: 

Since std::sort is template based, the source should be available in a header. You can make a copy of it and insert your progress callback. The big problem will be predicting how close you are to completion - most sort functions will be based on Quicksort, which doesn't always do the same number of comparisons.

Writing your own Merge sort would be a possibility; the algorithm is easy and the number of steps are well defined.

Mark Ransom
Both good suggestions. It hadn't occurred to me the std::sort was template based. For future reference, there's a C++ implementation of Merge sort at rosettacode.org: http://rosettacode.org/wiki/Merge_sort#C.2B.2B
brainjam
+1  A: 

use brute force :)

int elem_num = raw_data.size();
int percentage_delta = 100/(elem_num/20);
int percentage = 0;
int i = 0;
std::multiset<Elem*> sorted_data(&compareElemFunc);
foreach(Elem& elem, raw_data)
{
    sorted_data.insert(&elem);
    if(i%20)
    {
        updateProgressBar(percentage);
        percentage += percentage_delta;
    }
    i++;
}
//now, your data is perfectly sorted, iterate through sorted_data

(in case you don't want to implement your own std::sort() and since I'm lacking complete requirements)

Alsk
I suppose this is O(n logn), but I wonder how it compares to doing a std::sort. If std::sort takes 1 second, and this solution takes 10 seconds, I would think twice about using it. The nice thing about this solution is that you can cancel the process anytime. Btw, I would change the progress update factor from 20 to 1000 or even 10000 -- a few updates per second is sufficient.
brainjam
A: 

I don't recommend trying to hack apart std::sort. That's generally implemented with introsort and is an extremely fast NLogN operation. Constructing the container you're going to sort will typically be more expensive than sorting the data.

However, if you're going to implement a progress bar, I recommend you put the sorting in a separate thread. Normally multithreaded applications are harder to write and maintain than single-threaded ones, but you can do it in a way in which it isn't for this progress bar case. Your application can still be predominantly single-threaded without any concurrent operations being performed with the exception of this progress bar and probably some event handling to keep the UI responsive. When you're ready to sort the data, simply fire off another thread to do it and put the main thread in a wait loop until the sorting thread is finished, sleeping here and there and upgrating the progress bar in the meantime.

You can generalize this non-intrusive approach to any kind of time-consuming operation without having to sprinkle update_progress_bar() type calls throughout your code or digging into the implementations of std::sort or trying to reinvent wheels. Because the main thread will be in a waiting/updating progress bar state and therefore blocking in a sense until your working thread is finished, you don't have any of the issues associated with multithreading (need for thread synchronization to access shared resources throughout your application with the exception of the progress counter, race conditions, dead locks, etc). It'll also be the smoothest progress counter you can implement since it will be updating concurrently.

If you're worried about the efficiency with associated with locking the progress counter, simply use atomic operations to increment it.

As for determining how much the sort algorithm has progressed, there are a couple ways to do it. One is to let it run once with the size of the data you have and try to predict the amount of time it would take for subsequent runs. That's completely non-intrusive but a little difficult to do, yet, if done right, it will monitor progress more accurately than incrementing a counter at regular intervals (which omits the fact that intervals may not take even amounts of time). The second approach which is simpler but a little bit evil is to modify your comparator predicate to increment a progress counter. Making predicates with state is generally frowned upon, but it's less evil than trying to implement your own introsort just because you want a progress counter.

Also if your introsort is taking so long, I have to wonder, does your container store these triangle objects or pointers to them? If the former, you might want to consider the latter as it should speed things up dramatically.