I have a program that I'm porting from one language to another. I'm doing this with a translation program that I'm developing myself. The relevant consequence is that I expect there are a number of bugs in my system that I'm going to need to find and fix. Each bug is likely to manifest in many places, and fixing it will fix the bug in all the places it shows up. (I feel like I have a really big lever and I'm pushing on the short end: I push really hard, but when things move they move a lot.)
I have the ability to run execution log diffs, so I'm measuring my progress by how far through the test suite the port can run before its execution diverges from the original program's. (Thank [whatever you want] for BeyondCompare; it works reasonably well with ~1M line files :D)
The question is: What shape should I expect to see if I were to plot that run length as a function of time? (More time == more bugs removed.)
My first thought is something like a Poisson distribution. However, because fixing each bug also removes all of its other occurrences, that shouldn't be quite correct.
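Here's a rough Python sketch of the process I have in mind, just to make the model concrete. The bug count, the number of places each bug shows up, and the uniform placement are all assumptions made up for the simulation, not measurements from my real system:

```python
import random

def simulate(num_bugs=20, occurrences_per_bug=50, log_length=10000, seed=0):
    """Simulate the debug loop: each distinct bug shows up at many random
    positions in the execution log; one fix removes every occurrence of it."""
    rng = random.Random(seed)
    # Scatter each bug's occurrences uniformly over the log (assumption);
    # only the earliest occurrence of each bug matters for the run length.
    first_occurrence = {
        bug: min(rng.randrange(log_length) for _ in range(occurrences_per_bug))
        for bug in range(num_bugs)
    }

    fixed = set()
    run_lengths = []
    for _ in range(num_bugs + 1):
        remaining = [(pos, bug) for bug, pos in first_occurrence.items()
                     if bug not in fixed]
        if not remaining:
            run_lengths.append(log_length)  # clean run: no bugs left
            break
        divergence, culprit = min(remaining)
        run_lengths.append(divergence)  # how far the diff gets this time
        fixed.add(culprit)              # fixing it removes all its occurrences
    return run_lengths

print(simulate())  # run length after 0, 1, 2, ... fixes
```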
(BTW, this might have real-world implications with regard to estimating when programs will finish being debugged.)
Edit: A more abstract statement of the problem:
Given an ordered list of N integers selected from the range [0, M] (where N >> M), where each number's occurrences are distributed uniformly over the positions in the list but the numbers themselves don't necessarily appear with uniform frequency: what is the expected position of the last “new” number (the last value to make its first appearance)? What about the second-to-last? Etc.?
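And a minimal Monte Carlo sketch of the abstract version. The particular value frequencies (weight 1/(v+1) for value v) are made up just so the numbers aren't uniformly distributed; swap in whatever distribution you like:

```python
import random

def expected_new_positions(N=100_000, M=20, trials=100, seed=0):
    """Monte Carlo estimate of the expected position of the 1st, 2nd, ...,
    last 'new' (first-seen) value in a list of N draws from [0, M]."""
    rng = random.Random(seed)
    # Assumed non-uniform value frequencies: value v gets weight 1/(v+1).
    weights = [1.0 / (v + 1) for v in range(M + 1)]
    totals = [0.0] * (M + 1)
    for _ in range(trials):
        draws = rng.choices(range(M + 1), weights=weights, k=N)
        first_seen = {}
        for pos, val in enumerate(draws):
            if val not in first_seen:
                first_seen[val] = pos
        order = sorted(first_seen.values())
        order += [N] * (M + 1 - len(order))  # values that never showed up
        for k, pos in enumerate(order):
            totals[k] += pos
    return [t / trials for t in totals]

positions = expected_new_positions()
print("expected position of the last new value:", positions[-1])
print("second to last:", positions[-2])
```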