views:

46

answers:

1

Hi all,

I'm randomly selecting nodes from a Binary Tree and I have to build a black-box test that proves all nodes have the nearly the same probability of being selected.

I'm adapting the Chi-Square test algorithm based on this article http://en.wikibooks.org/wiki/Algorithm_Implementation/Pseudorandom_Numbers/Chi-Square_Test but I'm a bit confused about what 'r' should be.

Just as a side question, do you think this is an appropriate algorithm to prove randomness in a set of results?

Thank you, Diogo

+2  A: 

The answer is right there in the citation. Scroll down to the javadocs:

* @param  r           upper bound for the random range

It sounds to me like r ought to be the number of nodes in the tree. Do you agree?

I'm randomly selecting nodes from a Binary Tree and I have to build a black-box test that proves all nodes have the nearly the same probability of being selected.

I don't understand this. Do you? It seems less a test of the tree data structure and more about the random algorithm you use to choose a node to be selected. What does this outcome prove?

Walk me through what you're doing, please. Here's what I'm imagining:

  1. You have a list of nodes in your tree.
  2. You use a random algorithm somehow to choose a node to select from your tree.
  3. You iterate through your tree to find that node.

If there are r nodes in your tree, each one should have 1/r probability of being selected. It's a multi-dimensional, r-sided coin or die. Right?

The tree could bring another element into the mix: the probabilities are changed if the chances of being selected depend on where you are in the tree and whether or not you're allowed to backtrack. If that's the case, the chances of being selected are different at each node. Starting at the root means you can get to all child nodes. Being on the first level and not being able to backtrack eliminates the root and all other first level nodes from consideration, and so on.

Which problem are you trying to solve?

duffymo
Basically I only have access to the values returned but I'm creating a balanced test tree where each node value is a simple counter :) so, I guess you're right. Could we assume it's the average of the possible returned value range? (sorry if this sounds confusing...) Many thanks
DiogoNeves
This is failing all the time but I assume the C rand() isn't good enough...
DiogoNeves
What does "failing" look like?
duffymo
C rand() is good enough for the modest sized trees that you're probably using.
duffymo
Ok, the problem is proving that every single node has nearly the same probability of being selected using a black-box test that only has access to the interface that returns the randomly selected node (from a tree I created)
DiogoNeves
Another idea I had was just calculating the frequency of each node in a 10000 run and comparing with the expected probability...
DiogoNeves
Yes, that sounds good. I'll ask again - what are you testing about the tree? You're only showing how "random" the random selector is.
duffymo
yes, exactly, just a unity test that runs over the method that selects nodes and succeeds if all nodes have the same probability of being selected
DiogoNeves
Just a quick note, I was look at the C# version at the article I shared and there's something wrong with the last condition... It's working now hehehe Thanks everyone
DiogoNeves