What is an efficient way to generate a random contingency table? A contingency table is defined as a rectangular matrix such that the sum of each row is fixed, and the sum of each column is fixed, but the individual elements may be anything as long as the sum of each row and column is correct.

Note that it's very easy to generate random contingency tables, but I'm looking for something more efficient than the naive algorithm.

A: 

This sounds like a constraint satisfaction problem (CSP) to me.

You would start with any cell and choose its value randomly from the set of allowed values. Then you update the sets of eligible values for all cells in the same row and column, choose the next cell (according to the CSP heuristic you are using), and randomly assign it a value from its own set of eligible values. Repeat this until the table is full. If you encounter a cell whose set of eligible values is empty, you have to backtrack.

However, the notion of 'set of eligible values' might be hard to represent in a data structure, depending on the range of values you are allowing.
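Here is a minimal sketch of the idea, assuming the entries are non-negative integers (the question doesn't say so). Under that assumption the set of eligible values for a cell is just an interval `[lo, hi]`, so two numbers are enough to represent it. The function name `random_table` and its parameters are made up for illustration. With the lower bound used here and a fixed row-major fill order, the interval never comes out empty, so this particular variant gets away without backtracking; a different cell ordering or extra constraints would need it.

```python
import random

def random_table(row_sums, col_sums, rng=random):
    """Fill a non-negative integer table cell by cell, in row-major order."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column totals must be equal")
    if any(v < 0 for v in list(row_sums) + list(col_sums)):
        raise ValueError("all sums must be non-negative")

    col_left = list(col_sums)          # what each column still needs
    table = []
    for row_total in row_sums:
        row_left = row_total           # what this row still needs
        tail = sum(col_left)           # capacity of the columns not yet visited
        row = []
        for j in range(len(col_left)):
            tail -= col_left[j]        # now: capacity of columns right of j
            # Eligible values form an interval:
            #   hi: cannot exceed what the row or the column still needs
            #   lo: the columns to the right must be able to absorb the rest
            lo = max(0, row_left - tail)
            hi = min(row_left, col_left[j])
            x = rng.randint(lo, hi)    # inclusive on both ends
            row.append(x)
            row_left -= x
            col_left[j] -= x
        table.append(row)
    return table

table = random_table([3, 5, 2], [4, 1, 5])
for row in table:
    print(row)
```

Note that this samples *a* valid table, not a uniformly distributed one: tables are not equally likely under this scheme, which may or may not matter for your use case.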

__roland__
A: 
ephemient
+4  A: 

Looking at the code of the networksis package for R might be helpful. I believe that efficient computation requires fairly sophisticated sequential importance sampling techniques, so you might want to avoid reimplementing this if you can.

Edit: The relevant paper is Chen, Diaconis, Holmes, and Liu (2005). In the words of the authors, "[o]ur method compares favorably with other existing Monte Carlo-based algorithms, and sometimes is a few orders of magnitude more efficient."

othercriteria
Thanks. This is exactly the answer I didn't want to hear--something amazingly complicated. I needed to know this because I'm working on a small open-source statistics lib for the D programming language. Knowing this, I guess I just won't include this feature.
dsimcha