I have a table of items with columns [ID, ATTR1, ATTR2, ATTR3]. I'd like to select about half of the items at random, but in a way that the result set is NOT clustered: it should have a fairly even spread of ATTR1 values, ATTR2 values, and ATTR3 values. The subset does NOT need to represent the data as a whole; the full table may be concentrated on certain attribute values, but I'd like to select a subset with more variety. The attributes are not inter-related, so there's no real correlation between ATTR1 and ATTR2.
As an example, imagine ATTR1 = "State". I'd like each line item in my subset to be from a different state, even if in the whole set most of my data is concentrated in a few states. And I'd like this to hold simultaneously for the other 2 attributes, too. (I realize that some tables might not make this possible, but there's enough data that it's unlikely to have no solution.)
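To make the goal concrete, here is a naive greedy sketch of the behavior I'm describing. It is only an illustration, not the efficient algorithm I'm after: the function name `pick_varied_subset`, the tuple layout, and the sample rows are all made up, and it rescans every remaining item on each pick, so it is roughly O(n * k).

```python
import random
from collections import Counter

def pick_varied_subset(items, k, rng=None):
    """Greedily pick k items, preferring attribute values not yet used much.

    items: list of (id, attr1, attr2, attr3) tuples (hypothetical layout).
    """
    rng = rng or random.Random()
    remaining = list(items)
    rng.shuffle(remaining)  # random order, so ties are broken randomly
    counts = [Counter(), Counter(), Counter()]  # per-attribute usage so far
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Cost of an item = how often its attribute values were already picked.
        best = min(
            range(len(remaining)),
            key=lambda i: sum(c[v] for c, v in zip(counts, remaining[i][1:])),
        )
        item = remaining.pop(best)
        chosen.append(item)
        for c, v in zip(counts, item[1:]):
            c[v] += 1
    return chosen

# Made-up sample data: concentrated on "CA", but the subset should spread out.
table = [
    (1, "CA", "red", "small"),
    (2, "CA", "red", "large"),
    (3, "CA", "blue", "small"),
    (4, "NY", "red", "small"),
    (5, "TX", "green", "medium"),
    (6, "CA", "green", "large"),
]
subset = pick_varied_subset(table, 3, random.Random(0))
print(subset)
```

Each call with a different seed gives a different subset, and items whose attribute values are already well covered get picked last. What I don't know is how to get this kind of result without the full rescan per pick.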
Any ideas for an efficient algorithm? Thanks! I don't really even know how to search for this :)
(By the way, it's OK if this requires pre-calculation or indexing on the whole set, so long as I can draw out random varied subsets quickly.)