ansaurus

Question

Random enumeration of a hash table in OCaml

Answer 1

A:

I doubt that there is such a function, given the interface exposed by Hashtbl. The obvious approach of getting all the values into an array and doing lookups with Array.get a (Random.int (Array.length a)) looks fine to me.
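
A minimal sketch of that approach, using only the standard library. The helper names bindings_array and random_binding are hypothetical, chosen here for illustration, and the sketch collects (key, value) pairs rather than values alone:

```ocaml
(* Collect every (key, value) binding of [ht] into an array: O(n). *)
let bindings_array ht =
  Array.of_list (Hashtbl.fold (fun k v acc -> (k, v) :: acc) ht [])

(* Pick one binding uniformly at random; [None] if the array is empty.
   Repeated calls may return the same binding more than once. *)
let random_binding a =
  let n = Array.length a in
  if n = 0 then None else Some (Array.get a (Random.int n))

let () =
  let ht = Hashtbl.create 16 in
  List.iter (fun (k, v) -> Hashtbl.replace ht k v)
    [ ("a", 1); ("b", 2); ("c", 3) ];
  match random_binding (bindings_array ht) with
  | Some (k, v) -> Printf.printf "picked %s -> %d\n" k v
  | None -> print_endline "empty table"
```

The array is built once; each later pick is O(1), but nothing prevents the same index from being drawn twice.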

ygrek 2010-10-29 14:42:04

@ygrek Thanks for the reply. That solution has the problem of possibly repeating the element you extract with Array.get. If I've extracted one element and it didn't work, I don't want to extract it again (and this may happen if Random.int happens to repeat). But yes, I agree that this can be done without a specific Hashtbl function.

Surikator 2010-10-29 14:48:20

@Surikator - instead of randomly choosing an element, you could shuffle the array (using the Fisher-Yates algorithm) and then go through the elements in order.
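
A minimal sketch of the shuffle-then-iterate idea, assuming only the standard library. The names shuffle and shuffled_bindings are hypothetical helpers, not library functions:

```ocaml
(* In-place Fisher-Yates shuffle: O(n) time. *)
let shuffle a =
  for i = Array.length a - 1 downto 1 do
    let j = Random.int (i + 1) in   (* 0 <= j <= i *)
    let tmp = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- tmp
  done

(* Hypothetical helper: all bindings of [ht], in random order. *)
let shuffled_bindings ht =
  let a = Array.of_list (Hashtbl.fold (fun k v acc -> (k, v) :: acc) ht []) in
  shuffle a;
  a

let () =
  let ht = Hashtbl.create 16 in
  List.iter (fun k -> Hashtbl.replace ht k (k * k)) [ 1; 2; 3; 4; 5 ];
  (* Each binding is visited exactly once, so a rejected one never recurs. *)
  Array.iter (fun (k, v) -> Printf.printf "%d -> %d\n" k v)
    (shuffled_bindings ht)
```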

Niki Yoshiuchi 2010-10-29 15:39:12

@Niki That's a good suggestion. I've edited the question to include code for that idea. Still something to be done regarding efficiency, though.

Surikator 2010-10-29 16:27:39

Answer 2

+3 A:

I have two suggestions. The first is to change your rand_enum function so it returns an Enum.t:

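(* Assumes Batteries is opened, so that Array.enum resolves to BatArray.enum. *)
let rand_enum ht n =
  BatRandom.init n;                   (* seed the generator with n *)
  let hte = BatHashtbl.enum ht in     (* enumeration of (key, value) pairs *)
  Array.enum (BatRandom.shuffle hte)  (* shuffle into an array, expose it as an enum *)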

which isn't terribly different (it's still computing a random enum for all 20k) but is closer to what you originally wanted.

Alternatively, you could always take the source code of Hashtbl and recompile it with a rand_enum function. However, this probably won't be that different either, as a Hashtbl is implemented as an array, and if you want to avoid bad duplicates you're probably going to end up using a shuffle.

Niki Yoshiuchi 2010-10-29 17:32:50

Yes, Array.enum makes more sense. Thanks!

Surikator 2010-10-29 17:39:33

You can extend the module; here is a map that I extended with some other properties (to get random elements from a map actually). You'd use it just the same as the Map module. http://nicholas.lucaroni.com/repo_pub/ocamlmaze/xMap.ml
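
As an illustration of the same technique applied to Hashtbl (this is not the code from the linked file; the module name XHashtbl and the function random_binding are hypothetical), a minimal sketch could look like this:

```ocaml
module XHashtbl = struct
  (* Re-export everything from the standard Hashtbl module. *)
  include Hashtbl

  (* Hypothetical addition: a uniformly random binding, or [None] if the
     table is empty. Walks the table once, so each call is O(n). *)
  let random_binding ht =
    let n = length ht in
    if n = 0 then None
    else begin
      let target = Random.int n in
      let i = ref 0 and result = ref None in
      iter (fun k v ->
          if !i = target then result := Some (k, v);
          incr i)
        ht;
      !result
    end
end

let () =
  (* Used exactly like Hashtbl, plus the extra function. *)
  let ht = XHashtbl.create 16 in
  XHashtbl.replace ht "a" 1;
  XHashtbl.replace ht "b" 2;
  match XHashtbl.random_binding ht with
  | Some (k, v) -> Printf.printf "%s -> %d\n" k v
  | None -> print_endline "empty table"
```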

nlucaroni 2010-10-29 19:08:45

I did not know about `include` thanks!

Niki Yoshiuchi 2010-10-29 19:30:57

... yeah, and in OCaml 3.12+, you can use include for the signatures as well (that's why I don't have a signature for that file). And you've made me fix that code; there was an error when I tried to compile that project a little while ago. Heh, thanks!

nlucaroni 2010-10-29 19:31:56

Answer 3

+2 A:

What is the density of potential next elements? What is the cost of your decide function?

All of your current solutions have an O(n) cost. Fisher-Yates is O(n) (and it does not make much sense to try to adapt it for Enums, as it would require forcing the enumeration anyway), and Array.to_list is also O(n).

If your decide function is fast enough and your density low enough, I think it may be simpler to just build a list/array of all eligible elements (calling decide on each element of the table), then randomly pick one of them.
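
A minimal sketch of this low-density strategy, assuming decide takes a key and a value and returns a bool (the question's actual signature is not shown here, so that is a guess). pick_eligible is a hypothetical name:

```ocaml
(* Call [decide] on every binding (O(n) calls), keep the eligible ones,
   then pick one of them uniformly at random. *)
let pick_eligible decide ht =
  let eligible =
    Hashtbl.fold
      (fun k v acc -> if decide k v then (k, v) :: acc else acc)
      ht []
    |> Array.of_list
  in
  let n = Array.length eligible in
  if n = 0 then None else Some eligible.(Random.int n)

let () =
  let ht = Hashtbl.create 16 in
  for i = 1 to 10 do Hashtbl.replace ht i (i * i) done;
  (* Illustrative predicate: only even keys are eligible. *)
  match pick_eligible (fun k _ -> k mod 2 = 0) ht with
  | Some (k, v) -> Printf.printf "picked %d -> %d\n" k v
  | None -> print_endline "nothing eligible"
```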

If the density is high enough and decide is costly, I think your first idea is the better one: pick keys at random and keep a list of already-encountered keys. You will be able to stop at the first eligible element encountered (the optimal number of decide calls). This way of enumerating a sequence gets costly "in the end", when all elements have already been picked, but if your density is high you won't run into that case.
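
A minimal sketch of this high-density strategy. It assumes the keys are already available in an array (a Hashtbl offers no random access by position), that decide takes a key and a value, and it records tried array indices in a small table rather than in a list; pick_first_eligible is a hypothetical name:

```ocaml
(* Draw random positions in [keys], skipping positions already tried,
   and stop at the first binding accepted by [decide]. *)
let pick_first_eligible decide ht keys =
  let n = Array.length keys in
  let seen = Hashtbl.create 16 in
  let rec loop () =
    if Hashtbl.length seen >= n then None      (* every position was tried *)
    else begin
      let i = Random.int n in
      if Hashtbl.mem seen i then loop ()       (* already tried: redraw *)
      else begin
        Hashtbl.replace seen i ();
        let k = keys.(i) in
        match Hashtbl.find_opt ht k with
        | Some v when decide k v -> Some (k, v)
        | _ -> loop ()
      end
    end
  in
  loop ()

let () =
  let ht = Hashtbl.create 16 in
  for i = 1 to 20 do Hashtbl.replace ht i (2 * i) done;
  let keys = Array.init 20 (fun i -> i + 1) in
  match pick_first_eligible (fun k _ -> k > 5) ht keys with
  | Some (k, v) -> Printf.printf "picked %d -> %d\n" k v
  | None -> print_endline "nothing eligible"
```

Redraws become frequent once most positions have been tried, which is the "costly in the end" behaviour described above.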

If you don't know, it may be interesting to start with the "high density" hypothesis, and change your mind once you've seen a given portion of the table, and still found nothing.

Finally: if you don't need to add/remove elements during the generation of your sequence, it would be interesting to convert your hashtable into an array once and for all (keeping another key -> array index table somewhere), as all such problems are simpler when the indexing is contiguous.
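
A minimal sketch of that one-time conversion, assuming each key is bound at most once in the table. The record type indexed and the function index_table are hypothetical names:

```ocaml
(* A frozen view of a hash table: bindings stored contiguously, plus a
   reverse table from each key to its position in the array. *)
type ('k, 'v) indexed = {
  bindings : ('k * 'v) array;
  index_of : ('k, int) Hashtbl.t;
}

(* Build the view once: O(n). The source table must not change afterwards. *)
let index_table ht =
  let bindings =
    Array.of_list (Hashtbl.fold (fun k v acc -> (k, v) :: acc) ht [])
  in
  let index_of = Hashtbl.create (Array.length bindings) in
  Array.iteri (fun i (k, _) -> Hashtbl.replace index_of k i) bindings;
  { bindings; index_of }

let () =
  let ht = Hashtbl.create 16 in
  List.iter (fun (k, v) -> Hashtbl.replace ht k v)
    [ ("a", 1); ("b", 2); ("c", 3) ];
  let t = index_table ht in
  (* Random access by position, and key lookup back to a position. *)
  let k, v = t.bindings.(Random.int (Array.length t.bindings)) in
  Printf.printf "%s -> %d at index %d\n" k v (Hashtbl.find t.index_of k)
```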

gasche 2010-10-29 18:25:25

@gasche Thanks for the very useful comments. I don't know. I am studying an unknown search space. The decide function doesn't have high costs and I suspect that the density of the next potential element will be quite low. I've now edited the question again to include a different random hash table enumeration module. It does away with the cost of converting an array to a list and only uses the Fisher-Yates algorithm once at the start, so in the long run we can consider its complexity O(1). Have a read and let me know if you have any comments.

Surikator 2010-10-29 18:52:42

Answer 4

A:

Your implementations (second and third) are too complicated. I don't like mutable and I don't like Enum. Combining them both is the best way to shoot yourself in the foot with uncontrolled side-effects.

I also think your particular problem is too specific to be solved by a generic-looking "shuffle something and that's it" function. Trying to find such a domain-independent function which also solves your domain-specific problem is maybe why your successive implementations get uglier and more complex at each attempt.

Producing a random stream from a hash table is simple: BatHashtbl.enum |- BatRandom.shuffle |- BatArray.enum. The rest of your code should concern the use of the decide function.
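
For readers without Batteries, a rough standard-library analogue of that pipeline (OCaml 4.07 or later, for Seq) might look like the sketch below. The names shuffle, random_seq and first_eligible are hypothetical, and decide is assumed to take a key and a value:

```ocaml
(* In-place Fisher-Yates shuffle. *)
let shuffle a =
  for i = Array.length a - 1 downto 1 do
    let j = Random.int (i + 1) in
    let tmp = a.(i) in
    a.(i) <- a.(j);
    a.(j) <- tmp
  done

(* The random stream: bindings -> shuffled array -> sequence. *)
let random_seq ht =
  let a = Array.of_seq (Hashtbl.to_seq ht) in
  shuffle a;
  Array.to_seq a

(* All use of [decide] lives here, separate from the stream itself. *)
let first_eligible decide ht =
  let rec find s =
    match s () with
    | Seq.Nil -> None
    | Seq.Cons ((k, v), rest) ->
      if decide k v then Some (k, v) else find rest
  in
  find (random_seq ht)

let () =
  let ht = Hashtbl.create 16 in
  for i = 1 to 10 do Hashtbl.replace ht i (i * i) done;
  match first_eligible (fun _ v -> v > 20) ht with
  | Some (k, v) -> Printf.printf "first eligible: %d -> %d\n" k v
  | None -> print_endline "nothing eligible"
```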

gasche 2010-10-29 19:13:18

@gasche I also didn't like `mutable` and `Enum`. I've now changed the implementation not to use them. I don't agree that the problem is too specific. The solution I propose above is for a general hash table and general decide function. Having this solution one can now plug in a particular hash table and a particular function and get a list of (key,value) from the hash table that has been obtained randomly. Thanks for the useful comments.

Surikator 2010-10-30 17:52:24
