views:

254

answers:

5

I'm looking for a container that provides fastest unordered iterations through the encapsulated elements. In other words, "add once, iterate many times".

Is there one among OCaml's standard modules that is fast enough (such that further optimization of it would be useless)? Or some kind of third-party GPL-ready ones?

AFAIK there's just one OCaml compiler, so the concept of being fast is more or less clear...

...But after I saw a couple of answers, it appears, it's not. Of course, there's a plenty of data structures that allow O(n) iteration through container of size n. But the task I'm solving is one of those, where difference between O(n) and O(2n) matters ;-).

I also see that Arrays and Lists provide unnecessary information about the order of elements added, which I don't need. Maybe in "functional world" there exists data structures such that can trade this information for a bit of iteration speed.

In C I would outright pick a plain array. The question is, what should I pick in OCaml?

+1  A: 

All common data structures are iterable in O(n) time, so the differences between data structures will only be constant (and very probably not significant).

At least lists and arrays allow iteration without significant overhead. I can't think of a situation where that would not be fast enough.

sepp2k
+2  A: 

The array - a linear piece of memory with the items visited in sequential order - best utilises the CPU's L1 data cache.

Will
It was true in C... is it still the fastest in OCaml?
Pavel Shved
If it's an unboxed datatype (e.g., integers), the array values will be stored in a contiguous block of memory. If it's a "boxed" datatype (most are), then it will be an array of pointers, so you probably won't gain much over a list.
Chris Conway
+7  A: 

You are unlikely to do better than built-in arrays and lists, since they are hand-coded in C, unless you bind to your own native implementation of an iterator. An array will behave almost exactly like an array in C (a contiguously allocated block of memory containing a sequence of element values), possibly with some extra pointer indirections due to boxing. List are implemented exactly how you would expect: as cells with a value and a "next" pointer. Arrays will give you the best locality for unboxed types (especially floats, which have a super-special unboxed implementation).

For information about the implementation of arrays and lists, see Section 18.3 of the OCaml manual and the files byterun/mlvalues.h, byterun/array.c, and byterun/alloc.c in the OCaml source code.

From the questioner: indeed, Array appeared to be the fastest solution. However it only outperformed List by 7%. Maybe it was because the type of an array element was not plain enough: it was an algebraic type. Hashtbl performed 4 times worse, as expected.

So, I will pick Array and I'm accepting this one. good.

Chris Conway
+8  A: 

To know for sure, you're going to have to measure. Based on the machine instructions the compiler is likely to generate, I would try an array, then a list.

  • Access to an array element requires a bounds check, address arithmetic, and a load

  • Access to the head of a list requires a load, a test for empty list, and a load at a known compile-time offset.

The details of which is faster probably depend on your application and what else is happening on your machine. They also depend on the type of elements; for example, if they are floating-point numbers, ocamlopt may be clever enough to make an unboxed array, which will save you a level of indirection.

Other common data structures like hash tables or balanced trees generally require that you allocate some context somewhere to keep track of where you are. With an array, keeping track requires only an integer index; with a list, keeping track requires a single pointer. I think this is going to be hard to beat in another data structure.

Finally please note that there may be only one OCaml compiler, but it has two back ends: bytecode and native code. Naturally if you care about this level of performance, you are using the native-code ocamlopt version. Right?

Please take measurements and edit the results into your question.

Norman Ramsey
+4  A: 

Don't forget about Bigarrays, they are most close to C arrays (just a flat piece of memory), but cannot contain arbitrary OCaml values. Also consider switching bounds checking off (unsafe_set/get). And of course you should profile first.

ygrek