views:

102

answers:

6

I am currently developing a programming language in C, and I want to allow users to create apparently "unlimited" arrays with numerical indices without sacrificing performance in the process. For example, table [1000000000] would ideally be creatable and accessible in an instant without the memory overhead of a table of 1,000,000,000 items, 999,999,999 of which were unused; but the array would also perform well when table [n] was defined for, say, 1 ≤ n ≤ 1000000.

Do you have suggestions for the implementation of such an array-handling system?

A: 

I think you've answered it yourself. Take a look at CMPH - C Minimal Perfect Hashing Library.

EDIT:

Or you could use a B+ Tree to map the integer to the real index in the array. Using B Trees has another advantage: you can make range queries.

the_void
You already have a perfect hash function, in this case it is the index.
Hasturkun
Doesn't a perfect hash function require that one know the keys in advance (for example, mapping the months January...December to 1...12)?
Thomas Larsen
A: 

How about using pointer, you don't have to define number of elements for it, you can add as many elements as you want

vodkhang
A: 

You're creating a Sparse Array, as the Wikipedia article mentions, these can be represented by a linked list.

Each node in the linked list may be a dynamically allocated array so that you don't suffer excessive overhead for consecutive indices.

Hasturkun
Sparse arrays may be more inefficient, with a `get/set` complexity of `O(N)` - `N` number of actual items (http://www.itl.nist.gov/div897/sqg/dads/HTML/hugeSparseArray.html)
the_void
Why the downvote? as far as I can tell this _is_ a sparse array, and I was not suggesting the implementation linked to by @the_void, rather as a linked list of arrays that may be unified over time
Hasturkun
A: 

Theoretically I think it is possible. What you need is very good hashing algorithm (to avoid collisions). So if anyone says table[100..0]; you need not to allocate the space at once. Allocate space on need basis. So if in the table[100..0] I am trying to populate first 5 values then I will be storing those five values only and if i try to access lets say table[100] then I should be getting something like 'undef' or 'nil' ....

the library mentioned by the_void seems good... though i haven't tested ... :)

Favonius
+1  A: 

There's Judy Arrays http://judy.sourceforge.net/

Spudd86
A: 

Using cmph won't help. You need to know all keys in advance to create a (minimal) perfect hash function.

What you want is a simple associative mapping structure that will let you implement a sparse array. Any hash table or tree structure will do. You can use hash_map or map out of the box from your c++ stl implementation or any similar data structure.

If you want to go fancy, you can use Judy Arrays, but I will doubt it will make any difference unless you can properly benchmark stuff and are willing to consider more complex data structures that will do assumptions on your particular use case.

Do the simple thing. The easiest available hash table available for you is the best answer. Don't even bother thinking about hash functions or such, whatever your platform provides will work well enough.

Davi