The classic O(1) random access data structure is the array. But an array relies on the programming language guaranteeing contiguous memory allocation, since the array finds any element by taking a simple offset from its base.

This means that the language must have semantics regarding whether or not memory is contiguous, rather than leaving this as an implementation detail. Thus it could be desirable to have a data structure that has O(1) random access yet doesn't rely on contiguous storage.

Is there such a thing?

+2  A: 

Hashtable?

Edit: An array is O(1) lookup because a[i] is just syntactic sugar for *(a+i). In other words, to get O(1) you need either a direct pointer or an easily calculated pointer to every element (along with reasonable confidence that the memory you're about to look up belongs to your program). Without a pointer to every element, it's unlikely you can have an easily calculated pointer (and know the memory is reserved for you) without contiguous memory.

Of course, it's plausible (if terrible) to have a hashtable implementation where each lookup's memory address is simply *(a + hash(i)) without an underlying array, i.e. with each entry dynamically created at the specified memory location, if you have that sort of control... The point is that the most efficient implementation is going to be an underlying array, but it's certainly plausible to take hits elsewhere to do a WTF implementation that still gets you constant-time lookup.

Edit2: My point is that an array relies on contiguous memory because it's syntactic sugar, but a hashtable chooses an array because it's the best implementation method, not because it's required. Of course I must be reading the DailyWTF too much, since I'm imagining overloading C++'s array-index operator to also do it without contiguous memory in the same fashion...
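
For what it's worth, here's a minimal C++ illustration of that "syntactic sugar" point:

    #include <cassert>

    int main() {
        int a[4] = {10, 20, 30, 40};

        // Indexing is pointer arithmetic: a[i] is equivalent to *(a + i).
        // This only works because the four ints occupy one contiguous block,
        // so element i sits exactly i * sizeof(int) bytes past the base.
        assert(a[2] == *(a + 2));
        assert(&a[2] == a + 2);

        // Even the (legal, if odd) reversed form works, since *(2 + a) == *(a + 2).
        assert(2[a] == a[2]);
    }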

Andrew Coleson
Most implementations still use contiguous storage.
Jasper Bekkers
Right, because it's easier, but there's no reason it Has to be contiguous, right?
Andrew Coleson
Can you implement an O(1) random access hash table without an array? Stated another way, a hash table to me seems to be a way of deciding where to throw elements into an array.
Joseph Garvin
Interesting edit, I hadn't thought of trading off contiguous storage for being able to create entries at a specified memory location.
Joseph Garvin
As an avid DailyWTF reader, I simply recognize that there's more than one way to skin a cat, even if the cat doesn't need skinning. The problem with creating entries at a specified location is knowing whether you've allocated that particular memory location or not, though..
Andrew Coleson
I realize that it'd be a WTF'y thing to do, but from an academic standpoint I think it's interesting that there might be some equivalence between constant size / guaranteed contiguity / being able to allocate at a specific location.
Joseph Garvin
A: 

Distributed hash maps have such a property. Well, actually, not quite: the hash function tells you which storage bucket to look in, and within that bucket you'll probably still rely on a traditional hash map. It doesn't completely cover your requirements, since the list of storage areas/nodes (in a distributed scenario) is itself usually a hash map (essentially making it a hash table of hash tables), although you could use some other scheme, e.g. if the number of storage areas is known.

EDIT:
Forgot a little tidbit: you'd probably want to use different hash functions for the different levels, otherwise you'll end up with a lot of similar hash values within each storage area.
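
A rough, single-process C++ sketch of that idea (the TwoLevelStore name and the choice of FNV-1a for the outer level are just for illustration): one hash picks the bucket, a different hash works inside it.

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct TwoLevelStore {
        std::vector<std::unordered_map<std::string, std::string>> buckets;

        explicit TwoLevelStore(std::size_t n) : buckets(n) {}

        // Outer-level hash: FNV-1a, deliberately different from the std::hash
        // the inner unordered_map uses, so the two levels don't correlate.
        std::size_t bucketFor(const std::string& key) const {
            std::uint64_t h = 0xcbf29ce484222325ull;
            for (unsigned char c : key) { h ^= c; h *= 0x100000001b3ull; }
            return static_cast<std::size_t>(h % buckets.size());
        }

        void put(const std::string& key, const std::string& value) {
            buckets[bucketFor(key)][key] = value;
        }

        const std::string* get(const std::string& key) const {
            const auto& b = buckets[bucketFor(key)];
            auto it = b.find(key);
            return it == b.end() ? nullptr : &it->second;
        }
    };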

roe
+3  A: 

Apart from a hashtable, you can have a two-level array-of-arrays:

  • Store the first 10,000 elements in the first sub-array
  • Store the next 10,000 elements in the next sub-array
  • etc.
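
For illustration, a minimal C++ sketch of this scheme (the class name and the 10,000-element chunk size are arbitrary): only each chunk has to be contiguous, and lookup is still one division and one modulo. The small outer table of chunk pointers is, admittedly, itself contiguous.

    #include <cstddef>
    #include <vector>

    template <typename T, std::size_t CHUNK = 10000>
    class ChunkedVector {
        std::vector<std::vector<T>> chunks_;
        std::size_t size_ = 0;

    public:
        void push_back(const T& value) {
            if (size_ % CHUNK == 0)
                chunks_.emplace_back();          // start a new, separate block
            chunks_.back().push_back(value);
            ++size_;
        }

        T& operator[](std::size_t i) {           // one division, one modulo: O(1)
            return chunks_[i / CHUNK][i % CHUNK];
        }

        std::size_t size() const { return size_; }
    };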
ChrisW
And that's how they do virtual memory in hardware, essentially.
Darius Bacon
And you can call that a radix tree (well, page tables _are_ indeed a radix tree), even if not so flexible :-D
Blaisorblade
+5  A: 

How about a trie where the length of keys is limited to some constant K (for example, 4 bytes so you can use 32-bit integers as indices)? Then lookup time will be O(K), i.e. O(1) with non-contiguous memory. Seems reasonable to me.

Recalling our complexity classes, don't forget that every big-O comes with a constant, i.e. O(n) + C. This approach will certainly have a much larger C than a real array.

EDIT: Actually, now that I think about it, it's O(K*A) where A is the size of the "alphabet". Each node has to have a list of up to A child nodes, which will have to be a linked list to keep the implementation non-contiguous. But A is still constant, so it's still O(1).
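
A hypothetical C++ sketch of such a trie for 32-bit keys, consuming 4 bits per level (so A = 16 and K = 8). Here each node holds a small fixed child array rather than a linked list; since each node is a constant-size allocation anyway, no large contiguous block is ever needed.

    #include <cstdint>
    #include <memory>
    #include <optional>

    template <typename V>
    class IntTrie {
        struct Node {
            std::unique_ptr<Node> child[16];
            std::optional<V> value;   // set only on the final (8th) level
        };
        Node root_;

    public:
        void insert(std::uint32_t key, V value) {
            Node* n = &root_;
            for (int shift = 28; shift >= 0; shift -= 4) {   // exactly K = 8 steps
                unsigned digit = (key >> shift) & 0xF;
                if (!n->child[digit])
                    n->child[digit] = std::make_unique<Node>();
                n = n->child[digit].get();
            }
            n->value = std::move(value);
        }

        const V* find(std::uint32_t key) const {
            const Node* n = &root_;
            for (int shift = 28; shift >= 0; shift -= 4) {
                n = n->child[(key >> shift) & 0xF].get();
                if (!n) return nullptr;
            }
            return n->value ? &*n->value : nullptr;
        }
    };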

Dave Ray
Sounds good to me. You'll probably have memory usage issues when using largish arrays, though.
strager
Totally. It's really "O(1) + C" after all. Compared to a real array, I'm guessing C is a little bigger here :) Then again, it seems like it might work well for sparse arrays.
Dave Ray
+1  A: 

Surely what you're talking about here is not contiguous memory storage as such, but rather the ability to index into a containing data structure. It is common to internally implement a dynamic array or list as an array of pointers with the actual content of each element elsewhere in memory. There are a number of reasons for doing this - not least that it enables each entry to be a different size. As others have pointed out, most hashtable implementations rely on indexing too. I can't think of a way to implement an O(1) algorithm that doesn't rely on indexing, but this implies contiguous memory for the index at least.
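
For example (just a sketch of that layout): a contiguous array of pointers whose elements live in separate allocations of different sizes.

    #include <memory>
    #include <string>
    #include <vector>

    int main() {
        // The index is a contiguous array of pointers, but each element's actual
        // storage is a separate heap allocation and may have a different size.
        std::vector<std::unique_ptr<std::string>> items;
        items.push_back(std::make_unique<std::string>("short"));
        items.push_back(std::make_unique<std::string>(1000, 'x'));  // 1000 chars

        // O(1) access: one offset into the pointer array, then one dereference.
        std::string& second = *items[1];
        (void)second;
    }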

Stu Mackellar
A: 

It's possible to allocate a memory block not for the whole data, but only for a reference array that points to pieces of the data. This brings a dramatic decrease in the length of contiguous memory required.
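
As a commenter notes below, C++'s std::deque typically works this way (the standard doesn't mandate the layout, but chunked storage is the usual implementation):

    #include <deque>
    #include <iostream>

    int main() {
        // std::deque typically keeps a small "map" of pointers to fixed-size
        // blocks, with the elements stored inside those blocks. operator[] is
        // still constant time, but unlike std::vector the elements are not in
        // one contiguous allocation (&d[0] + i is not a valid way to reach d[i]).
        std::deque<int> d;
        for (int i = 0; i < 100000; ++i)
            d.push_back(i);

        std::cout << d[54321] << '\n';   // O(1): block index + offset within block
    }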

Another option: if the elements can be identified by keys and these keys can be uniquely mapped to the available memory locations, it is possible not to place all the objects contiguously, leaving spaces between them. This requires control over memory allocation so that you can still distribute free memory and relocate 2nd-priority objects elsewhere when you have to use that memory location for your 1st-priority object. They would still be contiguous in a sub-dimension, though.

Can I name a common data structure which answers your question? No.

Gorkem Pacaci
By "increase", I assume you mean decrease?Btw. c++ std::deque can be implemented by allocating small non-continuous chunks of continuous memory the way you describe.
Hrvoje Prgeša
That's right, increase->decrease. :)
Gorkem Pacaci
A: 

In practice, for small datasets using contiguous storage is not a problem, and for large datasets O(log(n)) is just as good as O(1); the constant factor is rather more important.

In fact, for REALLY large datasets, O(root3(n)) random access is the best you can get in a 3-dimensional physical universe.

Edit: Assuming log10, and the O(log(n)) algorithm being twice as fast as the O(1) one at a million elements, it will take a trillion elements for them to break even, and a quintillion for the O(1) algorithm to become twice as fast - rather more than even the biggest databases on earth have.

All current and foreseeable storage technologies require a certain physical space (let's call it v) to store each element of data. In a 3-dimensional universe, this means that for n elements there is a minimum distance of root3(n*v*3/4/pi) between at least some of the elements and the place that does the lookup, because that's the radius of a sphere of volume n*v. And then, the speed of light gives a physical lower bound of root3(n*v*3/4/pi)/c for the access time to those elements - and that's O(root3(n)), no matter what fancy algorithm you use.
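
Spelling out that geometric argument (v is the volume per element, c the speed of light, as above):

    \[
      \tfrac{4}{3}\pi r^{3} \;\ge\; n\,v
      \quad\Longrightarrow\quad
      r \;\ge\; \sqrt[3]{\frac{3\,n\,v}{4\pi}},
      \qquad
      t_{\mathrm{access}} \;\ge\; \frac{r}{c}
        \;=\; \frac{1}{c}\sqrt[3]{\frac{3\,n\,v}{4\pi}}
        \;=\; \Theta\!\left(n^{1/3}\right).
    \]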

Michael Borgwardt
Why is O(log(n)) just as good for huge datasets? Because the difference between it and a constant decreases over time? Also, could you explain the O(root3(n)) part? I'm guessing this has something to do with the physics of wiring the actual circuits? Link?
Joseph Garvin
If each single bit takes constant space, the volume you need scales as n, so the distance scales as n**(1/3). I suppose it's worth noting, though, that if you try to scale that up enough you'll end up with a black hole. :-)
Darius Bacon
+1  A: 

Aside from the obvious nested structures to finite depth noted by others, I'm not aware of a data structure with the properties you describe. I share others' opinions that with a well-designed logarithmic data structure, you can have non-contiguous memory with fast access times to any data that will fit in main memory.

I am aware of an interesting and closely related data structure:

  • Cedar ropes are immutable strings that provide logarithmic rather than constant-time access, but they do provide a constant-time concatenation operation and efficient insertion of characters. The paper is copyrighted but there seems to be a copy on the web at the moment.

This data structure is efficient enough that you can represent the entire contents of a large file using it, and the implementation is clever enough to keep bits on disk unless you need them.
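
A minimal, illustrative C++ sketch of the rope idea (after Boehm, Atkinson & Plass; the details here are mine, not the paper's): concatenation allocates one node that shares both operands, while indexing descends the tree.

    #include <cstddef>
    #include <memory>
    #include <string>

    struct Rope {
        std::size_t length = 0;
        std::string leaf;                        // used only when left == nullptr
        std::shared_ptr<const Rope> left, right;

        static std::shared_ptr<const Rope> make(std::string s) {
            auto r = std::make_shared<Rope>();
            r->length = s.size();
            r->leaf = std::move(s);
            return r;
        }

        static std::shared_ptr<const Rope> concat(std::shared_ptr<const Rope> a,
                                                  std::shared_ptr<const Rope> b) {
            auto r = std::make_shared<Rope>();
            r->length = a->length + b->length;   // O(1): no characters are copied
            r->left = std::move(a);
            r->right = std::move(b);
            return r;
        }

        char at(std::size_t i) const {           // O(depth) descent, logarithmic if balanced
            if (!left) return leaf[i];
            return i < left->length ? left->at(i) : right->at(i - left->length);
        }
    };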

Norman Ramsey
A: 

A bit of a curiosity: the hash trie saves space by interleaving in memory the key-arrays of trie nodes that happen not to collide. That is, if node 1 has keys A,B,D while node 2 has keys C,X,Y,Z, for example, then you can use the same contiguous storage for both nodes at once. It's generalized to different offsets and an arbitrary number of nodes; Knuth used this in his most-common-words program in Literate Programming.

So this gives O(1) access to the keys of any given node, without reserving contiguous storage to it, albeit using contiguous storage for all nodes collectively.
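
A hypothetical C++ sketch of that interleaving trick (the names and the naive first-fit placement are just for illustration): each node's sparse key set is packed into one shared array at a per-node offset chosen so occupied slots never collide, and an ownership tag lets a lookup reject slots belonging to other nodes.

    #include <cstddef>
    #include <utility>
    #include <vector>

    class SharedNodeStore {
        std::vector<int> owner_;   // -1 = free slot in the shared array
        std::vector<int> value_;
        std::vector<int> offset_;  // offset_[node] = base index of that node

    public:
        // Add a node given its (key, value) pairs; keys are small non-negative ints.
        int addNode(const std::vector<std::pair<int, int>>& entries) {
            int node = static_cast<int>(offset_.size());
            for (std::size_t base = 0; ; ++base) {          // first fit
                bool fits = true;
                for (const auto& e : entries) {
                    std::size_t slot = base + e.first;
                    if (slot < owner_.size() && owner_[slot] != -1) { fits = false; break; }
                }
                if (!fits) continue;
                for (const auto& e : entries) {
                    std::size_t slot = base + e.first;
                    if (slot >= owner_.size()) {
                        owner_.resize(slot + 1, -1);
                        value_.resize(slot + 1, 0);
                    }
                    owner_[slot] = node;
                    value_[slot] = e.second;
                }
                offset_.push_back(static_cast<int>(base));
                return node;
            }
        }

        // O(1): one offset and one ownership check, no per-node array reserved.
        const int* find(int node, int key) const {
            std::size_t slot = static_cast<std::size_t>(offset_[node]) + key;
            return (slot < owner_.size() && owner_[slot] == node) ? &value_[slot] : nullptr;
        }
    };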

Darius Bacon
+1  A: 

Thus it could be desirable to have a data structure that has O(1) random access yet doesn't rely on contiguous storage.

Is there such a thing?

No, there is not. Sketch of proof:

If you have a limit on your contiguous block size, then obviously you'll have to use indirection to get to your data items. Fixed depth of indirection with a limited block size gets you only a fixed-size graph (although its size grows exponentially with the depth), so as your data set grows, the indirection depth will grow (only logarithmically, but not O(1)).
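
Slightly more formally: with at most B items or pointers per contiguous block and d levels of indirection, you can reach at most B^d items, so

    \[
      B^{d} \;\ge\; n
      \quad\Longrightarrow\quad
      d \;\ge\; \log_{B} n \;=\; \Omega(\log n),
    \]

i.e. the number of dereferences per access must grow as the data set grows.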

Rafał Dowgird
the OP never mentioned that the data structure has to be able to grow indefinitely
Jason S
@Jason - the O() notation must take infinite growth into account by definition. For a fixed data size, everything is O(1).
Rafał Dowgird
A: 

Some pseudo-O(1) answers:

A VList is O(1) access (on average), and doesn't require that the whole of the data is contiguous, though it does require contiguous storage in small blocks. Other data structures based on numerical representations are also amortized O(1).

A numerical representation applies the same 'cheat' that a radix sort does, yielding an O(k) access structure - if there is an upper bound on the index, such as it being a 64-bit int, then a binary tree where each level corresponds to a bit in the index takes constant time. Of course, that constant k is greater than ln N for any N that can actually be used with the structure, so it's not likely to be a performance improvement (radix sort can give a performance improvement if k is only a little greater than ln N and the implementation of the radix sort better exploits the platform).

If you use the same representation of a binary tree that is common in heap implementations, you end up back at an array.
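
That last point in one line: the usual implicit-heap layout replaces child pointers with index arithmetic over a single array (a sketch):

    #include <cstddef>

    // A complete binary tree flattened into one array: the "pointers" are pure
    // index arithmetic, so the structure is exactly an array again, with all
    // the contiguity that implies.
    inline std::size_t parentOf(std::size_t i)     { return (i - 1) / 2; }
    inline std::size_t leftChildOf(std::size_t i)  { return 2 * i + 1; }
    inline std::size_t rightChildOf(std::size_t i) { return 2 * i + 2; }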

Pete Kirkham