33936

27
+90  Q:

## Arrays, what's the point?

As I'm programming I haven't seen an instance where an array is better for storing information than another form thereof. I had indeed figured the added "features" in programming languages had improved upon this and by that replaced them. I see now that they aren't replaced but rather given new life, so to speak.

So, basically, what's the point of using them?

I see then, thank you very much for the clarification. I've got a way to go with this, I do appreciate all the answers.

EDIT::
The question was not so much why do we use arrays from a computer standpoint, but rather why would we use arrays from a programming standpoint (a subtle difference). What the computer does with the array was not the point of the question.

A:

It would be challenging to implement a Quicksort without using arrays.

I assume by "array", you mean "randomly accessible sequential data structure".

You can implement Quicksort with lists just fine.
But it would be challenging. Have you ever sorted a linked list without using an array?
I'm not sure what you think would be "challenging". Perhaps you think the "swap" operation is fundamental? There are several list implementations at http://en.wikibooks.org/wiki/Algorithm_implementation/Sorting/Quicksort
+2  A:

Arrays are used everywhere in computer science. What are you comparing an array to? A linked list? Arrays have much faster random access speeds than linked lists, especially as the lists get long. They also make much more efficient use of memory.

+1  A:

How else would you store a "list" of 10 numbers where access time is important?

I.E. I need to read the n-h element of my list and write the n+1-th element several thousand times per second?

Also, the natural layout of system memory lends itself well to arrays of values which are directly adjacent in a contgiuous block of memory. "Instances" and such tend to be allocated individually, and allocating many small pieces of memory typically wastes memory (since the allocation is often word or cache line aligned, where cache lines can be 128 bytes or more) and causes problems for the allocator when it needs to allocate a large chunk because of "fragmentation".

+11  A:

Arrays are fundamental data structures that are useful for building many different abstract data types. Even if your language provides you with structures such as stacks, queues, lists, etc. they may internally use arrays to implement these structures.

+59  A:

For O(1) random access, which can not be beaten.

Would you care to elaborate please?
On which point? What is O(1)? What is random access? Why can't it be beaten? Another point?
O(1) random access, is this the O(N) as described in Holland's reply?
O(1) means constant time, for example if you want to get the n-esim element of an array, you just access it directly through its indexer (array[n-1]), with a linked list for example, you have to find the head, and then go to the next node sequentially n-1 times which is O(n), linear time.
Big-O notation describes how the speed of an algorithm varies based on the size of its input. An O(n) algorithm will take twiceish as long to run with twice as many items and 8ish times as long to run with 8 times as many items. In other words the speed of an O(n) algorithm varies with the [cont...]
size of its input. O(1) implies that the size of the input ('n') doesn't factor into the speed of the algorithm, it's a constant speed regardless of the input size
O(1) random access means that it, for any k, it takes a constant amount of time to access the kth element of array, regardless of the length N of the array.When Holland refers to O(N) he is referring to the time it takes to search for a specific element in an array, assuming it is in random order.
Okay, got it thanks.
I see your O(1), and raise you O(0).
I do appreciate answers that satisfy the original question with the least amount of words. But if I understand correctly then we are talking about C arrays, not C++ vectors or Java ArrayLists. So you should add that their size is constant and specified at compile time.
I mean, there are the kinds of arrays that grow dynamically, but they are a completely different kind of beast than the C arrays. C arrays don't have those obscure expensive operations in the background you aren't aware of, rare as they may be.
C arrays, when compared to dynamically growing arrays just block you from exposing yourself to expensive implicit growth. Dynamic arrays have advantages in every other respect, but they may be occasionally expensive on insertions if you are not aware of the implicit growth feature.
+22  A:

Perhaps you're not aware of it, but most collection classes use arrays as their base storage mechanism...

An example in .NET terms is the ArrayList:

``````public class ArrayList : IList, ICollection, IEnumerable, ICloneable
{
// Fields
private const int _defaultCapacity = 4;
private object[] _items; // <<<<<<<<<<<<<<<<<<<<<<<<<<< See this array?
private int _size;
[NonSerialized]
private object _syncRoot;
private int _version;
``````

Now, we look at a typical typed collection class, StringCollection:

``````public class StringCollection : IList, ICollection, IEnumerable
{
// Fields
private ArrayList data; // <<<<<<<<< Stores its items internally in an ArrayList
``````

The same occurs for a generic List:

``````public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable
{
// Fields
private const int _defaultCapacity = 4;
private static T[] _emptyArray;
private T[] _items; // <<<<<<<<<<<<<<<<<<<<< See the array?
private int _size;
[NonSerialized]
private object _syncRoot;
private int _version;
``````

I get the impression from your question that you think all these newfangled ways of storing collections of items have replaced the array, but in fact, they have not. They supplement it greatly.

We get that list classes use array internally, but that says that for normal use arrays not to be used, and are not much more than an implementation detail.
+1  A:

Besides performance, etc., one thing that makes arrays indispensable is the fact that all languages (I don't know any which does not) have an implementation of arrays. Thus from an interoperability stand point you can not beat arrays. Most public webservices use arrays to return sets of data as it's something almost any language can understand.

Second Life's LSL doesn't have arrays, just lists. LSL is actually Turing-complete and a surprisingly powerful in many ways - latest versions also compile to mono.
+447  A:

Time to go back in time for a lesson. While we don't think about these things much in our fancy managed languages today, they are built on the same foundation, so let's look at how memory is managed in C.

Before I dive in, a quick explanation of what the term "pointer" means. A pointer is simply a variable that "points" to a location in memory. It doesn't contain the actual value at this area of memory, it contains the memory address to it. Think oOkay, insertion time is somewhat slower when inserting into an array than into a tree. But if you use pointer-arrays and the elements itself are big enough then you should get the best compromise in comparison of look-up time, random-access and insertion time. Think of a block of memory as a mailbox. The pointer would be the address to that mailbox.

In C, an array is simply a pointer with an offset, the offset specifies how far in memory to look. This provides O(1) access time.

``````  MyArray   [5]
^       ^
Pointer  Offset
``````

All other data structures either build upon this, or do not use adjacent memory for storage, resulting in poor random access look up time (Though there are other benefits to not using sequential memory).

For example, let's say we have an array with 6 numbers (6,4,2,3,1,5) in it, in memory it would look like this:

``````=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
``````

In an array, we know that each element is next to each other in memory. A C array (Called MyArray here) is simply a pointer to the first element:

``````=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
^
MyArray
``````

If we wanted to look up MyArray[4], internally it would be accessed like this:

``````   0     1     2     3     4
=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
^
MyArray + 4 ---------------/
(Pointer + Offset)
``````

Because we can directly access any element in the array by adding the offset to the pointer, we can look up any element in the same amount of time, regardless of the size of the array. This means that getting MyArray[1000] would take the same amount of time as getting MyArray[5].

An alternative data structure is a linked list. This is a linear list of pointers, each pointing to the next node

``````========    ========    ========    ========    ========
| Data |    | Data |    | Data |    | Data |    | Data |
|      | -> |      | -> |      | -> |      | -> |      |
|  P1  |    |  P2  |    |  P3  |    |  P4  |    |  P5  |
========    ========    ========    ========    ========

P(X) stands for Pointer to next node.
``````

Note that I made each "node" into its own block. This is because they are not guaranteed to be (and most likely won't be) adjacent in memory.

If I want to access P3, I can't directly access it, because I don't know where it is in memory. All I know is where the root (P1) is, so instead I have to start at P1, and follow each pointer to the desired node.

This is a O(N) look up time (The look up cost increases as each element is added). It is much more expensive to get to P1000 compared to getting to P4.

Higher level data structures, such as hashtables, stacks and queues, all may use an array (or multiple arrays) internally, while Linked Lists and Binary Trees usually use nodes and pointers.

You might wonder why anyone would use a data structure that requires linear traversal to look up a value instead of just using an array, but they have their uses.

Take our array again. This time, I want to find the array element that holds the value '5'.

``````=====================================
|  6  |  4  |  2  |  3  |  1  |  5  |
=====================================
^     ^     ^     ^     ^   FOUND!
``````

In this situation, I don't know what offset to add to the pointer to find it, so I have to start at 0, and work my way up until I find it. This means I have to perform 6 checks.

Because of this, searching for a value in an array is considered O(N). The cost of searching increases as the array gets larger.

Remember up above where I said that sometimes using a non sequential data structure can have advantages? Searching for data is one of these advantages and one of the best examples is the Binary Tree.

A Binary Tree is a data structure similar to a linked list, however instead of linking to a single node, each node can link to two children nodes.

``````         ==========
|  Root  |
==========
/          \
=========       =========
| Child |       | Child |
=========       =========
/       \
=========    =========
| Child |    | Child |
=========    =========

Assume that each connector is really a Pointer
``````

When data is inserted into a binary tree, it uses a several rules to decide where to place the new node. The basic concept is that if the new value is greater than the parents, it inserts it to the left, if it is lower, it inserts it to the right.

This means that the values in a binary tree could look like this:

``````         ==========
|   100  |
==========
/          \
=========       =========
|  200  |       |   50  |
=========       =========
/       \
=========    =========
|   75  |    |   25  |
=========    =========
``````

When searching a binary tree for the value of 75, we only need to visit 3 nodes because of this structure:

• Is 75 less than 100? Look at Right Node
• Is 75 greater than 50? Look at Left Node
• There is the 75!

Even though there is 5 nodes in our tree, we did not need to look at the remaining two, because we knew that they (and their children) could not possibly contain the value we were looking for. This gives us a search time that at worst case means we have to visit every node, but in the best case we only have to visit a small portion of the nodes.

That is where arrays get beat, they provide a constant O(N) search time, despite O(1) access time.

This is an incredibly high level overview on data structures in memory, skipping over a lot of details, but hopefully it illustrates an array's strength and weakness compared to other data structures.

+1 for the time and thought.
Your C example is off by one, MyArray[4] should be pointing to the 5th element in your diagram, not the 4th element because arrays are zero-indexed in C. Otherwise, very nice explanation.
Good catch, corrected
@Jonathan: You updated the diagram to point to the 5th element but you also changed MyArray[4] to MyArray[5] so it is still incorrect, change the index back to 4 and keep the diagram as-is and you should be good.
Very well explained even with the magic of C!
@lubos hasko - careful you don't trigger the fraudulent voting detector
Excellent explanation.
+1, nice explanation
Great answer clear and concise, i even learnt a little about traversing a binary tree!
+1: Congratulations for a polished exposition.
This is what bugs me about "community wiki" this post is worth "proper" rep
+1: very very nice answer! thank you.
I'm really sorry you won't get any rep for this answer, you should. +1 from me too
+1 verry nice answer that cover the "why is there so much different container types" question that comes up in C++ when teaching about the STL containers...
+1 Very Well Done!
+1; IMO, its answers like this that makes SO so good! Great work
This was a very nice answer, very well thought out and explained. I'm sure I am not the only one who has benefited from this. Very simply, great work.
WOW!!! did you really write this whole explanation?, your answer is too much for this question!
This is the kind of answer I wish I would have had in my first book on C-programming. +1
Nice answer. But the tree you describe is a binary search tree - a binary tree is just a tree where every node has at most two children. You can have a binary tree with the elements in any order. The binary search tree is organized as you describe.
Good explanation, but I can't help to nitpick... if you are allowed to reorder the items into a binary search tree, why can't you reorder the elements in the array so a binary search would work in it, too? You might go into more detail regarding O(n) insert/delete for a tree, but O(n) for an array.
Alex B seem to have messed up the arrows. after his edit, the arrows have looked strange
Yes, someone broke all the graphics on this post...
+1 for nice ascii art.
+1 for the time and effort on a great post.
@Mark Santesson: And an ordered array never gets unbalanced! Okay, insertion time is somewhat slower when inserting into an array than into a tree. But if you use pointer-arrays if the elements intself are big then you should get the best compromise in comparison of look-up time, random-access and insertion time.
Very well thought out. Thanks.
Since you were originally comparing arrays with linked lists, I would've stayed with that comparison - lists have constant-time prepend operations, for instance.
+17  A:

Not all programs do the same thing or run on the same hardware.

This is usually the answer why various language features exist. Arrays are a core computer science concept. Replacing arrays with lists/matrices/vectors/whatever advanced data structure would severely impact performance, and be downright impracticable in a number of systems. There are any number of cases where using one of these "advanced" data collection objects should be used because of the program in question.

In business programming (which most of us do), we can target hardware that is relatively powerful. Using a List in C# or Vector in Java is the right choice to make in these situations because these structures allow the developer to accomplish the goals faster, which in turn allows this type of software to be more featured.

When writing embedded software or an operating system an array may often be the better choice. While an array offers less functionality, it takes up less RAM, and the compiler can optimize code more efficiently for look-ups into arrays.

I am sure I am leaving out a number of the benefits for these cases, but I hope you get the point.

Ironically, in Java you should use an ArrayList (or a LinkedList) instead of a Vector. This is to do with a vector being synchronised which is usually unnecessary overhead.
+3  A:

Say you have a series of buckets each tied to the next by a piece of rope, and you're holding onto a piece of rope attached to the first bucket, but you want the contents of the 42nd bucket. You'll have to follow the ropes to 42 buckets before you get to the one you want.

This is like a "linked list". The buckets are memory locations that store the value, and the pieces of rope are the pointers to the next bucket. The lookup time for a random access like this is considered O(N) because it takes on the order of N "operations" to get there. As the size of the list increases, so does the lookup time linearly (i.e. linear time).

Now say you have a series of buckets spaced exactly 1 foot apart, as well as a really long ruler with markings every 1 foot, and you want to get to the 42nd bucket. Just go directly to the spot on the ruler marked 42 and you're there!

That's like an "array". Again, the buckets are the memory locations containing the values, but this time since they're in a straight line evenly spaced you can just jump directly to the offset (think of the memory addresses as the "ruler"). The lookup time for a random access is much faster for lots of buckets, only taking a constant number of operations (jumping directly to the right spot), called O(1). As the size of the array increases, the lookup time stays constant (i.e. constant time)

I like the bucket analogy.
+3  A:

Just put 100,000 items in the data structure of your choice, and another 100,000 in an array. Then, see how long it takes your program to get to the 93,754 item in both cases.

how often do you need such a case? ;)
I'd take a guess as to not too often for "most" people. I also think that it wasn't so much of a "You are going to need this" as "See the value of the array in terms of speed."
@Mamut - Are you asking how often do you need efficiency? I personally like it all the time. I wish more programmers did.
+3  A:

I'm surprised nobody has mentioned that other fundamental data type that is built on an array: Strings.

I C/C++ - perhaps.I believe, this could be nt the case for other languages.
It could be the case, but isn't. ;-)
+2  A:

As some other people said, Strings are usually arrays. Another important use of arrays is cache locality--modern processors use a cache, that is, a special kind of memory much smaller than the main ram in the computer. Since arrays are adjacent in memory, they can completely loaded into the cache, unlike many of the linked structures. This makes them much faster in practice, because walking across elements in a linked list can result in cache misses (accessing data not in the cache), which is hundreds or thousands of times slower.

+2  A:

Don't forget about cache locality. Most architectures "like" sequential accesses. With arrays you can make fairly easy optimizations around prefetching and flushing caches.

Performance-sensitive applictions such as video games basically live or die based on their prefetching and efficient use of cache lines these days. I'd hate to do this with any other data structure.

A:

How then would you store the pixels of an image, if not as a 2-dimensional array?

Or the characters in a string?

One might argue that an array is simply a specialized Map where the domain is restricted to an integer interval, but many data structures, arrays are an obvious implementation.

In order for this question to make any sense, I think one has to take "an array" to be contiguously allocated block of storage.
A:

The binary tree in the example is only faster because it is sorted, if the array were sorted the only way a binary tree could beat it for search time is to be perfectly balanced.

If the array is sorted you only have to visit the middle node, then search the half of the array that your value is in, over and over.

Binary trees sacrifice insertion speed for search speed. If you insert sorted data into a binary tree, you end up with a linked list. That is why real implementations of binary tree's are typically Red/Black Tree's. These balance while inserting to keep lookup optimal.
+2  A:

Arrays are useful constant time, O(1), random access by index. Link lists take linear time, O(n), to access a given element, and are therefore slower in this respect.

I downvoted you twice today. Now i found something to upvote. Dont ask why. I am to lazy to explain.
+1  A:

If your application processes data sequentially, then an array places data into sequential memory addresses which guarantees decent caching performance. The difference in speed between iterating through the elements of a linked list (whose elements may be scattered around in memory) and an array can be a factor of 10-20x.

+1  A:

A main feature of the different types of collection is their asymptotic performance, the cost in terms of cpu cycles or memory consumed as the number of elements increases. Almost as important is the marginal cost when the size of the collection is very small.

At the large end it is very difficult to know before hand which type of structure will work best. In the case that your application is mostly doing reads or replaces, arrays often win, but when your application has to do a lot of inserts, linked lists or balanced trees are often preferable. In either case, no structure has better performance in terms of memory than arrays, because the only data stored is the values, not metadata related to the structure. This can be dominant when your application is very IO or cache dependant.

On the other side, when the size of the collection is small, arrays are often a clear winner because there is very little overhead for most operations. Linear scans can even be faster than binary searches in sorted structures because the entire collection might fit in a single cache entry.

These days it often makes sense to start by choosing any structure without regard to performance in the initial development phase. After the application is beginning to mature and you (or your customer) is starting to experience bottlenecks, then you can try to optimize your data structures, with the help of a profiler and a real world workload.

+2  A:

Arrays are a simple concept. Why would I not want a simple datatype that I can visualize perfectly and use as a simple foundation for other datatypes? Simplicity is self explanatory

+7  A:

## Perl 5 Array

`Perl`'s arrays aren't like arrays from other languages. The memory for them is pre-allocated.

Note: this is just a simulation of how `Perl` arrays act.

``````Use Data::Dump 'dump';
my @array = qw( 1 2 3 4 5 );
``````

``` fill   max   ptr        0   1   2   3   4   5   6   7
+-----+-----+-----+    +---+---+---+---+---+---+---+---+---
|   5 |  32 | ### |    | 1 | 2 | 3 | 4 | 5 | ? | ? | ? |
+-----+-----+-----+    +---+---+---+---+---+---+---+---+---
|                   |
\                 /
Valid
```
``````say dump \$array[\$_] for 0..5; # 5 being off the end of the array
``````
```1
2
3
4
5
undef
```
``````push @array, 6;
``````
``` fill   max   ptr        0   1   2   3   4   5   6   7
+-----+-----+-----+    +---+---+---+---+---+---+---+---+---
|   6 |  32 | ### |    | 1 | 2 | 3 | 4 | 5 | 6 | ? | ? |
+-----+-----+-----+    +---+---+---+---+---+---+---+---+---
```
``````\$array[10] = 11;
``````
``` fill   max   ptr        0   1   2   3   4   5   6   7   8   9  10
+-----+-----+-----+    +---+---+---+---+---+---+---+---+---+---+---+---+---
|  10 |  32 | ### |    | 1 | 2 | 3 | 4 | 5 | 6 | U | U | U | U | 11| ? |
+-----+-----+-----+    +---+---+---+---+---+---+---+---+---+---+---+---+---
```
+5  A:

There are two kinds of programming (well, ok, lots of kinds, but two that matter in this case): high-performance programming and regular programming.

In the high-performance case you need to know what kind of memory access you will be using and you care significantly about things like the cpu's cache and things like that. In this case you will often use arrays directly; this is the case when you are doing things like scientific computing, or developing basic features such as a generic collection implementation.

In the normal case, such as most business programming or web programming, or much programming that takes place in a VM, your main focus will be on application correctness and overall performance, often including things like database access. In this case you usually won't use an array directly, and instead should use a container such as Java's List. Now, in Java, the List is actually just an interface and you can choose from several Lists, including the ArrayList, or the LinkedList. But your code doesn't care which kind you use, and as the programmer you only have to decide up front to use one or the other. The main body of code that does all the work won't know if you chose Linked or Array. And in many many cases it won't even matter, since it's very common for lists to be only appended or only iterated. You might also use Maps, which are conceptually like arrays but the keys are anything instead of just sequential numbers.

Many of the other answers to this question say "how would you implement Strings without Arrays?!" or things like that. But Strings need not be arrays; in Java they are not (well, more precisely you don't have access to the character array). Sure, underneath the hood there's an array, but the business-logic programmer doesn't need to care. Only the guy at Sun who writes the String class cares.

Which guy are you? I am the business programmer, so I rarely use arrays. Most of the time I use arrays because Java uses arrays for some of its API calls or because I have a function that returns two values and I can't be bothered to create a class or a map for it. Or I need a temporary container for some bytes read from some stream. But 99.9% of my code is array-free.

A:

Another unknown feature of arrays is the way they deal with variance / contravariance :

``````interface ICustomer{ }
class Customer : ICustomer { }

public ICustomer[] f()
{
return new Customer[] { };
}
``````

The compiler accepts this code; which means that a Customer[] is an ICustomer[] (and an IList`<Customer`> and an IList`<ICustomer`>, too).
It never proved useful to me, though.

+1  A:

Some platforms, mobile applications using J2ME and BREW for example, only have array access due to hardware limitations. No STL or Container classes available, so you end up having to create your own data structures using arrays.

+2  A:

I understand the question a bit differently.

Taking a single language as an example, in C++, why would you ever use an array instead of an std::vector?

Sure, the vector, after compilation, will be an array perhaps with various guarantees, checks and balances, but that's how I understand Xesaniel's question.

He's not asking why you would ever use an array in the underlying implementation, but why would you ever use an array up at the top abstraction level when writing your program?

Why not always at least use an std::vector if you are in C++? In some other language use their resources for variably managed collections. If your language doesn't provide an abstraction like that, then write one and use it. In the case of an std::vector, performance isn't an issue because when all's said and done, your compilation will contain an array.

Again, it will "be" an array at the end of its compilation lifecycle, but I think you should always use an abstraction level appropriate for what you are doing. In other words, get in the habit of using something other than an array.

Ride on the shoulders of Giants, people!

I think we're *standing* on the shoulders of giants, but riding sounds like a lot more fun :)
A:

Xesaniel, the more I read code (and I read code a lot), the less I see arrays in it. I can't even remember the last time I've see one.

A:

Maybe just because they are the first thing comes to mind when we want to store a "collection" of items in a "set".

Maybe they are the oldest structure in programming languages to store a data collection, in a naturally sorted way (i.e 1-1 correspondence with positive integers).

All other structures are some forms and diversions of arrays.

If your question is why arrays are the fist thing in mind when we want to store a collection? The question may change to "why humans count?", "How we categorise?", "Is there a definition for a set?" ...

(Hope I got you correct and skipped the computing/programming aspect)

btw "Hello, Stack Overflow"