What is the reason that vector indices in R start with 1, instead of the usual 0?

Example:

> arr<-c(10,20)
> arr[0]
numeric(0)
> arr[1]
[1] 10
> arr[2]
[1] 20

Is it just that they want to store extra information about the vector and didn't know where to store it except as the vector's first element?

+13  A: 

FORTRAN is one language that starts arrays at 1. Mathematicians deal with vectors that always start with component 1 and go through N. Linear algebra conventions start with row and column numbered 1 and go through N as well.

C started with zero because of the pointer arithmetic that was implicit underneath. Java, JavaScript, C++, and C# followed suit from C.

duffymo
Exactly. C's 0 indexing always seemed utterly reasonless to me until I learned a little bit about pointer arithmetic. Then it made sense as a design choice.
Sharpie
A: 

You're doing it wrong. If you want to store additional attributes in an object, use attr:

> foo <- 1:20
> attr(foo, "created") <- Sys.time()               # just as an example
> str(foo)
 atomic [1:20] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "created")= POSIXct[1:1], format: "2010-06-28 14:07:15"    # our time
> summary(foo)                                     # object works as usual
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    5.75   10.50   10.50   15.20   20.00 
> 
Dirk Eddelbuettel
What am I doing wrong? I wasn't trying to store any additional information in my object.
Frank
I misread the last line of your question. To answer your question: R isn't C. That's all.
Dirk Eddelbuettel
+2  A: 

0 is only "usual" because that's what C did, and a lot of later languages slavishly copied C syntax. By default in Fortran arrays are 1-based.

In Ada there is no default and you have to pick the beginning and end of the range. Interestingly, it seems that most code I've come across picks '1' for the lower bound. I think that's a pretty good indication of where folks would have gone given a free choice.

T.E.D.
+6  A: 

Vectors in math are often represented as n-tuples, elements of which are indexed from 1 to n. I suspect that R wanted to stay true to this notation.

Jan Gorzny
+2  A: 

R is a "platform for experimentation and research". Its aim is to enable "statisticians to use the full capabilities of such an environment" without rethinking the way they usually deal with statistics. So people use formulas to make regression models, and people start counting at 1.

wok
+1  A: 

Frank, I think you were misinterpreting what you saw when you typed arr[0]. The numeric(0) just means that the result is a numeric vector with no elements. It does not mean that the type of the vector is being "stored" in element 0. You would have gotten the same result if you had typed, for example, arr[arr > 30]. No element meets that condition, so the result vector has no elements. Likewise, no element has index 0. This is intentional, and has nothing to do with the 0 space being used for something else.

goodside
I think that is [what Dirk tried to explain](http://stackoverflow.com/questions/3135325/why-do-vector-indices-in-r-start-with-1-instead-of-0/3135372#3135372), but you got the point. +1
Marek
A: 

The way this question is worded, it strikes me as the programming equivalent of the "Ugly American"

Pierreten
But why did you post it as an answer and not as a comment?
Marek
+1  A: 

Actually, I think that the C-like version that "starts with 0" is very logical when you look at the way memory is organized. In C++ we can write the following:

int* T = new int[10];

The first element of the array is *T. This is perfectly "logical" because T holds the address of the first memory cell, so *T is the value stored there. The second element sits in the next cell, so it is *(T+1): we move forward by one "sizeof(int)".

To make the code more readable, C provides an alias: T[i] for *(T+i). To access the first element, you have to access *T, that is, T[0]. That's perfectly natural.

This idea is extended by iterators:

std::vector<int> T(10);
int val = *(T.begin()+3);

T[i] is just an alias for *(T.begin()+i).
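To make the equivalence concrete, here is a minimal sketch. The helper names (element_via_pointer, element_via_iterator, element_one_based) are made up for illustration and do not come from any library. The last helper assumes the caller wants R/Fortran-style 1-based access layered on top of a 0-based container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: read element i through explicit pointer arithmetic.
// For a raw pointer, base[i] is defined as *(base + i), so index 0 is the
// element at the base address itself.
int element_via_pointer(const int* base, std::size_t i) {
    return *(base + i);
}

// Hypothetical helper: the same offset idea expressed with an iterator.
int element_via_iterator(const std::vector<int>& v, std::size_t i) {
    return *(v.begin() + static_cast<std::ptrdiff_t>(i));
}

// Hypothetical helper: 1-based access (R/Fortran style) on a 0-based vector.
// The index k is shifted down by one to get the offset from the start.
int element_one_based(const std::vector<int>& v, std::size_t k) {
    assert(k >= 1 && "1-based indices start at 1");
    return v.at(k - 1);  // at() throws std::out_of_range if k exceeds v.size()
}
```

With a vector holding 10, 20, 30, element_via_pointer(v.data(), 0) and element_one_based(v, 1) both return 10: the 0-based index is an offset from the start, while the 1-based index is a position in the sequence.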

In Fortran/R, we usually start with 1 because of mathematical conventions, but there are certainly other good choices (cf this link for example). Do not forget that Fortran can easily use arrays that start at 0:

PROGRAM ZEROARRAY
REAL T(0:9)
T(0) = 3.14
END
Elenaher