Where can I find information on the differences between calling on a column within a data.frame via:

df <- data.frame(x=1:20,y=letters[1:20],z=20:1)

df$x
df["x"]

They both return the "same" results, but not necessarily in the same format. Another thing that I've noticed is that df$x returns a vector, whereas df["x"] returns a data.frame.

EDIT: However, knowing which one to use in which situation has become a challenge. Is there a best practice here or does it really come down to knowing what the command or function requires? So far I've just been cycling through them if my function doesn't work at first (trial and error).

+8  A: 

If I'm not mistaken, df$x is the same as df[['x']]. [[ selects a single element, whereas [ returns a list of the selected elements (for a data frame, another data frame). See also the language reference. I usually see [[ used for lists, [ for arrays, and $ for getting a single column or element. If the column name comes from an expression (for example df[[name]] or df[,name]), use the [ or [[ notation, since $ does not evaluate its argument. The [ notation is also used when selecting multiple columns, for example df[,c('name1', 'name2')]. I don't think there is a single best practice for this.
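To make the distinction above concrete, here is a minimal sketch using the data frame from the question. The variable `name` is a hypothetical stand-in for a column name computed at run time; it is not part of the original question.

```r
# Same example data frame as in the question
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

class(df$x)        # "integer": a plain vector
class(df[["x"]])   # "integer": the same single element as df$x
class(df["x"])     # "data.frame": still a data frame, with one column

# A computed index: `name` is a hypothetical variable holding a column name
name <- "z"
df[[name]]   # works, because [[ evaluates its index
df$name      # NULL, because $ looks for a column literally called "name"

# Selecting several columns at once needs single brackets
head(df[, c("x", "z")], 3)
```

If a function complains about its input, checking class() of what you pass in is usually quicker than cycling through the operators.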

DiggyF
+1  A: 

df$x and df[["x"]] do the same thing.

Let's assume that you have a data set named one, and that one of its variables is a factor, Region. Using one$Region selects that variable. Consider the following:

one <- read.csv("IED.csv")
one$Region

The following code selects the same variable.

one[["Region"]]

Both commands produce the same output:

> one$Region
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST 


> one[["Region"]]
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST 

"They both return the "same" results, but not necessarily in the same format." - I didn't notice any differences. Each command produced the same output in the same format. Perhaps it's your data.

Hope that helps.

EDIT:

Misread the original question. df["x"] produces the following:

> one["Region"]
             Region
1          RC SOUTH
2          RC SOUTH
3          RC SOUTH
4           RC EAST
5           RC EAST
6           RC EAST
7           RC EAST
8           RC EAST
9           RC EAST
10          RC EAST

Not sure why the difference occurs.

ATMathew
You didn't notice any differences because you're looking at something slightly different than what he asked about. The question is about the difference between df$x and df["x"] (single brackets), but you're talking about df$x and df[["x"]] (DOUBLE brackets).
Fojtasek
+7  A: 

Another difference is that df$w returns NULL, whereas df['w'] gives an error ("undefined columns selected") with your example data frame. Note that df[['w']] also returns NULL for a plain data.frame.
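A small sketch of why this matters in practice, again with the data frame from the question: a misspelled column name passed to $ fails silently, while single brackets stop immediately. The use of tryCatch() and the %in% check are my own illustration, not something the question asked for.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

is.null(df$w)   # TRUE: no error and no warning, just NULL

# Single brackets stop with an error; tryCatch() captures the message as a string
msg <- tryCatch(df["w"], error = function(e) conditionMessage(e))
msg

# A defensive check before using a column whose name might be misspelled
"w" %in% names(df)   # FALSE
```

A silent NULL can travel a long way through later code before anything visibly breaks, which is one argument for preferring brackets inside functions.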

Henrico
This is a crucial point.
Shane
+4  A: 

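If you use df[,"x"] instead of df["x"], you will get the same result as df$x. The comma switches to matrix-style indexing, df[rows, columns]: the row index is left empty to keep all rows, and because only one column is selected, the result is dropped to a plain vector.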
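A short sketch of the comma form, using the question's data frame. The drop = FALSE argument is a standard option of data frame indexing that the answer does not mention; it is shown here only as a way to keep a data.frame when that is what the next function expects.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

identical(df[, "x"], df$x)   # TRUE: the single column is dropped to a vector
is.data.frame(df["x"])       # TRUE: list-style indexing keeps the data.frame

# drop = FALSE keeps the data.frame even with matrix-style indexing
one_col <- df[, "x", drop = FALSE]
is.data.frame(one_col)       # TRUE
```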

Elaine
+5  A: 

In addition to the indexing page in the manual, you can find this succinct description on the help page ?"$":

Indexing by ‘[’ is similar to atomic vectors and selects a list of the specified element(s).

Both ‘[[’ and ‘$’ select a single element of the list. The main difference is that ‘$’ does not allow computed indices, whereas ‘[[’ does. ‘x$name’ is equivalent to ‘x[["name", exact = FALSE]]’. Also, the partial matching behavior of ‘[[’ can be controlled using the ‘exact’ argument.
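The two points in that help text, computed indices and partial matching, can be sketched as follows. The data frame `d` and its column names are hypothetical, chosen so that an abbreviated name is unambiguous.

```r
# Hypothetical data frame with a longer column name, to make partial matching visible
d <- data.frame(value = 1:3, other = c("a", "b", "c"))

d$val                       # partial match on "value" (may warn if
                            # options(warnPartialMatchDollar = TRUE) is set)
d[["val"]]                  # NULL: [[ matches names exactly by default
d[["val", exact = FALSE]]   # partial match, the same lookup $ performs

# A computed index works with [[ but not with $
col <- "value"
d[[col]]
```

Partial matching is convenient at the console but easy to trip over in scripts, so spelling column names out in full is the safer habit.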

The function calls are, of course, different. See get("[.data.frame") versus get("[[.data.frame") versus get("$")

jverzani
A: 

In this instance, for most uses, I'd avoid sub-setting altogether, along with trying to remember what $, [ and [[ do with a data frame. I would just use with():

> df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)
> with(df, y)
 [1] a b c d e f g h i j k l m n o p q r s t
Levels: a b c d e f g h i j k l m n o p q r s t

That is a lot clearer than any of the sub-setting methods in most cases (IMHO).

Gavin Simpson