ansaurus

Question

In R, what is the difference between df["x"] and df$x?

Answer 1

+8 A:

If I'm not mistaken, df$x is the same as df[['x']]. [[ selects a single element, whereas [ returns a list of the selected elements (for a data frame, a data frame of the selected columns). See also the language reference. I usually see [[ used for lists, [ for arrays, and $ for getting a single column or element by name. If the name is held in a variable or computed by an expression (for example df[[name]] or df[,name]), use the [ or [[ notation, since $ only accepts a literal name. The [ notation is also used when multiple columns are selected, for example df[,c('name1', 'name2')]. I don't think there is a single best practice for this.
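A minimal sketch of the forms mentioned above. The data frame and the names df, x, y and name are made up for illustration and do not come from the question:

```r
# Hypothetical data frame for illustration only
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# $ and [[ both extract the column itself (here an integer vector)
identical(df$x, df[["x"]])   # TRUE

# [ returns a list-like object (a data frame) holding the selected column
is.list(df["x"])             # TRUE

# A column name stored in a variable works with [[ and [, but not with $
name <- "x"
df[[name]]                   # the same vector as df$x
df[, name]                   # also a vector: a single column is dropped

# Selecting several columns returns a data frame
df[, c("x", "y")]
```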

DiggyF 2010-07-30 06:22:32

Answer 2

+1 A:

df$x and df[["x"]] do the same thing.

Let's assume that you have a data set named one, and that one of its variables is a factor, Region. Using one$Region selects that specific variable. Consider the following:

one <- read.csv("IED.csv")
one$Region

Running the following code also isolates that variable:

one[["Region"]]

Each command produces the following output:

> one$Region
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST 


> one[["Region"]]
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST

"They both return the "same" results, but not necessarily in the same format." I didn't notice any differences: each command produced the same output in the same format. Perhaps it's your data.

Hope that helps.

EDIT:

Misread the original question. df["x"] produces the following:

> one["Region"]
             Region
1          RC SOUTH
2          RC SOUTH
3          RC SOUTH
4           RC EAST
5           RC EAST
6           RC EAST
7           RC EAST
8           RC EAST
9           RC EAST
10          RC EAST

Not sure why the difference occurs.

ATMathew 2010-07-30 12:48:30

You didn't notice any differences because you're looking at something slightly different than what he asked about. The question is about the difference between df$x and df["x"] (single brackets), but you're talking about df$x and df[["x"]] (DOUBLE brackets).
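A small sketch of why the printed output differs. The data frame below is a hypothetical stand-in for the IED.csv data, reusing only the column name Region:

```r
# Hypothetical stand-in for the data set; only the column name is reused
one <- data.frame(Region = factor(c("RC SOUTH", "RC EAST", "RC CAPITAL")))

class(one$Region)       # "factor": the column itself
class(one[["Region"]])  # "factor": the same object as one$Region
class(one["Region"])    # "data.frame": a one-column data frame

# The one-column data frame still holds the same column inside it
identical(one["Region"][[1]], one$Region)  # TRUE
```

Single brackets keep the data frame wrapper, which is why the result prints as a table with row numbers rather than as a factor.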

Fojtasek 2010-07-30 13:58:48

Answer 3

+7 A:

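Another difference is that df$w returns NULL, whereas df['w'] gives an error (undefined columns selected) with your example data frame. Note that df[['w']] with a name that does not exist also returns NULL rather than an error.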
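A quick check of the missing-column behaviour, assuming a made-up data frame that has no column named w. It may be worth running this in your own session, since the result for [[ is easy to misremember:

```r
# Hypothetical data frame with no column named "w"
df <- data.frame(x = 1:3)

is.null(df$w)        # TRUE: $ silently returns NULL
is.null(df[["w"]])   # TRUE: [[ with an unknown name also returns NULL

# Single brackets raise an error instead; capture it to show that it happens
failed <- tryCatch({ df["w"]; FALSE }, error = function(e) TRUE)
failed               # TRUE
```

The silent NULL is what makes typos in column names hard to spot when using $.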

Henrico 2010-07-30 12:54:52

This is a crucial point.

Shane 2010-09-21 18:51:47

Answer 4

+4 A:

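If you use df[,"x"] instead of df["x"] you will get the same result as df$x. The comma switches to matrix-style indexing, df[rows, columns]: leaving the row position empty selects all rows, and a single selected column is dropped to a plain vector by default.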
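A short sketch of the comma form, using an illustrative data frame (the names are assumptions, not from the question):

```r
# Hypothetical data frame for illustration only
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# With the comma, a single column is dropped to a vector, like df$x
identical(df[, "x"], df$x)                # TRUE

# Without the comma, the result stays a data frame
is.data.frame(df["x"])                    # TRUE

# drop = FALSE keeps the data frame structure even with the comma
is.data.frame(df[, "x", drop = FALSE])    # TRUE
```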

Elaine 2010-07-30 15:00:17

Answer 5

+5 A:

In addition to the indexing page in the manual, you can find this succinct description on the help page ?"$":

Indexing by ‘[’ is similar to atomic vectors and selects a list of the specified element(s).

Both ‘[[’ and ‘$’ select a single element of the list. The main difference is that ‘$’ does not allow computed indices, whereas ‘[[’ does. ‘x$name’ is equivalent to ‘x[["name", exact = FALSE]]’. Also, the partial matching behavior of ‘[[’ can be controlled using the ‘exact’ argument.

The function calls are, of course, different: compare get("[.data.frame"), get("[[.data.frame") and get("$").
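The two points in the help text quoted above, partial matching and computed indices, can be sketched as follows. The data frame and the names value and col are illustrative assumptions. Depending on the R version and on options such as warnPartialMatchDollar, the partial match may also emit a warning:

```r
# Hypothetical data frame; "value" is an illustrative column name
df <- data.frame(value = 1:3)

# $ matches partially: "val" is a unique prefix of "value"
df$val                        # 1 2 3

# [[ matches exactly by default, so the prefix finds nothing
df[["val"]]                   # NULL
df[["val", exact = FALSE]]    # 1 2 3

# [[ accepts a computed index; $ treats its argument as a literal name
col <- "value"
df[[col]]                     # 1 2 3
df$col                        # NULL: there is no column called "col"
```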

jverzani 2010-07-30 19:02:57

Answer 6

A:

In this instance, for most uses, I'd avoid sub-setting altogether, along with trying to remember what $, [ and [[ do with a data frame. I would just use with():

> df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)
> with(df, y)
 [1] a b c d e f g h i j k l m n o p q r s t
Levels: a b c d e f g h i j k l m n o p q r s t

That is a lot clearer than any of the sub-setting methods in most cases (IMHO).
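As a further sketch, with() also evaluates whole expressions that refer to several columns at once. This reuses the data frame defined above; the expressions themselves are illustrative:

```r
# Same data frame as in the example above
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

# Columns are visible as ordinary variables inside the expression
total <- with(df, x + z)               # every element is 21
avg   <- with(df, mean(x[z > 10]))     # mean of x over the first ten rows: 5.5
```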

Gavin Simpson 2010-09-21 18:32:06
