Hi All,

I am somewhat new to R and I have run into a point where I need some help. I figure the reshape package can accomplish what I need to do.

Here is the structure of the original data frame:

> str(bruins)
'data.frame':   10 obs. of  6 variables:
 $ gameid  : Factor w/ 1 level "20090049": 1 1 1 1 1 1 1 1 1 1
 $ team    : chr  "NYI" "BOS" "NYI" "BOS" ...
 $ home_ind: chr  "V" "H" "V" "H" ...
 $ period  : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 2 3 3 4 4 5 5
 $ goals   : int  0 0 3 0 0 3 0 0 3 3
 $ shots   : int  16 7 9 7 8 12 5 4 38 30

Here are the first few rows:

> head(bruins)
      gameid team home_ind period goals shots
409 20090049  NYI        V      1     0    16
410 20090049  BOS        H      1     0     7
411 20090049  NYI        V      2     3     9
412 20090049  BOS        H      2     0     7
413 20090049  NYI        V      3     0     8
414 20090049  BOS        H      3     3    12

I am looking to create a new data frame that pivots on gameid and period, with the rest of the columns summarizing the data for each home_ind row (10 columns in all).

When I run the following code:

b.melt <- melt(bruins, id=c("gameid", "period"), na.rm=TRUE)

I get the following warnings:

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(0L, 0L, 3L, 0L, 0L, 3L, 0L,  :
  invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(16L, 7L, 9L, 7L, 8L, 12L,  :
  invalid factor level, NAs generated

Any help will be very much appreciated!

Edit: This is what I am hoping to get the restructured data to look like

    gameid period vis_team vis_goals vis_shots home_team home_goals home_shots
1 20090049      1      NYI         0        16       BOS          0          7
2 20090049      2      NYI         3         9       BOS          0          7
3 20090049      3      NYI         0         8       BOS          3         12
+2  A: 

After melting, all measure variables end up in the same column, so they must all be of the same type. In your case "team" is character while "goals" is numeric, which is what triggers those messages.
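
To make the type constraint concrete, here is a minimal base R sketch that does not depend on any reshape version. The toy `bruins` data frame is rebuilt from the rows shown in the question, and the hand-rolled stacking at the end is only an illustration of what melting does to the measure columns, not the package's actual implementation.

```r
# Toy data modelled on the first rows shown in the question
bruins <- data.frame(
  gameid   = factor(rep("20090049", 6)),
  team     = rep(c("NYI", "BOS"), 3),
  home_ind = rep(c("V", "H"), 3),
  period   = factor(rep(1:3, each = 2)),
  goals    = c(0L, 0L, 3L, 0L, 0L, 3L),
  shots    = c(16L, 7L, 9L, 7L, 8L, 12L),
  stringsAsFactors = FALSE
)

# Stacking two integer columns keeps the integer type
same_type <- c(bruins$goals, bruins$shots)
class(same_type)

# Stacking a character column with an integer column coerces
# everything to character, so the numbers are no longer numbers
mixed <- c(bruins$team, bruins$goals)
class(mixed)

# Hand-rolled "melt" restricted to the numeric measure columns:
# team and home_ind stay on the id side, so value remains integer
id_cols <- bruins[c("gameid", "period", "home_ind")]
long <- rbind(
  cbind(id_cols, variable = "goals", value = bruins$goals),
  cbind(id_cols, variable = "shots", value = bruins$shots)
)
str(long)
```

The practical takeaway is to treat only "goals" and "shots" as measure variables and keep "team" and "home_ind" among the id variables.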

Gary Lee
+2  A: 

I think you'd be better off using ddply from the plyr package for this problem. You didn't say how you wanted to summarise the data, but check out the summarise functions if you want to use a different summary function for each variable, or the colwise function if you want to summarise all variables the same way.

hadley
As always, thanks for the advice hadley. I can't quite get my head around what I would summarize. I edited the post above to highlight what I am hoping the new data frame will look like. I had previously tried using sqldf and almost got it but figure there must be an easier way with some of your packages.
Btibert3
A: 

Thanks for the help. I ended up going a different route and broke the problem into little pieces. I am sure there is a quicker, more elegant way, but I got to where I needed to be and wanted to share the code in case it helps someone else.

## load libraries
library(sqldf)

## assume that the dataset is loaded as `per`
## restructure the data and merge together

## visiting-team rows, one per gameid/period
sql.1 <- "SELECT gameid, period, team `vis_team`, goals `vis_goals`, shots `vis_shots`"
sql.2 <- "FROM per WHERE home_ind='V' GROUP BY gameid, period"
## sep=" " keeps the column list and FROM from running together
sql.cmd <- paste(sql.1, sql.2, sep=" ")
vis <- sqldf(sql.cmd)

## home-team rows, one per gameid/period
sql.1 <- "SELECT gameid, period, team `home_team`, goals `home_goals`, shots `home_shots`"
sql.2 <- "FROM per WHERE home_ind='H' GROUP BY gameid, period"
sql.cmd <- paste(sql.1, sql.2, sep=" ")
home <- sqldf(sql.cmd)

## merge on the shared columns (gameid, period)
my.dataset <- merge(vis, home)
Btibert3
+1  A: 

Now that I see what you're trying to do, here's an approach using summarise from plyr:

library(plyr)

# visiting-team rows (home_ind == "V")
vis <- summarise(subset(per, home_ind == "V"), 
  gameid = gameid, period = period, 
  vis_team = team, vis_goals = goals, vis_shots = shots)

# home-team rows (home_ind == "H")
home <- summarise(subset(per, home_ind == "H"), 
  gameid = gameid, period = period, 
  home_team = team, home_goals = goals, home_shots = shots)

# joins on the shared columns, gameid and period
join(vis, home)

There are also a number of ways to do it using just base functions (e.g. by subsetting and then renaming the columns).
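
One possible base R version of the subset-and-rename idea is sketched below. It is only an illustration: the toy data frame is rebuilt from the rows in the question (where it is called `bruins`; the answers refer to it as `per`), and `side()` is a hypothetical helper introduced here, not a function from any package.

```r
# Toy data modelled on the first rows shown in the question
bruins <- data.frame(
  gameid   = factor(rep("20090049", 6)),
  team     = rep(c("NYI", "BOS"), 3),
  home_ind = rep(c("V", "H"), 3),
  period   = factor(rep(1:3, each = 2)),
  goals    = c(0L, 0L, 3L, 0L, 0L, 3L),
  shots    = c(16L, 7L, 9L, 7L, 8L, 12L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows for one side (V or H) and
# prefix the team/goals/shots columns with "vis" or "home"
side <- function(df, ind, prefix) {
  out <- df[df$home_ind == ind, c("gameid", "period", "team", "goals", "shots")]
  names(out)[3:5] <- paste(prefix, c("team", "goals", "shots"), sep = "_")
  out
}

vis  <- side(bruins, "V", "vis")
home <- side(bruins, "H", "home")

# One row per gameid/period, visitor columns first, then home columns
wide <- merge(vis, home, by = c("gameid", "period"))
wide
```

Naming the `by` columns explicitly in `merge()` guards against accidentally joining on any other column the two pieces might share.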

hadley