Either it's late, or I've found a bug, or cast() doesn't like colnames with "." in them. This all happens inside a function, but it fails in exactly the same way outside of one.

x <- structure(list(df.q6 = structure(c(1L, 1L, 1L, 11L, 11L, 9L, 
4L, 11L, 1L, 1L, 2L, 2L, 11L, 5L, 4L, 9L, 4L, 4L, 1L, 9L, 4L, 
10L, 1L, 11L, 9L), .Label = c("a", "b", "c", "d", "e", "f", "g", 
"h", "i", "j", "k"), class = "factor"), df.s5 = structure(c(4L, 
4L, 1L, 2L, 4L, 4L, 4L, 3L, 4L, 1L, 2L, 1L, 2L, 4L, 1L, 3L, 4L, 
2L, 2L, 4L, 4L, 4L, 2L, 2L, 1L), .Label = c("a", "b", "c", "d", 
"e"), class = "factor")), .Names = c("df.q6", "df.s5"), row.names = c(NA, 
25L), class = "data.frame")

library(reshape)
cast(x, df.q6 + df.s5 ~ ., length)

No worky.

However, if:

colnames(x) <- c("variable", "value")
cast(x, variable + value ~., length)

Works like a charm.

+2  A: 

Nothing to do with the dots in the colnames (easily shown!).

If your data frame doesn't have a column called 'value', then cast() guesses which column is the value; in this case it guesses 'df.s5' because it is the last column. (A 'value' column is what you get when you melt() data.) It then renames that column to 'value' before calling reshape1. Now the column 'df.s5' is no more, yet it's there on the left of your formula. Uh oh.
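The renaming described above can be mimicked in base R to see why the formula breaks. This is only a rough sketch of the behaviour, not reshape's actual code; the toy data and the names `guessed`, `renamed`, `formula_vars` and `missing_vars` are made up for illustration.

```r
# Toy data with the same column names as the question (values are made up).
x <- data.frame(df.q6 = c("a", "a", "b"), df.s5 = c("d", "d", "a"))

# Step 1: no column is called "value", so the last column is guessed.
guessed <- names(x)[ncol(x)]

# Step 2: the guessed column is renamed to "value".
renamed <- x
names(renamed)[names(renamed) == guessed] <- "value"

# Step 3: the formula still refers to the old name, which no longer exists.
formula_vars <- setdiff(all.vars(df.q6 + df.s5 ~ .), ".")
missing_vars <- setdiff(formula_vars, names(renamed))
print(missing_vars)
```

Whatever ends up in `missing_vars` is a variable the formula asks for but the renamed data no longer provides.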

You are using the value in the formula, which is an odd thing to do. None of the cast examples do that. What are you trying to do here?

You could add an ad-hoc column as a dummy value:

> cast(cbind(x,1), df.q6 + df.s5 ~ ., length)

Using 1 as value column. Use the value argument to cast to override this choice

   df.q6 df.s5 (all)
1      a     a     2
2      a     b     2
3      a     d     3
4      b     a     1
5      b     b     1
[etc]

But I suspect there's a better way to get the number of repeated observations (rows) in a data frame - which is your real question!
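One possibility, sketched here in base R only, is to cross-tabulate the two columns and keep the combinations that actually occur. The toy data below is made up (it is not the question's `x`), and `counts` is just an illustrative name.

```r
# Toy data with the same column names as the question (values are made up).
x <- data.frame(
  df.q6 = c("a", "a", "a", "b", "k"),
  df.s5 = c("d", "d", "a", "b", "d")
)

# Cross-tabulate the two columns, then flatten the table to long format.
counts <- as.data.frame(
  table(df.q6 = x$df.q6, df.s5 = x$df.s5),
  stringsAsFactors = FALSE
)

# table() also reports combinations that never occur; drop those.
counts <- counts[counts$Freq > 0, ]
print(counts)
```

The `Freq` column plays the role of the `(all)` column in the cast() output, with no dummy value column needed.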

Spacedman
Casting the data in this way makes it easy to make graphs like this: http://stackoverflow.com/questions/2578961/how-to-better-create-stacked-bar-graphs-with-multiple-variables-from-ggplot2/3784878#3784878
Brandon Bertelsen
Have you built a full solution to the question in the thread you linked to? stackoverflow.com/questions/2578961/
Jay
Yes. What you see there works
Brandon Bertelsen
I didn't get a chance to run through it; I was curious whether you wanted a labeller for the chart.
Jay
No labeller required, just needed this for fill=""
Brandon Bertelsen
+1  A: 

If you are looking for an easy solution, dcast in the reshape2 package can help you:

library(reshape2)
dcast(x, df.q6 + df.s5 ~., length)
kohske
Thanks for the heads up on reshape2.
Brandon Bertelsen
+2  A: 

I use a solution similar to the one Spacedman points out.

#take your data.frame x with its two columns

#add a column
x$value <- 1

#apply your cast verbatim
cast(x, df.q6 + df.s5 ~., length)

   df.q6 df.s5 (all)
1      a     a     2
2      a     b     2
3      a     d     3
4      b     a     1
5      b     b     1
6      d     a     1
7      d     b     1
8      d     d     3
9      e     d     1
10     i     a     1
11     i     c     1
12     i     d     2
13     j     d     1
14     k     b     3
15     k     c     1
16     k     d     1

Hopefully that helps!

Jay
Strange mechanics, but it works.
Brandon Bertelsen
In essence it needs "something" to count, i.e. something to apply length() to. Because it's set up for the melt() output, the variable name "value" works, but so should anything else, I think? The leftover variable is considered the "value" variable?
Jay
Also in plyr, perhaps like this? ddply(x, .(df.q6, df.s5), summarise, count=length(df.q6))
Jay