I have data on students from several schools. Using R, I want to plot a histogram-style chart of the percentage of students who passed the test in each school. My data looks like this (id, school, passed/failed):

432342 school1 passed

454233 school2 failed

543245 school1 failed

etc.

(I am only interested in the percentage of students who passed; those who didn't pass have obviously failed. I want one column per school, showing the percentage of that school's students who passed.)

Thanks

+2  A: 

There are many ways to do that. One is:

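# Simulated data: 100 students spread over 3 schools
df <- data.frame(ID = sample(100),
                 school = factor(sample(3, 100, TRUE), labels = c("School1", "School2", "School3")),
                 result = factor(sample(2, 100, TRUE), labels = c("passed", "failed")))

# The mean of a logical vector is the proportion of TRUE values, i.e. the pass rate per school
p <- aggregate(df$result == "passed" ~ school, data = df, FUN = mean)
barplot(p[, 2] * 100, names.arg = p[, 1])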
kohske
I get an error when I run your code. In particular, after **aggregate**: `Error in as.data.frame.default(x) : cannot coerce class "formula" into a data.frame`
csgillespie
Works for me in R-2.11.1.
Marek
@csgillespie, aggregate was improved in R 2.11.1 to accept a formula as its argument. This is actually a great improvement.
kohske
@kohske: Ahh, I'm using R 2.10.
csgillespie
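For R versions where `aggregate` does not accept a formula, the same per-school pass rate can probably be obtained through the older `by =` interface. The following is only a sketch under that assumption: the seed and the column name `pct_passed` are illustrative choices, not part of the answer above.

```r
# Example data with the same layout as in the answer above
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  ID = sample(100),
  school = factor(sample(3, 100, TRUE), labels = c("School1", "School2", "School3")),
  result = factor(sample(2, 100, TRUE), labels = c("passed", "failed"))
)

# Non-formula interface: x is the logical vector, by is a named list of grouping variables
p <- aggregate(df$result == "passed", by = list(school = df$school), FUN = mean)

# aggregate() names the computed column "x"; rename it and convert to a percentage
names(p)[2] <- "pct_passed"   # hypothetical column name
p$pct_passed <- p$pct_passed * 100
```

The result can then be plotted in the same way, for example with `barplot(p$pct_passed, names.arg = p$school)`.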
+2  A: 

Using ggplot2 (very basic, can be improved):

library(ggplot2)
# Placeholder values standing in for each school's pass percentage
pass <- abs(100 * rnorm(10, sd = 2))
check <- sample(c("pass", "fail"), 10, replace = TRUE)
school <- paste0("school", 1:10)
dta <- data.frame(check, pass, school)
# geom_col() draws bars whose heights are the values in the data
ggplot(dta, aes(x = school, y = pass)) + geom_col()

Roman Luštrik
A: 

Since you have individual records (id) and want to compute a value per group (school), I would suggest tapply for this.

students <- 400
schools <- 5

df <- data.frame("id" = 1:students,
    "school" = sample(paste("school", 1:schools, sep = ""),
        size = students, replace = TRUE),
    "results" = sample(c("passed", "failed"),
        size = students, replace = TRUE, prob = c(.8, .2)))

p <- tapply(df$results == "passed", df$school, mean) * 100

barplot(p)
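
As a quick sanity check, the same `tapply` call can be tried on just the three sample rows from the question. This is a small sketch; the column names `id`, `school` and `result` are assumed here, since the question does not name them.

```r
# The three sample rows from the question (column names are assumed)
df <- data.frame(
  id = c(432342, 454233, 543245),
  school = c("school1", "school2", "school1"),
  result = c("passed", "failed", "failed")
)

# Proportion of "passed" per school, expressed as a percentage
p <- tapply(df$result == "passed", df$school, mean) * 100
print(p)
# school1: one of two students passed -> 50
# school2: the only student failed    -> 0
```

With so few rows the percentages are easy to verify by hand, which makes it a convenient check before running the code on the full data.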
eyjo