Hello,

I want a stacked barplot, or at least two barplots (histograms), of the data below, but I can't figure out how. plot(online) is not the solution I'm looking for. Please see below.

          online              offline
1         sehrwichtig             wichtig
2             wichtig           unwichtig
3         sehrwichtig           unwichtig
4         sehrwichtig         sehrwichtig
5         sehrwichtig         sehrwichtig
6         sehrwichtig           unwichtig
7         sehrwichtig           unwichtig
8             wichtig             wichtig
9             wichtig           unwichtig
10        sehrwichtig         sehrwichtig
11        sehrwichtig             wichtig
12        sehrwichtig           unwichtig
13            wichtig         sehrwichtig
14        sehrwichtig             wichtig

I know I need a step where the data is aggregated to:

                   online        offline
   sehrwichtig          10         4
   unwichtig             0         6
   wichtig               4         4

But how?

A: 

I haven't done this myself, but I know of an R package that a lot of people use for the aggregation step (getting the data into the shape of your second table). It's called reshape:

http://www.statmethods.net/management/reshape.html

http://had.co.nz/reshape/introduction.pdf

As for the plotting part, I think lattice and ggplot probably both have functions for doing exactly what you want, but again, I'm an R newbie, so I can't say much more.
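The melt-then-count step that reshape handles can also be sketched in base R. This swaps in base R's stack() and table() and is not the reshape API: stack() turns the two columns into one long data frame with a values column (the rating) and an ind column (online or offline), and table() then cross-tabulates the two. Because the ratings of both columns end up in one vector, a rating that never occurs in one column, such as unwichtig under online, still gets a row, with a count of 0.

```r
# The question's data as character columns
df <- data.frame(
  online  = c("sehrwichtig", "wichtig", "sehrwichtig", "sehrwichtig", "sehrwichtig",
              "sehrwichtig", "sehrwichtig", "wichtig", "wichtig", "sehrwichtig",
              "sehrwichtig", "sehrwichtig", "wichtig", "sehrwichtig"),
  offline = c("wichtig", "unwichtig", "unwichtig", "sehrwichtig", "sehrwichtig",
              "unwichtig", "unwichtig", "wichtig", "unwichtig", "sehrwichtig",
              "wichtig", "unwichtig", "sehrwichtig", "wichtig"),
  stringsAsFactors = FALSE
)

# Long format: one row per answer; 'values' is the rating, 'ind' the source column
long <- stack(df)

# Cross-tabulate rating by column; a rating missing from a column is counted as 0
tallies <- table(long$values, long$ind)
tallies

# Stacked barplot: one bar per column, one segment per rating
barplot(tallies, legend.text = rownames(tallies))
```

barplot() stacks the rows of a matrix or two-way table by default, so the table can be passed to it directly.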

rescdsk
Thanks for the page http://www.statmethods.net/.
Felix
+3  A: 

That aggregation is just a simple call to table inside of apply:

R> foo <- data.frame(online=sample(c("S","W","U"),10,TRUE), 
                     offline=sample(c("S","W","U"),10,TRUE))
R> apply(foo,2,table)
  online offline
S      3       1
U      4       5
W      3       4

which you can feed into barplot.
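A minimal sketch of that last step. The data below is a hypothetical, fixed stand-in for foo, picked so that every code occurs in both columns (otherwise apply() returns a list rather than a matrix). barplot() stacks the rows of the matrix within each column by default; beside = TRUE draws them as grouped bars instead.

```r
# Hypothetical data in the style of foo above; every code appears in both columns
foo <- data.frame(online  = c("S", "W", "U", "S", "S", "W"),
                  offline = c("U", "U", "S", "W", "U", "S"),
                  stringsAsFactors = FALSE)

# One column of counts per variable, one row per code (S, U, W)
counts <- apply(foo, 2, table)

# Stacked bars: one bar per column, one segment per code
barplot(counts, legend.text = rownames(counts))

# Grouped bars: the codes side by side within each column
barplot(counts, beside = TRUE, legend.text = rownames(counts))
```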

Dirk Eddelbuettel
I accepted this solution for the sake of simplicity. Thanks to everybody else. I learned a lot.
Felix
+1  A: 

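Dirk's answer is the way to go, but on the OP's data a plain apply(foo,2,table) won't work: the online column has no unwichtig entries, so its table is shorter than the offline one and apply() returns a list rather than a matrix. You need to deal with that 0 entry, perhaps like so: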

my.data <- "online              offline
1         sehrwichtig             wichtig
2             wichtig           unwichtig
3         sehrwichtig           unwichtig
4         sehrwichtig         sehrwichtig
5         sehrwichtig         sehrwichtig
6         sehrwichtig           unwichtig
7         sehrwichtig           unwichtig
8             wichtig             wichtig
9             wichtig           unwichtig
10        sehrwichtig         sehrwichtig
11        sehrwichtig             wichtig
12        sehrwichtig           unwichtig
13            wichtig         sehrwichtig
14        sehrwichtig             wichtig"

df <- read.table(textConnection(my.data))

df.labels <- unique(as.character(apply(df,2,as.character)))
tallies <- apply(df,2,function(x)table(x)[df.labels])
tallies[is.na(tallies)] <- 0
rownames(tallies) <- df.labels

For brevity's sake, you could combine the last 3 lines:

tallies <- apply(df,2,function(x){y <- table(x)[df.labels];
                                   names(y) <- df.labels; y[is.na(y)] <- 0; y})

The output is:

> tallies
            online offline
sehrwichtig     10       4
wichtig          4       4
unwichtig        0       6
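An alternative way to get the zero counts, sketched here with the question's data rather than taken from the answer above: make both columns factors with one shared set of levels. table() on a factor reports every level, including unused ones, so each column yields a count vector of the same length and sapply() can bind them into a matrix without any NA patching. The level vector lev is an illustrative choice; its order also sets the row order and therefore the order of the bar segments.

```r
# The question's data as plain character vectors
online  <- c("sehrwichtig", "wichtig", "sehrwichtig", "sehrwichtig", "sehrwichtig",
             "sehrwichtig", "sehrwichtig", "wichtig", "wichtig", "sehrwichtig",
             "sehrwichtig", "sehrwichtig", "wichtig", "sehrwichtig")
offline <- c("wichtig", "unwichtig", "unwichtig", "sehrwichtig", "sehrwichtig",
             "unwichtig", "unwichtig", "wichtig", "unwichtig", "sehrwichtig",
             "wichtig", "unwichtig", "sehrwichtig", "wichtig")

# One shared set of levels for both columns
lev <- c("sehrwichtig", "wichtig", "unwichtig")
df <- data.frame(online  = factor(online,  levels = lev),
                 offline = factor(offline, levels = lev))

# table() on a factor counts every level, so unused ones come out as 0;
# all count vectors have the same length, so sapply() returns a matrix
tallies <- sapply(df, table)
tallies

barplot(tallies, legend.text = rownames(tallies))
```

On the question's data this reproduces the tallies shown above, in the same row order.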
Leo Alekseyev
Well, I needed to handle 0 entries. Thanks for the tip.
Felix
+1  A: 
#generate data
df<- read.table(textConnection('
      online              offline
 sehrwichtig             wichtig
     wichtig           unwichtig
 sehrwichtig           unwichtig
 sehrwichtig         sehrwichtig
 sehrwichtig         sehrwichtig
 sehrwichtig           unwichtig
 sehrwichtig           unwichtig
     wichtig             wichtig
     wichtig           unwichtig
 sehrwichtig         sehrwichtig
 sehrwichtig             wichtig
 sehrwichtig           unwichtig
     wichtig         sehrwichtig
 sehrwichtig             wichtig'
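 ),header=TRUE)

# give both columns the same factor levels, so that table() returns counts of
# equal length (0 for unused levels); assigning levels() directly would
# relabel the existing codes by position (wichtig would become unwichtig)
lev <- c("sehrwichtig", "unwichtig", "wichtig")
df$online  <- factor(df$online,  levels = lev)
df$offline <- factor(df$offline, levels = lev)

library(plyr)  # aaply() comes from plyr
my_table <- t(aaply(df,2,table))

library(graphics)
# barplot2() is from gplots; base graphics' barplot() does the job here
barplot(my_table, legend.text = rownames(my_table), ylim = c(0, 20))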


gd047
+1  A: 

With ggplot2, you don't need to pre-aggregate the data:

library(ggplot2)
qplot(online, data = df, fill = offline)
qplot(offline, data = df, fill = online)
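The two qplot() calls above each draw one column's ratings, filled by the other column's ratings, i.e. a cross-tabulation of the two. For one bar per column (online and offline) stacked by rating, as asked in the question, one option is to stack the columns into long format first and let geom_bar() count the rows. This is a sketch assuming ggplot2 is installed; it uses ggplot() and geom_bar() rather than qplot(), and base R's stack() for the reshaping.

```r
# The question's data as character columns
df <- data.frame(
  online  = c("sehrwichtig", "wichtig", "sehrwichtig", "sehrwichtig", "sehrwichtig",
              "sehrwichtig", "sehrwichtig", "wichtig", "wichtig", "sehrwichtig",
              "sehrwichtig", "sehrwichtig", "wichtig", "sehrwichtig"),
  offline = c("wichtig", "unwichtig", "unwichtig", "sehrwichtig", "sehrwichtig",
              "unwichtig", "unwichtig", "wichtig", "unwichtig", "sehrwichtig",
              "wichtig", "unwichtig", "sehrwichtig", "wichtig"),
  stringsAsFactors = FALSE
)

# Long format: 'values' holds the rating, 'ind' says which column it came from
long <- stack(df)

# Only draw the plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # geom_bar() counts the rows per x value; fill stacks the ratings in each bar
  p <- ggplot(long, aes(x = ind, fill = values)) + geom_bar()
  print(p)
}
```

Adding position = "dodge" to geom_bar() would put the ratings side by side instead of stacking them.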
hadley