views:

1577

answers:

2

I have some data in CSV like:

"Timestamp", "Count"
"2009-07-20 16:30:45", 10
"2009-07-20 16:30:45", 15
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:46", 6
"2009-07-20 16:30:46", 8
"2009-07-20 16:30:47", 20

I can read it into R using read.cvs. I'd like to plot:

  1. Number of entries per second, so:
    "2009-07-20 16:30:45", 2
    "2009-07-20 16:30:46", 3
    "2009-07-20 16:30:47", 1
    
  2. Average value per second:
    "2009-07-20 16:30:45", 12.5
    "2009-07-20 16:30:46", 7.333
    "2009-07-20 16:30:47", 20
    
  3. Same as 1 & 2 but then by Minute and then by Hour.

Is there some way to do this (collect by second/min/etc & plot) in R?

+3  A: 

Read your data, and convert it into a zoo object:

R> X <- read.csv("/tmp/so.csv")
R> X <- zoo(X$Count, order.by=as.POSIXct(as.character(X[,1])))

Note that this will show warnings because of non-unique timestamps.

Task 1 using aggregate with length to count:

R> aggregate(X, force, length)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47 
                  2                   3                   1

Task 2 using aggregate:

R> aggregate(X, force, mean)
2009-07-20 16:30:45 2009-07-20 16:30:46 2009-07-20 16:30:47 
             12.500               7.333              20.000

Task 3 can be done the same way by aggregating up to higher-order indices. You can simply call plot on the result from aggregate:

plot(aggregate(X, force, mean))
Dirk Eddelbuettel
Nice! I had to add the Zoo package in the Package Manager and call "library(zoo)" first.
ayman
Yes, that's how it works with packages. For your hourly etc indices, use the 'parsed time' (here: index(X) after zoo object has been created) and created an 'aggregation index vector' that you use in aggregate: R> aggind <- format(index(X), "%Y%m%d %H%m") R> aggregate(X, aggind, mean) 20090720 1607 11.17 Exactly the same works for other indices (change the formatting rule) and functions (replace mean by whatever you need).
Dirk Eddelbuettel
Again, nice! I noticed, when I make an aggregation index vector like: R> aggind <- format(index(hx), "%Y-%m-%d %H%M") I can only plot via boxplot(aggregate(X, aggind, mean)) and not via plot(aggregate(X, aggind, mean)). plot(...) returns "Error in plot.window(...) : need finite 'xlim' values".
ayman
correction: thats barplot(...) not box.
ayman
Assign the temp. result to a variable and run summary() over it -- you may have NAs which throws some of the plotting off.
Dirk Eddelbuettel
+1  A: 

Averaging the data is easy with the plyr package.

library(plyr)
Second <- ddply(dataset, "Timestamp", function(x){
    c(Average = mean(x$Count), N = nrow(x))
})

To do the same thing by minute or hour, then you need to add fields with that info.

library(chron)
dataset$Minute <- minutes(dataset$Timestamp)
dataset$Hour <- hours(dataset$Timestamp)
dataset$Day <- dates(dataset$Timestamp)
#aggregate by hour
Hour <- ddply(dataset, c("Day", "Hour"), function(x){
    c(Average = mean(x$Count), N = nrow(x))
})
#aggregate by minute
Minute <- ddply(dataset, c("Day", "Hour", "Minute"), function(x){
    c(Average = mean(x$Count), N = nrow(x))
})
Thierry