I have data that looks like this:

#val  Freq1 Freq2
0.000 178 202
0.001 4611 5300
0.002 99 112
0.003 26 30
0.004 17 20
0.005 15 20
0.006 11 14
0.007 11 13
0.008 13 13
... many more lines ...

Full data can be found here: http://dpaste.com/173536/plain/
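To reproduce the problem without downloading the full file, you can read the sample rows above from a string with read.table(text = ...). This is a minimal sketch that uses only the nine rows shown; sample_txt is a hypothetical name. Because read.table() treats # as a comment character by default, the "#val Freq1 Freq2" line is skipped and the columns come in as V1, V2, V3.

```r
# Only the rows shown in the question; the full file has many more
sample_txt <- "#val  Freq1 Freq2
0.000 178 202
0.001 4611 5300
0.002 99 112
0.003 26 30
0.004 17 20
0.005 15 20
0.006 11 14
0.007 11 13
0.008 13 13"

# comment.char = "#" is the read.table() default, so the header line is skipped
dat <- read.table(text = sample_txt, header = FALSE)
str(dat)
```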

I want a cumulative frequency plot with "val" on the x-axis and the cumulative counts for "Freq1" and "Freq2" on the y-axis, with both curves drawn in a single graph.

I have the following code, but it produces two separate plots instead of one:

dat <- read.table("stat.txt", header = FALSE)  # the "#val ..." line is skipped as a comment
val   <- dat$V1
freq1 <- dat$V2
freq2 <- dat$V3

# Expand each value by its count, then re-tabulate
valf1 <- rep(val, freq1)
valf2 <- rep(val, freq2)

valfreq1table <- table(valf1)
valfreq2table <- table(valf2)
cumfreq1 <- c(0, cumsum(valfreq1table))
cumfreq2 <- c(0, cumsum(valfreq2table))

plot(cumfreq1, ylab = "CumFreq", xlab = "Loglik Ratio")
lines(cumfreq1)
plot(cumfreq2, ylab = "CumFreq", xlab = "Loglik Ratio")
lines(cumfreq2)

What's the right way to approach this?

+2  A: 

Try the ecdf() function in base R (which, if memory serves, uses plot.stepfun() for plotting) or the Ecdf() function in Frank Harrell's Hmisc package. Here is an example from help(Ecdf) that uses a grouping variable to show two ECDFs in one plot:

 # Example showing how to draw multiple ECDFs from paired data
 library(Hmisc)  # provides Ecdf(); base R only has ecdf()
 pre.test <- rnorm(100, 50, 10)
 post.test <- rnorm(100, 55, 10)
 x <- c(pre.test, post.test)
 g <- c(rep('Pre', length(pre.test)), rep('Post', length(post.test)))
 Ecdf(x, group = g, xlab = 'Test Results', label.curves = list(keys = 1:2))
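If Hmisc is not available, you can get the same two-curve picture with base R alone by drawing the first ecdf() and then adding the second with add = TRUE. This is a sketch that reuses the simulated data above. The set.seed() call, the object names F_pre and F_post, and the legend placement are illustrative choices, not part of the help(Ecdf) example.

```r
# Same simulated paired data as above; set.seed() added for reproducibility
set.seed(1)
pre.test  <- rnorm(100, 50, 10)
post.test <- rnorm(100, 55, 10)

F_pre  <- ecdf(pre.test)
F_post <- ecdf(post.test)

# The first curve sets up the axes; xlim spans both samples
plot(F_pre, verticals = TRUE, do.points = FALSE,
     xlim = range(pre.test, post.test),
     main = "", xlab = "Test Results", ylab = "Fn(x)")
# add = TRUE draws the second step function on the same plot
plot(F_post, verticals = TRUE, do.points = FALSE, add = TRUE, col = "red")
legend("bottomright", legend = c("Pre", "Post"), col = c("black", "red"), lty = 1)
```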
Dirk Eddelbuettel
I tested your code, but it gave me the following message: "unused argument(s) (group = g, xlab = "Test Results", label.curves = list(keys = 1:2))"
neversaint
The code works perfectly for me. Make sure you are using Ecdf and not ecdf. You will get the error if you use the latter function instead.
Rob Hyndman
The y-axis in Ecdf is normalized (i.e., 0 to 1). Is there a way to make it show the "reverse" cumulative frequency, i.e., the count of values > x (something equivalent to what="1-f")?
neversaint
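The data are already frequency counts, so you can compute a "reverse" cumulative frequency (the number of observations with a value strictly greater than x) without any ECDF function: subtract the running total from the grand total. This is a minimal base-R sketch. It assumes dat has columns V1 (val), V2 (Freq1) and V3 (Freq2), sorted by V1 as in the question, and it uses only the first sample rows. The names rev1 and rev2 are illustrative.

```r
# First sample rows from the question (assumed sorted by val)
dat <- data.frame(V1 = c(0.000, 0.001, 0.002, 0.003, 0.004),
                  V2 = c(178, 4611, 99, 26, 17),
                  V3 = c(202, 5300, 112, 30, 20))

# Number of observations with value > x, for each x in dat$V1
rev1 <- sum(dat$V2) - cumsum(dat$V2)
rev2 <- sum(dat$V3) - cumsum(dat$V3)

# Step plot of both columns on shared axes
plot(dat$V1, rev1, type = "s", ylim = range(0, rev1, rev2),
     xlab = "Loglik Ratio", ylab = "Count of values > x")
lines(dat$V1, rev2, type = "s", col = "red")
legend("topright", legend = c("Freq1", "Freq2"), col = c("black", "red"), lty = 1)
```

For "values >= x" instead, rev(cumsum(rev(dat$V2))) gives the count at or above each value.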
+1  A: 

Just for the record, here is how you get multiple lines in the same plot "by hand":

plot(cumfreq1, ylab="CumFreq", xlab="Loglik Ratio", type="l",
     ylim=range(cumfreq1, cumfreq2))  # or type="b" for lines and points
lines(cumfreq2, col="red")
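A variant puts val itself on the x-axis. The file already holds one count per value, so cumsum() on each frequency column gives the cumulative counts directly, and the rep()/table() round trip becomes unnecessary. matplot() then draws both columns against the same x values in one call. This sketch assumes the question's column layout (V1 = val, V2 = Freq1, V3 = Freq2) and uses only the first sample rows. The matrix name cum is illustrative.

```r
# First sample rows from the question; with the full file use
# dat <- read.table("stat.txt", header = FALSE) instead
dat <- data.frame(V1 = c(0.000, 0.001, 0.002, 0.003, 0.004),
                  V2 = c(178, 4611, 99, 26, 17),
                  V3 = c(202, 5300, 112, 30, 20))

# Cumulative counts of values <= x for each frequency column
cum <- cbind(Freq1 = cumsum(dat$V2), Freq2 = cumsum(dat$V3))

# matplot() draws one line per column of 'cum' against the shared x values
matplot(dat$V1, cum, type = "s", lty = 1, col = c("black", "red"),
        xlab = "Loglik Ratio", ylab = "CumFreq")
legend("bottomright", legend = colnames(cum), col = c("black", "red"), lty = 1)
```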
Aniko
+3  A: 
data <- read.table("http://dpaste.com/173536/plain/", header = FALSE)

# Expand each value (column 1) by its count in column 2 or 3
sample1 <- rep(data$V1, data$V2)
sample2 <- rep(data$V1, data$V3)

plot(ecdf(sample1), verticals=TRUE, do.points=FALSE,
     main="ECDF plot for both samples", xlab="Scores",
     ylab="Cumulative Proportion", lty="dashed")

lines(ecdf(sample2), verticals=TRUE, do.points=FALSE,
      col.hor="red", col.vert="red", lty="dotted")

legend("bottomright", c("Sample 1","Sample 2"),
       col=c("black","red"), lty=c("dashed","dotted"))
gd047
A: 

Is there a way to use ggplot2 to make the output look better?

datayoda
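One possible ggplot2 approach is to reshape the counts into long format, with one row per value and sample holding its cumulative count, and then draw both samples with geom_step(), using colour to distinguish them. This is a sketch that assumes ggplot2 is installed and that the data have the question's column layout. It uses only the first sample rows, and the data frame name long is illustrative.

```r
library(ggplot2)

# First sample rows from the question (V1 = val, V2 = Freq1, V3 = Freq2)
dat <- data.frame(V1 = c(0.000, 0.001, 0.002, 0.003, 0.004),
                  V2 = c(178, 4611, 99, 26, 17),
                  V3 = c(202, 5300, 112, 30, 20))

# Long format: one row per (value, sample) with its cumulative count
long <- data.frame(
  val     = rep(dat$V1, 2),
  cumfreq = c(cumsum(dat$V2), cumsum(dat$V3)),
  sample  = rep(c("Freq1", "Freq2"), each = nrow(dat))
)

# One step curve per sample, coloured by sample
p <- ggplot(long, aes(x = val, y = cumfreq, colour = sample)) +
  geom_step() +
  labs(x = "Loglik Ratio", y = "Cumulative frequency", colour = NULL)
print(p)
```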