ansaurus

Question

program R- in ggplot restrict y to be >0 in LOESS plot

Answer 1

+2 A:

I can't test it without some example data, but

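# Original scatterplot with a title
qplot(data = sites, x, y, main = "Site 349")
# Same plot without axis labels; the outer parentheses print it immediately
(p <- qplot(data = sites, x, y, xlab = "", ylab = ""))
(p1 <- p + geom_smooth(method = "loess", span = 0.5, size = 1.5))
# Black-and-white theme, title, and y axis limited to [0, foo]
p1 + theme_bw() + ggtitle("Site 349") + ylim(0, foo)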

(where foo is a suitable upper limit for your plot) might do the trick. Unlike in base graphics, the xlim() and ylim() commands in ggplot actually restrict the data that are used in making the plot, rather than just the plot window. It might also restrict the geom_smooth() (though I'm not certain).
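To make the ylim() point concrete, here is a minimal sketch with made-up data (the data frame `d` and the limit `upper` are illustrative, not from the question). It counts how many observations `ylim()` would discard and, if ggplot2 is installed, builds the same plot once with `ylim()` and once with `coord_cartesian()`, which only crops the view.

```r
# Toy data, invented for illustration: one count lies above the chosen limit
d <- data.frame(x = 1:10, y = c(2, 4, 3, 6, 30, 5, 4, 2, 1, 0))
upper <- 10  # plays the role of `foo` in the answer

# ylim(0, upper) turns out-of-range values into NA before the smoother runs,
# so these are the rows the LOESS fit would actually see
kept <- d[d$y >= 0 & d$y <= upper, ]
n_dropped <- nrow(d) - nrow(kept)
n_dropped

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x, y)) + geom_point() +
    geom_smooth(method = "loess", formula = y ~ x)
  p_limited <- p + ylim(0, upper)                       # refit without the dropped row
  p_zoomed  <- p + coord_cartesian(ylim = c(0, upper))  # same fit, cropped view
}
```

Printing `p_limited` and `p_zoomed` shows the difference. Neither call constrains the fit itself; they only change which data or which part of the panel is used.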

Edit: After reading a bit more, you might also want to consider switching out the model that geom_smooth is using. Again, not being able to see your data is a problem. But, for example, if it's binary, you can add stat_smooth(method="glm", family="binomial") to get a logit-smoothed line. See ?stat_smooth for more.
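A sketch of the binomial case, assuming a made-up binary outcome (`d`, its columns, and the slope of 1.5 are invented here). One caveat: as far as I know, newer ggplot2 releases expect the family inside `method.args` rather than as a direct `family=` argument, so that spelling is used below. The base R part shows why this model cannot leave the valid range: predictions on the response scale are probabilities.

```r
set.seed(1)
# Invented binary outcome whose success probability rises with x
d <- data.frame(x = seq(-3, 3, length.out = 200))
d$y <- rbinom(nrow(d), size = 1, prob = plogis(1.5 * d$x))

# Logistic regression; response-scale predictions stay inside [0, 1]
fit <- glm(y ~ x, family = binomial, data = d)
p_hat <- predict(fit, type = "response")
range(p_hat)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x, y)) + geom_point(alpha = 0.3) +
    stat_smooth(method = "glm", formula = y ~ x,
                method.args = list(family = "binomial"))
}
```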

Matt Parker 2010-05-05 23:07:41

Here's an example of the data I am using. y are count observations of a population. I convert the date with: x <- as.Date(sites$date, origin="1960-01-01"). The data: site date y1 1164 13549 39 1164 13815 2011 1164 13928 2413 1164 13998 2814 1164 14047 3140 1164 15211 2841 1164 15273 742 1164 15306 1343 1164 15371 344 1164 15544 045 1164 15642 346 1164 15733 047 1164 15819 548 1164 16005 049 1164 16082 250 1164 16187 751 1164 16268 352 1164 16366 153 1164 16455 254 1164 16555 256 1164 16730 057 1164 16831 058 1164 16933 159 1164 16989 160 1164 17092 1

Nate 2010-05-07 15:32:39

Oof. That's not much help. Use the function `dput()` to convert your example data frame into a text representation, then edit your answer above. See my recent question for an example: http://stackoverflow.com/questions/2789916/comparing-values-longitudinally-in-r-with-a-twist
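For anyone who hasn't used `dput()` before, a minimal sketch. The three rows are the first rows of the site 928 data posted later in this thread; the round trip through `capture.output()` and `parse()` is only there to show that the printed text rebuilds the same data frame.

```r
# First three rows of the data posted in this thread
df <- data.frame(site = c(928L, 928L, 928L),
                 date = c(13493L, 13534L, 13566L),
                 y    = c(7L, 7L, 17L))

dput(df)  # prints a structure(...) call that can be pasted into a question

# Round trip: capture that text and rebuild the data frame from it
txt <- paste(capture.output(dput(df)), collapse = "\n")
restored <- eval(parse(text = txt))
all.equal(df, restored)
```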

Matt Parker 2010-05-07 16:48:56

structure(list(site = c(928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L, 928L), date = c(13493L, 13534L, 13566L, 13611L, 13723L, 13752L, 13804L, 13837L, 13927L, 14028L, 14082L, 14122L, 14150L, 14182L, 14199L, 16198L, 16279L, 16607L, 16945L, 17545L, 17650L, 17743L, 17868L, 17941L, 18017L, 18092L), y = c(7L, 7L, 17L, 18L, 17L, 17L, 10L, 3L, 17L, 24L, 11L, 5L, 5L, 3L, 5L, 14L, 2L, 9L, 9L, 4L, 7L, 6L, 1L, 0L, 5L, 0L)), .Names = c("site", "date", "y")

Nate 2010-05-07 21:50:57

, class = "data.frame", row.names = c(NA, -26L))

Nate 2010-05-07 21:51:19

I convert dates with: x <- as.Date(sites$date, origin="1960-01-01")
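The integer dates are day counts, so the origin decides which calendar dates they map to. A small sketch using the first and last `date` values from the data above; the 1960-01-01 origin is taken from the comment, the variable names are illustrative.

```r
# First and last `date` values from the dput() output
days <- c(13493L, 18092L)

# Days are counted from the origin, giving proper Date objects
x <- as.Date(days, origin = "1960-01-01")
print(x)
class(x)

# Length of the observation period in days
diff(x)
```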

Nate 2010-05-07 21:52:13

Family would be poisson for GLM, but the CI seems much narrower and the line straighter than with LOESS. It's arbitrary, but I'd like more squiggle as in the LOESS. Any thoughts?
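One possible way to get more squiggle while keeping the Poisson family is to give the GLM a spline basis instead of a single linear term. This is a swapped-in technique, not something proposed in the thread, and the 4 degrees of freedom are an arbitrary choice. The sketch uses the site 928 data posted above and the `splines` package that ships with R.

```r
# Site 928 data from the dput() output in this thread
sites <- data.frame(
  date = c(13493, 13534, 13566, 13611, 13723, 13752, 13804, 13837, 13927,
           14028, 14082, 14122, 14150, 14182, 14199, 16198, 16279, 16607,
           16945, 17545, 17650, 17743, 17868, 17941, 18017, 18092),
  y = c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
        4, 7, 6, 1, 0, 5, 0)
)
sites$x <- as.Date(sites$date, origin = "1960-01-01")

# Plain Poisson GLM: log(mean) is linear in time, so the fitted curve is a
# monotone exponential, which is why it looks so straight
fit_lin <- glm(y ~ date, family = poisson, data = sites)

# Same family with a natural-spline basis (df = 4 is an assumption)
fit_ns <- glm(y ~ splines::ns(date, df = 4), family = poisson, data = sites)

mu_lin <- fitted(fit_lin)  # response scale, positive by construction
mu_ns  <- fitted(fit_ns)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(sites, aes(x, y)) + geom_point() +
    stat_smooth(method = "glm", formula = y ~ splines::ns(x, 4),
                method.args = list(family = "poisson"))
}
```

Because of the log link, both fits stay above zero; the spline version simply has more freedom to follow the data.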

Nate 2010-05-07 21:54:32

None, unfortunately. As far as I know, `stat_smooth` only defaults to a GAM for large data sets (1,000 or more observations) and uses LOESS otherwise; I don't know much about GAMs.

Matt Parker 2010-05-07 22:33:46

Answer 2

+1 A:

I am seconding Matt Parker's suggestion that you have to change the fitting procedure. One option that often works for positive-only data is to do the fit on the log scale, and then exponentiate to get results on the original scale. This guarantees positive-only values.

Generating random data that has some of these issues:

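# Simulated positive-only data whose mean decays with x
d <- data.frame(x = 0:100)
d$y <- exp(rnorm(nrow(d), mean = -d$x / 40, sd = 0.8))
# Default smoother fitted on the raw scale
qplot(x, y, data = d) + stat_smooth()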

Now we can use ggplot's transformation capabilities to log-transform the y-values, but display the results on an exponential scale (which corresponds to the original one):

qplot(x, y, data = d) + stat_smooth() + scale_y_log10() + coord_trans(y = scales::exp_trans(10))

You can see examples like this on the coord_trans help page. If you don't like the y-axis, you can manipulate the breaks and labels.
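The same log-scale idea can be sketched in base R, outside ggplot, to see why the back-transformed curve cannot go negative. This is only an illustration using `loess()` with its default settings; the seed and variable names are my own choices.

```r
set.seed(42)
# Same kind of simulated positive-only data as in the answer
d <- data.frame(x = 0:100)
d$y <- exp(rnorm(nrow(d), mean = -d$x / 40, sd = 0.8))

# Smoothing on the raw scale: nothing stops the curve from dipping below zero
fit_raw <- loess(y ~ x, data = d)
raw <- predict(fit_raw)

# Smoothing log(y) and exponentiating: exp() of any finite number is positive
fit_log <- loess(log(y) ~ x, data = d)
back <- exp(predict(fit_log))

range(raw)
range(back)
```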

Aniko 2010-05-06 15:40:07

Really nice, Aniko.

Matt Parker 2010-05-06 18:38:52

Aniko, I get this error when trying this: Error: NA/NaN/Inf in foreign function call (arg 1)

Nate 2010-05-07 21:25:34

Works fine for me.

Matt Parker 2010-05-07 22:29:51

@Nate: You get the "NA/NaN/Inf ..." error if you have zeros among the y values. Since log(0) is -Inf in R, this is not surprising. This method works only for _positive_ values. Depending on the context (e.g. if `y` gives counts), you might use `y+1` instead of `y`, but it's getting messy.
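A small sketch of the zero problem and the `y+1` workaround, using the counts from the posted site 928 data. As a simplification I use the observation index instead of the real dates, and `loess()` with default settings stands in for whatever smoother you actually use.

```r
# Counts from the site 928 data posted in this thread (note the two zeros)
y <- c(7, 7, 17, 18, 17, 17, 10, 3, 17, 24, 11, 5, 5, 3, 5, 14, 2, 9, 9,
       4, 7, 6, 1, 0, 5, 0)
d <- data.frame(x = seq_along(y), y = y)  # index used in place of the dates

# log(0) is -Inf, which is what trips up the smoother
log_y <- log(d$y)
any(is.infinite(log_y))

# Shifting by one keeps everything finite: log1p(y) is log(y + 1)
d$shifted <- log1p(d$y)
fit <- loess(shifted ~ x, data = d)

# Back-transform: expm1(z) is exp(z) - 1
back <- expm1(predict(fit))
range(back)
```

This is also where it gets messy: the back-transformed curve is only guaranteed to stay above -1, not above 0.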

Aniko 2010-05-10 14:00:15

Ah, that makes sense. I ended up going with a GAM smooth (library mgcv), which works, so I think I'll stick with it. Thank you all for your responses.
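For reference, a GAM smooth along these lines might look like the sketch below. The exact call isn't given in the comment, so the Poisson family, the `y ~ s(x)` formula, and the simulated counts are all assumptions made for illustration.

```r
library(mgcv)  # recommended package that ships with standard R installs

set.seed(7)
# Invented counts with a wiggly, decaying mean
d <- data.frame(x = 1:60)
d$y <- rpois(nrow(d), lambda = exp(2 + sin(d$x / 8) - d$x / 40))

# Penalized smooth with a log link: fitted means are positive by construction
fit <- gam(y ~ s(x), family = poisson, data = d)
mu <- fitted(fit)
range(mu)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x, y)) + geom_point() +
    stat_smooth(method = "gam", formula = y ~ s(x),
                method.args = list(family = poisson))
}
```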

Nate 2010-05-10 21:00:46
