r

How can I plot a histogram of long-tailed data using R?

I have data that is mostly centered in a small range (1-10), but a significant number of points (say, 10%) are in (10-1000). I would like to plot a histogram for this data that focuses on (1-10) but also shows the (10-1000) data. Something like a log scale for the histogram. Yes, I know this means not all bins are of...
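
One way to approach this, as a rough sketch rather than a definitive answer: bin the data on a log10 scale, so that bins of equal width on the log axis are of unequal width in the original units. The simulated vector `x` and the break spacing of 0.25 are assumptions made for illustration.

```r
# Hypothetical data: 90% of points in (1, 10), 10% in (10, 1000)
set.seed(1)
x <- c(runif(900, 1, 10), runif(100, 10, 1000))

# Bin on the log10 scale; bins are equal-width in log units only
h <- hist(log10(x), breaks = seq(0, 3, by = 0.25), plot = FALSE)

# Draw the histogram and label the x axis in the original units
plot(h, axes = FALSE, main = "Long-tailed data", xlab = "x (log scale)")
axis(1, at = 0:3, labels = 10^(0:3))
axis(2)
```

The counts per bin stay honest, but bar widths no longer correspond to equal ranges of `x`, which is the trade-off the question already mentions.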

Running R jobs on a grid computing environment

I am running some large regression models in R in a grid computing environment. As far as I know, the grid just gives me more memory and faster processors, so I think this question would also apply for those who are using R on a powerful computer. The regression models I am running have lots of observations, and several factor variable...

How can I add a subtitle and change the font size of ggplot plots in R?

I tried adding a subtitle using +opts(subtitle="text") but nothing showed up. The main title does work (+opts(title="text")). I would also like to use a larger font for the axis (labels and coordinates), but I can't tell how to do that. ...

With sequential calls to $R CMD or $R --vanilla, do I have to reload libraries in each R script?

I would like to run a sequence of R scripts from the bash command line. Can I keep the R session 'open' between calls? Or do I have to save and load objects and re-load the libraries in each script? Thanks in advance ...

How to rearrange data in a data frame using R (combine similar repeating columns)

I have a file where a data structure containing 6 columns is stored side by side. That means I have n times 6 columns stored in a flat file. Basically, I want to rearrange the data so that I have a single data.frame containing only 6 columns, with all the remaining data from the file appended below the first 6 columns. Row 1V1 1V2 1V3 1V4 1...

R - optimize objective function (does lots of matrix manipulation)

This is (the greatest part of) the cost function for a genetic optimization, so it needs to be really fast. Right now it's painfully slow even for toy problem sizes. I'm pretty sure there are faster ways to do many of the operations in this code, but I'm not that good at tuning R code. Can anyone suggest anything? Fb, Ft, and Fi are ...

Calculating Months between Factored Time Variables

I have a factored time series that looks like this: df <- data.frame(a=c("11-JUL-2004", "11-JUL-2005", "11-JUL-2006", "11-JUL-2007", "11-JUL-2008"), b=c("11-JUN-1999", "11-JUN-2000", "11-JUN-2001", "11-JUN-2002", "11-JUN-2003")) First, I would like to convert this to a format...

How to reduce the array dimensions in R?

Hi all, suppose I have a data array, dat <- array(NA, c(115,45,10)). How can I get a new data array dat1 <- array(NA, c(115,45)) by averaging dat over the third dimension? Thanks ...
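
A minimal sketch of one possible approach, assuming the array holds numeric values (random numbers are used here in place of the NA-filled array from the question): `apply()` over the first two margins, or the equivalent `rowMeans()` with `dims = 2`.

```r
# Hypothetical example data with the dimensions from the question
set.seed(1)
dat <- array(rnorm(115 * 45 * 10), c(115, 45, 10))

# Average over the third dimension, keeping dimensions 1 and 2
dat1 <- apply(dat, c(1, 2), mean)

# rowMeans() with dims = 2 treats the first two dimensions as "rows"
# and averages over the rest; it is usually faster than apply()
dat1_fast <- rowMeans(dat, dims = 2)

dim(dat1)
```

If the real data contain missing values, both `mean` and `rowMeans` accept `na.rm = TRUE`.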

Using ESS on my desktop to run R in the cloud

I'm interested in experimenting with writing R code on my laptop which I then execute on an Amazon EC2 machine. I want the execution to be interactive because I'm building a workflow and the data is only on my EC2 instance, not on my local machine. I could redirect X11 to put the remote ESS window on my local machine, but I've had some...

How can I specify the working directory while running R CMD check?

The command 'R CMD check' runs the R files in the project's tests directory. The directory structure: toplevel project R rmongo.R tests RMongo-Ex.R When I run R CMD check project in the toplevel directory, I run into this error: cannot open file '../R/rmongo.R': No such file or directory because my test file sources th...

RExcel Runtime Error

Has anyone ever received a runtime error '13' type mismatch when trying to run R from within Excel using RExcel? The install previously worked. Any help very much appreciated. ...

Controlling relative size of points in ggplot2 plots

I need to draw many different tile plots, which have squares and dots on top of the tiles according to the data. Unfortunately I cannot include an illustrative picture, but basically the plot consists of tiles which either have squares and dots on them or not. Each of those figures has a different number of tiles in the x-direction and y-direction....

Using R and Weka: how can I use meta-algorithms along with the n-fold evaluation method?

Here is an example of my problem library(RWeka) iris <- read.arff("iris.arff") Perform nfolds to obtain the proper accuracy of the classifier. m<-J48(class~., data=iris) e<-evaluate_Weka_classifier(m,numFolds = 5) summary(e) The results provided here are obtained by building the model with a part of the dataset and testing it with ...

Idiomatic R method for "left joining" two data frames

I have two data frames that both have a column containing a factor like the following: > head(test.data) var0 var1 date store 1 109.5678 109.5678 1990-03-30 Store1 2 109.3009 108.4261 1990-06-30 Store1 3 108.8262 106.2517 1990-09-30 Store1 4 108.2443 108.6417 1990-12-30 Store1 5 109.5678 109.5678 1991-03-30 Store1 6 109...
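
As a hedged sketch of the usual base R idiom: `merge()` with `all.x = TRUE` keeps every row of the first data frame, which is what a left join does. The excerpt is truncated, so the second data frame (`store.info`), its `region` column, and the "Store2" row below are made up purely for illustration.

```r
# Hypothetical stand-ins for the two data frames in the question
test.data <- data.frame(
  var0  = c(109.5678, 109.3009, 108.8262),
  store = factor(c("Store1", "Store1", "Store2"))
)
store.info <- data.frame(
  store  = factor(c("Store1", "Store3")),
  region = c("North", "South")
)

# Left join: keep all rows of test.data, fill unmatched columns with NA
joined <- merge(test.data, store.info, by = "store", all.x = TRUE)
joined
```

Note that `merge()` sorts the result by the key column by default, so the row order of the left data frame is not guaranteed to be preserved.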

How to force R to use a specified factor level as reference in a regression?

Somehow I can't find it in my notes, nor do I find the obvious on the net. How can I tell R to use a certain level as reference if I use dummy explanatories in a regression? It's just using some level by default. lm(x ~ y + as.factor(b)) with b in {0,1,2,3,4}. Let's say I want to use 3 instead of the zero that is used by R. Thx in a...
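
One possible way to do this, sketched under the assumption that `b` takes the values 0 to 4 as described: convert `b` to a factor and use `relevel()` to move level "3" to the front, since the default treatment contrasts use the first level as the reference. The simulated `x`, `y`, and `b` and the name `bf` are placeholders, not the asker's data.

```r
# Hypothetical data matching the shape described in the question
set.seed(42)
b <- rep(0:4, each = 10)
y <- rnorm(50)
x <- rnorm(50)

# Make level "3" the reference (first) level of the factor
bf <- relevel(factor(b), ref = "3")

fit <- lm(x ~ y + bf)

# Coefficients are now reported relative to b == 3,
# so no coefficient appears for level 3 itself
names(coef(fit))
```

Setting the levels explicitly, as in `factor(b, levels = c(3, 0, 1, 2, 4))`, has the same effect.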

Can you use fix via do.call?

I have some code where it is more convenient to call fix via do.call, rather than directly. Any old data frame will work for this example: dfr <- data.frame(x = 1:5, y = letters[1:5]) The obvious first attempt is do.call("fix", list(dfr)) Unfortunately, this fails with Error in fix(list(x = 1:5, y = 1:5)) : 'fix' requires a nam...

How to keep certain values in an array in R?

Suppose I have a data array, dat <- array(NA, c(115,45,248)) Q1: What do I do if I want to get a new data array, datnew <- array(NA, c(115,45,248)) in which all the positive values remain and the negative values are changed to NA? Q2: What do I do if I want to get a new data array, datnew <- array(NA,c(115,45,31)) by averaging with the ...
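
For Q1, a minimal sketch using logical indexing, assuming numeric data (random values stand in for the real array). Q2 is cut off in the excerpt, so it is not addressed here.

```r
# Hypothetical data with the dimensions from the question
set.seed(1)
dat <- array(rnorm(115 * 45 * 248), c(115, 45, 248))

# Copy the array, then overwrite every negative entry with NA;
# positive values (and the array's dimensions) are left untouched
datnew <- dat
datnew[datnew < 0] <- NA

dim(datnew)
```

Zeros are kept as they are in this sketch; use `datnew <= 0` instead if they should also become NA.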

Grouping/recoding factors in the same data.frame

Let's say I have a data frame like this: df <- data.frame(a=letters[1:26],1:26) And I would like to "re" factor a, b, and c as "a". How do I do that? ...
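
A rough sketch of one way to do it, assuming column `a` is a factor (hence the explicit `stringsAsFactors = TRUE`; the column name `n` is also an assumption, since the question leaves the second column unnamed): assigning the same label to several levels merges them.

```r
# Data frame from the question, with the second column named for clarity
df <- data.frame(a = letters[1:26], n = 1:26, stringsAsFactors = TRUE)

# Relabel levels "a", "b", and "c" as "a"; duplicate labels are merged
levels(df$a)[levels(df$a) %in% c("a", "b", "c")] <- "a"

nlevels(df$a)
head(df)
```

The rows themselves are unchanged; only the factor labels for the first three rows collapse into a single level.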

Calculating an area under a continuous density plot

I have two density curves plotted using this: Network <- Mydf$Networks quartiles <- quantile(Mydf$Avg.Position, probs=c(25,50,75)/100) density <- ggplot(Mydf, aes(x = Avg.Position, fill = Network)) d <- density + geom_density(alpha = 0.2) + xlim(1,11) + opts(title = "September 2010") + geom_vline(xintercept = quartiles, colour = "red"...

How can I use xpath querying using R's XML library?

The xml file has this snippet: <?xml version="1.0"?> <PC-AssayContainer xmlns="http://www.ncbi.nlm.nih.gov" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance" xs:schemaLocation="http://www.ncbi.nlm.nih.gov ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd" > .... <PC-AnnotatedXRef> <PC-AnnotatedXRef_x...