questions about r

Run time - using apply functions

I have two apply functions computing the average and standard deviation across the first two dimensions of a large three-dimensional array (437216,8,3). It takes 16 minutes to complete on Rx32. It's the first of many large arrays in a database that we run this script on regularly. Any thoughts on how to speed up runtime? ...

r

apply

Can Ruby interface with R?

A friend needs to do some R programming for her PhD and, since I'm a programmer, asked me to give her a hand. So I took a look at some R-related web stuff and discovered that you can interact with it through RPy (Python) and Statistics::R (Perl). Is there a way for Rubyists to hook into R? Is there a dipsh*t's guide to learning R (lik...

ruby

r

R: Adding zeroes after old zeroes in a vector?

Hello. Imagine I have a vector of ones and zeroes, which I write compactly as: 1111111100001111111111110000000001111111111100101 I need to get a new vector in which the N ones following each run of zeroes are replaced by zeroes. For example, for N = 3, 1111111100001111111111110000000001111111111100101 becomes 111111110000000111111111000000000000111111...

r

vector

zero
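One way to read the question above is "set to zero every position that lies within N places after a zero". A minimal base R sketch under that assumption follows; the helper name `extend_zeros` is hypothetical and not from the original post.

```r
# Hypothetical helper: zero out the n positions that follow every zero in x.
extend_zeros <- function(x, n) {
  zero_pos <- which(x == 0)
  # Each zero position plus offsets 0..n gives the positions to zero out.
  idx <- unique(as.vector(outer(zero_pos, 0:n, "+")))
  idx <- idx[idx <= length(x)]  # drop positions past the end of the vector
  x[idx] <- 0
  x
}

# The compact vector from the question, split into single digits.
x <- as.integer(strsplit("1111111100001111111111110000000001111111111100101", "")[[1]])
y <- extend_zeros(x, 3)
print(paste(y, collapse = ""))
```

Because every zero is used as a starting point, isolated zeroes near the end of the vector also extend, which may or may not be what the asker wants.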

How should I do rapid GUI development for R and Octave methods (possibly with Python)?

We are a medium-sized academic research lab whose main outputs are new statistical methods for analyzing large datasets. We generally develop in R and MATLAB/Octave. We would like to expand the reach of our work by building simple, wizard-style user interfaces to access our methods, either web-apps like RNAfold or stand-alone applicati...

How to return 5 topmost values from vector in R?

I have a vector and I'm able to return highest and lowest value, but how to return 5 topmost values? Is there a simple one-line solution for this? ...

r

vector

topmost
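A short base R sketch of one common approach to the question above: sort in decreasing order and keep the first five elements. The example vector `x` is made up for illustration.

```r
# Hypothetical example vector (not from the original question).
x <- c(12, 7, 3, 25, 8, 19, 1, 30, 14)

# Sort from largest to smallest, then keep the first five values.
top5 <- head(sort(x, decreasing = TRUE), 5)
print(top5)
```

If the positions of those values are needed rather than the values themselves, `order(x, decreasing = TRUE)[1:5]` returns their indices.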

Running R inside a buffer in Vim

I have used Stata and gVim on Windows for a while now. Recently I have switched to Linux, and I am planning to also change from Stata to R. A friend of mine is using R and Emacs ESS, which seems to work perfectly; however, I'd rather keep using Vim. I have installed the vim-r-plugin2; however, I can only send code to a separate term...

In R, how can I take a subset of columns of a data frame and then eliminate duplicate rows?

Imagine I have a data frame with data like this:
     A | B | C
    ---+---+---
     1 | 2 | a
     1 | 2 | b
     5 | 5 | a
     5 | 5 | b
I want to take only columns A and B, and I want to remove any rows that have become duplicates as a result of eliminating all other columns (that is, column C). So my desired result for the table above would be: A | B -...
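A minimal sketch of one way to do this in base R: select the two columns, then call `unique()` on the resulting data frame. The data frame name `df` is an assumption; the values are taken from the table in the question.

```r
# Rebuild the example table from the question (the name df is assumed).
df <- data.frame(
  A = c(1, 1, 5, 5),
  B = c(2, 2, 5, 5),
  C = c("a", "b", "a", "b")
)

# Keep only columns A and B, then drop rows that are now duplicates.
result <- unique(df[, c("A", "B")])
print(result)
```

`unique()` keeps the first occurrence of each row, so the original row names of the surviving rows are preserved.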

Workaround for pointers in R?

I have been implementing a binary tree search algorithm recently in R, and before that I used linked array-like structures. These algorithms would be much easier if there were pointers in R (not C pointers, but references to objects). I wonder if there is a workaround. I don't know S4 at all; maybe it is possible in that framework? I would ...

pointers

r

[r] ggplot: how to control when geom_line connects points when data points are missing

Hello - For ggplot line charts I would like to control when geom_line connects between two points, and when not (due to missing observations). In other words, I would like to be able to tell geom_line to still connect when e.g. one data point is missing, but not when more than one data point is missing. Or to tell it not to connect when ...

r

ggplot2

How to draw only a range of values in geom_point from the ggplot2 package?

Hello All, I have the following molten data:
      X        variable      value
    1 StationA SAR11.cluster 0.001309292
    2 StationB SAR11.cluster 0.002712237
    3 StationC SAR11.cluster 0.002362708
    4 StationD SAR11.cluster 0.002516751
    5 StationE SAR11.cluster 0.004301075
    6 StationF SAR11.cluster 0.0
    . . . etc. etc.
I used the following code t...

r

ggplot2

scatter-plot

Nested while loop behavior in R

Hello, I am puzzled by why the output is not what I expect it to be in the following nested while loops:
    i = 1
    j = 1
    while(i<5){
      print("i")
      print(i)
      i = i + 1
      while(j<5){
        print("j")
        print(j)
        j = j + 1
      }
    }
The output I get is:
    [1] "i"
    [1] 1
    [1] "j"
    [1] 1
    [1] "j"
    [1] 2
    [1] "j"
    [1] 3
    [1] "j"
    [1] 4
    [1] "i"
    [1] 2
    [1] "i"
    [1] 3
    ...

r

while-loops
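In the loops quoted above, `j` is initialised once outside both loops, so after the first pass of the outer loop `j` is already 5 and the inner loop never runs again. If the intent was for the inner loop to run on every outer iteration (an assumption, since the excerpt is cut off), resetting `j` inside the outer loop is one way to get that. The counter `inner_runs` is added only for illustration.

```r
i <- 1
inner_runs <- 0  # hypothetical counter: how many times the inner body runs

while (i < 5) {
  j <- 1  # reset j on every outer iteration
  while (j < 5) {
    inner_runs <- inner_runs + 1
    j <- j + 1
  }
  i <- i + 1
}

print(inner_runs)
```

With the reset in place the inner body runs four times for each of the four outer iterations.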

ggplot: showing % instead of counts in charts of categorical variables

I'm plotting a categorical variable and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do it several dozens of times and I ...

r

ggplot2

How to find common elements from multiple vectors?

Can anyone tell me how to find the common elements from multiple vectors?
    a <- c(1,3,5,7,9)
    b <- c(3,6,8,9,10)
    c <- c(2,3,4,5,7,9)
I want to get the common elements from the above vectors (e.g. 3 and 9) ...

r

vectors
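A minimal base R sketch for the common-elements question above: `intersect()` takes two vectors, and `Reduce()` folds it over a list of any length. The vectors are the ones given in the question; the name `common` is an assumption.

```r
a <- c(1, 3, 5, 7, 9)
b <- c(3, 6, 8, 9, 10)
c <- c(2, 3, 4, 5, 7, 9)  # same name as in the question; calls to c() still work

# Apply intersect() pairwise across the whole list of vectors.
common <- Reduce(intersect, list(a, b, c))
print(common)  # 3 and 9, as the question expects
```

Putting the vectors in a list means the same line works unchanged for four or more vectors.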

How to cbind or rbind vectors of different lengths without repeating the elements of the shorter vectors?

cbind(1:2, 1:10)
          [,1] [,2]
     [1,]    1    1
     [2,]    2    2
     [3,]    1    3
     [4,]    2    4
     [5,]    1    5
     [6,]    2    6
     [7,]    1    7
     [8,]    2    8
     [9,]    1    9
    [10,]    2   10
I want an output like below
          [,1] [,2]
     [1,]    1    1
     [2,]    2    2
     [3,]         3
     [4,]         4
     [5,]         5
     [6,]         6
     [7...

r

vectors
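One possible approach to the cbind question above, sketched in base R: pad the shorter vector with `NA` up to the longer length before binding, so nothing is recycled. Using `NA` for the empty cells is an assumption, since a matrix cannot hold truly empty entries; the names `short`, `long`, and `padded` are illustrative.

```r
short <- 1:2
long <- 1:10

# Extending the length of a vector fills the new positions with NA.
n <- max(length(short), length(long))
length(short) <- n

padded <- cbind(short, long)

# Print NA cells as blanks to resemble the layout asked for.
print(padded, na.print = "")
```

The same padding step works for `rbind()`, since both functions recycle only when the lengths differ.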

Stepwise Regression using P-Value

Hello everybody! I want to use R to perform a stepwise linear regression using p-values as a selection criterion, e.g. at each step dropping the variable that has the highest (i.e. the most insignificant) p-value, stopping when all values are significant as defined by some threshold alpha. I am totally aware that I should use the AIC (e.g. co...

r

statistics

How do I write a generic function to pick out distance between positive values?

I have a dataset that looks like so:
       x         y
    1  0.0000    0.4459183993
    2  125.1128  0.4068805502
    3  250.2257  0.3678521348
    4  375.3385  0.3294434397
    5  500.4513  0.2922601919
    6  625.5642  0.2566381551
    7  750.6770  0.2229130927
    8  875.7898  0.1914207684
    9  1000.9026 0.1624969456
    10 1126.01...

r

apply strsplit rowwise

Dear all, I'm trying to split a string on "." and create additional columns with the two strings before and after ".".
    tes<-c("1.abc","2.di","3.lik")
    dat<-c(5,3,2)
    h<-data.frame(tes,dat)
    h$num<-substr(h$tes,1,1)
    h$prim<-unlist(strsplit(as.character(h$tes),"\\."))[2]
    h$prim<-sapply(h$tes,unlist(strsplit(as.character(h$tes),"\\."))[2])
    ...

r

strsplit
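A hedged sketch of one way to get the row-wise split the question above is after: call `strsplit()` once on the whole column, which returns a list with one element per row, then use `sapply()` to pull the first or second piece out of each element. The intermediate name `parts` is an assumption; the data is from the question.

```r
tes <- c("1.abc", "2.di", "3.lik")
dat <- c(5, 3, 2)
h <- data.frame(tes, dat)

# One list element per row; fixed = TRUE treats "." as a literal dot.
parts <- strsplit(as.character(h$tes), ".", fixed = TRUE)

# Extract the piece before and the piece after the dot for every row.
h$num <- sapply(parts, `[`, 1)
h$prim <- sapply(parts, `[`, 2)
print(h)
```

The attempts in the question call `unlist()` before indexing, which flattens all rows into one vector, so `[2]` picks a single value instead of one value per row.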

Is R "that bad" that it should be rewritten from scratch?

In the past week I've been following a discussion where Ross Ihaka wrote: I’ve been worried for some time that R isn’t going to provide the base that we’re going to need for statistical computation in the future. (It may well be that the future is already upon us.) There are certainly efficiency problems (speed and memory...

architecture

r

rewrite

R: Function using if-statement only returning one else condition regardless of value

Hey, I have the following R function that checks a range of values and writes out a sentence about them. The problem is that when I define the function and then call it, all the sentences are the same regardless of Value.
    hover <- function (){
      if(df$Value > 140 && df$Change < -0.05){
        hovertext ="Price High, Level of Activity Low"
      }else{
        if(...

r

Compute/plot statistics on a 2d grid

Suppose I have an R data frame with columns that specify location (lat/long), height, and gender of individuals:
    x <- data.frame(
      lat=c(39.5,39.51,38,38.1,38.2),
      long=c(86,86,87,87,87),
      gender=c("M","F","F","M","F"),
      height=c(72,60,61,70,80)
    )
I want to bin the data in two dimensions (e.g. into 1000m x 1000m squares) and compu...

r

ggplot2

geospatial