r

Suitable functional language for scientific/statistical computing?

I use mostly R and C for statistics-related tasks. Recently I have been dealing with large datasets, typically 1e7-1e8 observations and 100 features. They seem too big for R to handle, and the packages I typically use are also more prone to crashing. I could develop tools directly in C or C++, but this would slow down the development cy...

How to do median splits within factor levels in R?

Here I make a new column to indicate whether myData is above or below its median:
### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))
#create column showing median split
myBreaks= quantile(myDataFrame$myData,c(0,.5,1))
myDataFrame$MedianSplitWholeData = cut...
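One way to split within each factor level is base R's `ave()`, which applies a function within groups and returns a vector aligned with the original rows. This is a minimal sketch that reuses the question's test data, with a `set.seed()` added for reproducibility. The column name `MedianSplitWithinLevels` and the labels are illustrative:

```r
# Test data as in the question; set.seed() added so the result is reproducible
set.seed(42)
myDataFrame <- data.frame(myData = runif(15),
                          myFactor = rep(c("A", "B", "C"), 5))

# ave() computes the median of myData within each level of myFactor and
# repeats it on every row of that level, so each row is compared with its own group
groupMedian <- ave(myDataFrame$myData, myDataFrame$myFactor, FUN = median)

# Hypothetical column name: "above" vs "at or below" the within-level median
myDataFrame$MedianSplitWithinLevels <- factor(
  ifelse(myDataFrame$myData > groupMedian, "above", "at or below")
)

table(myDataFrame$myFactor, myDataFrame$MedianSplitWithinLevels)
```

Each level here has five distinct values, so two fall above the median and three fall at or below it, because the median is one of the values.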

Developing Geographic Thematic Maps with R

There are clearly a number of packages in R for all sorts of spatial analysis. That can be seen in the CRAN Task View: Analysis of Spatial Data. These packages are numerous and diverse, but all I want to do is make some simple thematic maps. I have data with county and state FIPS codes and I have ESRI shape files of county and state boundarie...

Rows being dropped in R with read.table?

I am loading a table in which the first column is a URL and reading it into R using read.table(). It seems that R is dropping about 1/3 of the rows and does not return any errors. The URLs do not contain any # characters or tabs (my separator field), which I understand could be an issue. If I convert the URLs to integer IDs first, ...
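A frequent cause of silently lost rows is a quote character, often an apostrophe, inside a field. `read.table()` treats both `"` and `'` as quote marks by default, so text up to the next matching quote, possibly spanning several lines, can end up in a single field. The excerpt does not show the data, so it is only an assumption that this applies here. This minimal sketch uses a hypothetical file. It checks every line with `count.fields()` and then reads the file with quoting and comment handling switched off:

```r
# Hypothetical tab-separated file: URLs (one containing an apostrophe) plus a count
tmp <- tempfile(fileext = ".tsv")
writeLines(c("http://example.com/a\t1",
             "http://example.com/it's\t2",
             "http://example.com/c\t3",
             "http://example.com/d\t4"), tmp)

# Count fields per line with quoting and comments disabled; every line should have 2
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
print(n_fields)

# Read with the same settings so apostrophes and '#' are treated as ordinary characters
urls <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE, col.names = c("url", "count"))
nrow(urls)  # one row per line of the file
```

On the real file, a quick check for dropped rows is to compare `nrow()` of the result against `length(readLines(...))`.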

How to export the definition of an R object to plain text so that others can recreate it?

Let's say you have this data in R, and you want to post a question on Stack Overflow. For others to best help you, it would be nice if they could have a copy of your object (dataframe, vector, etc) to work with. Let's say your data is in a data frame called site.data:
> site.data
site year peak
1 ALBEN 5 101529.6
2 ALBEN 1...
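Base R's `dput()` prints an R expression that rebuilds the object, and others can paste that expression into their own session. This is a minimal sketch. It uses a small hypothetical data frame `toy` as a stand-in for `site.data`, whose full contents are not shown in the excerpt:

```r
# Hypothetical stand-in for site.data
toy <- data.frame(site = c("S1", "S2"), year = 1:2, peak = c(10.5, 20.25))

# Print the definition; this text is what you would paste into a question
dput(toy)

# Round trip: parse and evaluate the printed text to recreate the object
code <- paste(capture.output(dput(toy)), collapse = "\n")
recreated <- eval(parse(text = code))
```

For a large object, `dput(head(site.data, 20))` keeps the posted definition short.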

How to organize large R programs?

When I undertake an R project of any complexity, my scripts quickly get long and confusing. What are some practices I can adopt so that my code will always be a pleasure to work with? I'm thinking about things like: placement of functions in source files; when to break something out to another source file; what should be in the master f...

How to get a row from an R dataframe

I have a dataframe with column headers. How can I get a specific row from the dataframe as a list (with the column headers as keys for the list)? Specifically, my dataframe is
     A    B    C
1 5    4.25 4.5
2 3.5  4    2.5
3 3.25 4    4
4 4.25 4.5  2.25
5 1.5  4.5  3
And I want to get a row that's the equivalent of > c(a=5, b=4.25, c=4.5...
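Indexing a single row with `df[i, ]` returns a one-row data frame. `as.list()` turns that row into a named list, and `unlist()` turns it into a named vector. This sketch uses the data from the question. The names come out as `A`, `B`, `C` to match the data frame's columns, not the lowercase names in the desired output:

```r
# The question's data frame
df <- data.frame(A = c(5, 3.5, 3.25, 4.25, 1.5),
                 B = c(4.25, 4, 4, 4.5, 4.5),
                 C = c(4.5, 2.5, 4, 2.25, 3))

# First row as a named list: list(A = 5, B = 4.25, C = 4.5)
row_list <- as.list(df[1, ])

# First row as a named numeric vector: c(A = 5, B = 4.25, C = 4.5)
row_vec <- unlist(df[1, ])
```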

Creating (and Accessing) a Sparse Matrix in R

After learning about the options for working with sparse matrices in R, I want to use the Matrix package to create a sparse matrix from the following data frame and have all other elements be NA.
     s    r d
1 1089 3772 1
2 1109  190 1
3 1109 2460 1
4 1109 3071 2
5 1109 3618 1
6 1109   38 7
I know I can create a sparse matrix with t...
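The Matrix package is a recommended package shipped with R. Its `sparseMatrix()` function builds a matrix directly from row indices, column indices, and values. This minimal sketch uses the six rows shown in the question. One caveat applies: entries that are not stored in a sparse matrix read back as 0, not NA, so the "all other elements be NA" requirement does not map directly onto this representation.

```r
library(Matrix)

# The data frame from the question
dat <- data.frame(s = c(1089, 1109, 1109, 1109, 1109, 1109),
                  r = c(3772, 190, 2460, 3071, 3618, 38),
                  d = c(1, 1, 1, 2, 1, 7))

# Row index s, column index r, value d; dimensions default to max(s) x max(r)
m <- sparseMatrix(i = dat$s, j = dat$r, x = dat$d)

m[1109, 38]   # a stored entry
m[1, 1]       # not stored, so it reads as 0 rather than NA
```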

Specify Width and Height of Plot

I have a panel containing three plots. How can I use par to specify the width and height of the main panel so it is always at a fixed size? ...
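`par()` controls regions within a device. For example, `par(pin = c(w, h))` sets the plot region in inches. The overall size of a multi-plot panel, however, is most reliably fixed by the device itself. This is a minimal sketch that assumes a 9 x 3 inch PDF is the target. The file name and dimensions are placeholders:

```r
# Open a device with a fixed size (inches); the whole panel will have these dimensions
out <- tempfile(fileext = ".pdf")
pdf(out, width = 9, height = 3)

# Three plots side by side; mar trims the default margins a little
par(mfrow = c(1, 3), mar = c(4, 4, 1, 1))
plot(1:10, main = "first")
hist(rnorm(100), main = "second")
boxplot(mtcars$mpg ~ mtcars$cyl, main = "third")

invisible(dev.off())
```

For on-screen output, `dev.new(width = 9, height = 3)` requests a window of the same size, although some interactive devices may ignore the request.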

Suggestions on ways/resources to start learning the statistical language R?

Duplicate: Books for learning the R language; Understandable documentation about R?; Good intro books for R. I have never used any statistical language, and my field (Bioinformatics) demands that I know R, in particular, well. Any suggestions on how to start learning R? ...

What is the most useful R trick?

In order to share some more tips and tricks for R, what is your single most useful feature or trick? Clever vectorization? Data input/output? Visualization and graphics? Statistical analysis? Special functions? The interactive environment itself? One item per post, and we will see if we get a winner by means of votes. [Edit 25-A...

How to sort a dataframe by column(s) in R

I want to sort a dataframe by multiple columns in R. For example, with the data frame below I would like to sort by column z (descending) then by column b (ascending): dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), levels = c("Low", "Med", "Hi"), ordered = TRUE), x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), ...
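`order()` accepts several keys and sorts by each in turn. Negating a numeric key gives descending order for that key. This is a minimal sketch. The excerpt cuts off before the `z` column, so the `z` values below are hypothetical:

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"),
                 y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))  # hypothetical values

# z descending (negated), then b ascending by its level order Low < Med < Hi
sorted <- dd[order(-dd$z, dd$b), ]
sorted
```

A factor or character column cannot be negated directly. To sort one in descending order, negate its `xtfrm()`, e.g. `order(-xtfrm(dd$b))`.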

Plotting Regression results from lme4 in R using Lattice (or something else)

I have fit a regression using lme4 thanks to a previous answer. Now that I have a regression fit for each state, I'd like to use lattice to plot QQ plots for each state. I would also like to plot error plots for each state in a lattice format. How do I make a lattice plot using the results of an lme4 regression? Below is a simple sample ...

Creating a Movie from a Series of Plots in R

Is there an easy way to create a "movie" by stitching together several plots, within R? ...
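One common approach is to write each plot as a numbered image file and stitch the frames together outside R. This sketch assumes an external encoder such as ffmpeg is available. The `png()` device expands a `%03d` pattern in the file name and starts a new file for each page:

```r
# Write one PNG per plot into a temporary directory
frame_dir <- tempfile("frames")
dir.create(frame_dir)
png(file.path(frame_dir, "frame%03d.png"), width = 480, height = 480)

x <- seq(0, 2 * pi, length.out = 100)
for (k in 1:10) {
  # each call to plot() starts a new page, i.e. a new file
  plot(x, sin(x + k / 5), type = "l", ylim = c(-1, 1), main = paste("frame", k))
}
invisible(dev.off())

list.files(frame_dir)
# Then, outside R (assumed tool and options):
#   ffmpeg -framerate 5 -i frame%03d.png movie.mp4
```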

How to join data frames in R (inner, outer, left, right)?

Given two data frames
df1 = data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3)))
df2 = data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1)))
> df1
CustomerId Product
1 Toaster
2 Toaster
3 Toaster
4 Radio
5 Radio
6 Radio
> df...
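Base R's `merge()` covers all four joins through its `all`, `all.x` and `all.y` arguments. Passing `by = NULL` instead gives the Cartesian product. This sketch uses the two data frames from the question:

```r
df1 <- data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

inner <- merge(df1, df2, by = "CustomerId")               # only ids present in both
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)   # all ids from either frame
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE) # all rows of df1
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE) # all rows of df2
cross <- merge(df1, df2, by = NULL)                       # every combination of rows
```

In the left and outer joins, customers without a match in `df2` get `NA` for `State`.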

Formulas in user-defined functions in R

Formulas are a very useful feature of R's statistical and graphical functions. Like everyone, I am a user of these functions. However, I have never written a function that takes a formula object as an argument. I was wondering if someone could help me, by either linking to a readable introduction to this side of R programming, or by givi...
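Three base R functions are the usual building blocks. `model.frame()` evaluates the variables named in a formula against a data frame. `model.response()` and `model.matrix()` then extract the left-hand and right-hand sides. This minimal sketch defines a hypothetical function `ols_from_formula()` that fits least squares from a formula and checks the result against `lm()` on the built-in `mtcars` data:

```r
# Hypothetical example function: ordinary least squares from a formula
ols_from_formula <- function(formula, data) {
  mf <- model.frame(formula, data = data)    # evaluates variables; default na.action drops NA rows
  y  <- model.response(mf)                   # left-hand side
  X  <- model.matrix(attr(mf, "terms"), mf)  # right-hand side, intercept included
  drop(solve(crossprod(X), crossprod(X, y))) # solve the normal equations X'X b = X'y
}

fit <- ols_from_formula(mpg ~ wt + hp, data = mtcars)
fit
```

This function only illustrates the formula machinery. `lm()` itself uses a QR decomposition, which is numerically more robust than solving the normal equations.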

Mixed Merge in R - Subscript solution?

Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem. I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order. So I have this: st...
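Sometimes only one column needs to be brought over, and the first data frame's row order must be kept. In that case `match()` gives the subscripts directly and avoids the re-sorting that `merge()` does. The question's example is cut off, so this minimal sketch uses hypothetical data frames `a` and `b`:

```r
# Hypothetical data: a defines the row order to keep, b holds the values to bring in
a <- data.frame(key = c("x", "z", "y"))
b <- data.frame(key = c("y", "x", "z"), value = c(10, 20, 30))

# match() returns, for each key in a, its row position in b (NA if absent)
a$value <- b$value[match(a$key, b$key)]
a
```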

Rolling median algorithm in C

I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it. The first is to sort the initial window of values, then perform a binary search to insert the new value and remove the exiting ...
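This is not a C implementation. R's `stats::runmed()` computes running medians in compiled code, and it could serve as a reference when testing a C version. The minimal sketch below compares it with a naive window-by-window median on a small hypothetical vector. `runmed()` uses centred windows of odd width `k`, so its interior values line up with the naive windows shifted by one position:

```r
x <- c(5, 1, 4, 2, 8, 3, 7)  # hypothetical data
k <- 3

# Naive approach: recompute the median of every full window from scratch
naive <- sapply(seq_len(length(x) - k + 1), function(i) median(x[i:(i + k - 1)]))

# runmed(): centred running median; endrule = "keep" leaves the two end values unchanged
fast <- runmed(x, k, endrule = "keep")

# Interior positions of fast correspond to the naive windows
all(fast[2:(length(x) - 1)] == naive)
```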

In R do you use attach() or call variables by name or slicing?

Many intro R books and guides start off with the practice of attaching a data.frame so that you can call the variables by name. I have always found it favorable to call variables with $ notation or square bracket slicing [,2]. That way I can use multiple data.frames without confusing them and/or use iteration to successively call columns...
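There are two alternatives to `attach()` that keep the data frame explicit. `$` indexing names the data frame on every use. `with()` evaluates a single expression inside a data frame without changing the search path. This minimal sketch uses a hypothetical data frame `people`:

```r
# Hypothetical data
people <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 72))

# $ notation: the data frame is named on every use
bmi_dollar <- people$weight / (people$height / 100)^2

# with(): column names are visible only inside this one expression
bmi_with <- with(people, weight / (height / 100)^2)

# Iterating over columns by name, as the question mentions
col_means <- sapply(names(people), function(nm) mean(people[[nm]]))
```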

Lagging Variables in R

What is the most efficient way to make a matrix of lagged variables in R for an arbitrary variable (i.e. not a regular time series)? For example:
input: x <- c(1,2,3,4)
2 lags
output:
[1, NA, NA]
[2,  1, NA]
[3,  2,  1]
[4,  3,  2]
...
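Base R's `embed()` builds this kind of matrix. Row t holds x[t], x[t-1], and so on, so padding the start of the vector with NA produces the requested output. This is a minimal sketch; `lag_matrix()` is a hypothetical helper name:

```r
# Hypothetical helper: column 1 is x, column j + 1 is x lagged by j
lag_matrix <- function(x, lags) {
  embed(c(rep(NA, lags), x), lags + 1)
}

lag_matrix(c(1, 2, 3, 4), 2)
```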