I use mostly R and C for statistics-related tasks. Recently
I have been dealing with large datasets, typically 1e7-1e8
observations, and 100 features. They seem too big for R to
handle, and the packages I typically use are also more prone
to crashing. I could develop tools directly in C or C++, but
this would slow down the development cy...
Here I make a new column to indicate whether myData is above or below its median
### MedianSplits based on Whole Data
#create some test data
myDataFrame=data.frame(myData=runif(15),myFactor=rep(c("A","B","C"),5))
#create column showing median split
myBreaks= quantile(myDataFrame$myData,c(0,.5,1))
myDataFrame$MedianSplitWholeData = cut...
There are clearly a number of packages in R for all sorts of spatial analysis. That can be seen in the CRAN Task View: Analysis of Spatial Data. These packages are numerous and diverse, but all I want to do is some simple thematic maps. I have data with county and state FIPS codes and I have ESRI shape files of county and state boundarie...
I am loading a table in which the first column is a URL and reading it into R using read.table(). It seems that R is dropping about 1/3 of the columns and does not return any errors. The URLs do not contain any # characters or tabs (my separator field), which I understand could be an issue. If I convert the URLs to integer IDs first, ...
Let's say you have this data in R, and you want to post a question on Stack Overflow. For others to best help you, it would be nice if they could have a copy of your object (dataframe, vector, etc.) to work with.
Let's say your data is in a data frame called site.data
> site.data
site year peak
1 ALBEN 5 101529.6
2 ALBEN 1...
When I undertake an R project of any complexity, my scripts quickly get long and confusing.
What are some practices I can adopt so that my code will always be a pleasure to work with? I'm thinking about things like
Placement of functions in source files
When to break something out to another source file
What should be in the master f...
I have a dataframe with column headers.
How can I get a specific row from the dataframe as a list (with the column headers as keys for the list)?
Specifically, my dataframe is
A B C
1 5 4.25 4.5
2 3.5 4 2.5
3 3.25 4 4
4 4.25 4.5 2.25
5 1.5 4.5 3
And I want to get a row that's the equivalent of
> c(a=5, b=4.25, c=4.5...
After learning about the options for working with sparse matrices in R, I want to use the Matrix package to create a sparse matrix from the following data frame and have all other elements be NA.
s r d
1 1089 3772 1
2 1109 190 1
3 1109 2460 1
4 1109 3071 2
5 1109 3618 1
6 1109 38 7
I know I can create a sparse matrix with t...
I have a panel containing three plots. How can I use par to specify the width and height of the main panel so it is always at a fixed size?
...
Duplicate:
Books for learning the R language
Understandable documentation about R?
Good intro books for R
I have never used a statistical language, and my field (bioinformatics) demands that I know R in particular well. Any suggestions on how to start learning R?
...
In order to share some more tips and tricks for R, what is your single most useful feature or trick? Clever vectorization? Data input/output? Visualization and graphics? Statistical analysis? Special functions? The interactive environment itself?
One item per post, and we will see if we get a winner by means of votes.
[Edit 25-A...
I want to sort a dataframe by multiple columns in R. For example, with the data frame below I would like to sort by column z (descending) then by column b (ascending):
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
...
I have fit a regression using lme4 thanks to a previous answer. Now that I have a regression fit for each state, I'd like to use lattice to plot QQ plots for each state. I would also like to plot error plots for each state in a lattice format. How do I make a lattice plot using the results of an lme4 regression?
Below is a simple sample ...
Is there an easy way to create a "movie" by stitching together several plots, within R?
...
Given two data frames
df1 = data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3)))
df2 = data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1)))
> df1
CustomerId Product
1 Toaster
2 Toaster
3 Toaster
4 Radio
5 Radio
6 Radio
> df...
Formulas are a very useful feature of R's statistical and graphical functions. Like everyone, I am a user of these functions. However, I have never written a function that takes a formula object as an argument. I was wondering if someone could help me, by either linking to a readable introduction to this side of R programming, or by givi...
Note: I changed the example from when I first posted. My first example was too simplified to capture the real problem.
I have two data frames which are sorted differently in one column. I want to match one column and then merge in the value from the second column. The second column needs to stay in the same order.
So I have this:
st...
I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it. The first is to sort the initial window of values, then perform a binary search to insert the new value and remove the exiting ...
Many intro R books and guides start off with the practice of attaching a data.frame so that you can call the variables by name. I have always found it favorable to call variables with $ notation or square bracket slicing [,2]. That way I can use multiple data.frames without confusing them and/or use iteration to successively call columns...
What is the most efficient way to make a matrix of lagged variables in R for an arbitrary variable (i.e. not a regular time series)?
For example:
input:
x <- c(1,2,3,4)
2 lags
output:
[1, NA, NA]
[2, 1, NA]
[3, 2, 1]
[4, 3, 2]
...
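For the lagged-matrix question above, one possible approach is to pad the vector with leading NA values and pass it to base R's embed() from the stats package. This is only a sketch and makes no claim to being the most efficient option; the helper name lag_matrix and its argument n_lags are hypothetical, chosen here for illustration.

```r
# Hypothetical helper (not from the original question): returns a matrix
# whose first column is x and whose remaining columns are x lagged by
# 1..n_lags, with NA where no earlier value exists.
lag_matrix <- function(x, n_lags) {
  # Pad the front so the first rows have NA in the lagged columns.
  padded <- c(rep(NA, n_lags), x)
  # embed() builds rows of the form (x[t], x[t-1], ..., x[t-n_lags]).
  embed(padded, n_lags + 1)
}

x <- c(1, 2, 3, 4)
lag_matrix(x, 2)
```

With the example input x <- c(1,2,3,4) and 2 lags, this gives a 4-by-3 matrix laid out like the desired output in the question.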