Questions about R

Can I gracefully include formatted SQL strings in an R script?

I'm working in an R script that uses a long SQL string, and I would like to keep the query relatively free of other markup so as to allow copying and pasting between editors and applications. I'd also like the ability to split the query across lines for better readability. In the RODBC documentation, the paste function is used to build...

sql

r

rodbc
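A minimal sketch of one approach, assuming the whole query can live in a single string literal: R string literals may span several lines, so the SQL stays free of paste() calls and can be copied between editors as-is. The table and column names here are hypothetical, and the whitespace collapse is optional (it would not be safe for queries containing `--` comments or string literals with meaningful spacing).

```r
# Hypothetical table and column names, for illustration only.
sql <- "
SELECT id, amount
FROM billing
WHERE year = 2008
"

# Optional: strip the outer newlines and collapse runs of whitespace
# so the query is sent as a single line.
sql_one_line <- gsub("\\s+", " ", trimws(sql))
```

The multi-line string itself can usually be passed to the database function directly; the one-line form is only a convenience for logging.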

Appending rows to a dataframe - the factor problem

I have a large dataframe (14552 rows by 15 columns) containing billing data from 2001 to 2007. I have used sqlFetch to get the 2008 data. To append the 2008 data to the data of the preceding 7 years, one would do as follows: alltime <- rbind(alltime, all2008). Unfortunately, that generates: Warning message: In [<-.factor(*tmp*, ri,...

r
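The warning in the excerpt is truncated, but it looks like the kind raised when a value is assigned that is not an existing factor level. One possible workaround, sketched here with toy stand-ins for `alltime` and `all2008` (the column names are assumptions), is to convert factor columns to character before binding:

```r
# Toy stand-ins for the question's data frames; column names are assumed.
alltime <- data.frame(client = factor(c("a", "b")), amount = c(10, 20))
all2008 <- data.frame(client = factor("c"), amount = 30)

# Convert every factor column to character, leaving other columns untouched.
to_char <- function(df) {
  df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)
  df
}

combined <- rbind(to_char(alltime), to_char(all2008))
```

The character columns can be turned back into factors afterwards with factor() if the analysis needs them.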

Saving a data frame as a binary file

I would like to save a whole bunch of relatively large data frames while minimizing the space that the files take up. When opening the files, I need to be able to control what names they are given in the workspace. Basically I'm looking for the semantics of dput and dget but with binary files. Example: n<-10000 for(i in 1:100){ ...

r
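A sketch of one way to get dput/dget-style semantics with a binary file, assuming saveRDS() and readRDS() are available in the R version in use: the object is written without its name, so the reader chooses the name on load. The data frame and the temporary path are placeholders.

```r
# Placeholder data frame and path.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
path <- tempfile(fileext = ".rds")

saveRDS(df, path)          # binary serialization, compressed by default
restored <- readRDS(path)  # the caller picks the name on load
```

Unlike save()/load(), nothing is restored under its original name, which is what makes the naming controllable.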

Lattice problems: lattice objects coming from JAGS, but device can't be set

Hi All, I ran JAGS with runjags in R and I got a giant list back (named results for this example). Whenever I access results$density, two lattice plots (one for each parameter) pop up in the default quartz device. I need to combine these with par(mfrow=c(2, 1)) or with a similar approach, and send them to the pdf device. Nothing I tried...

graphics

r

lattice

Consensus tree or "bootstrap proportions" from multiple hclust objects

I have a list of hclust objects resulting from slight variations in one variable (for calculating the distance matrix). Now I would like to make a consensus tree from this list. Is there a generic package to do this? I am hacking my way through some code from maanova and it seems to work, but it's ugly and it needs a lot of hacking s...

r

dendrogram

hclust

Your experiences with Matlab/F#/R for data analysis and modeling algorithms

I've been using F# for a while now to model algorithms before coding them in C++, and also using it afterwards to check the results of the C++ code, and also against real-world recorded data. For the modeling side of things, it's very handy, but for the 'data mashup' kind of stuff, pulling in data from CSV and other sources, generating ...

Add a vertical line with different intercept for each panel in ggplot2

I'm using ggplot2 to create panels of histograms, and I'd like to be able to add a vertical line at the mean of each group. But geom_vline() uses the same intercept for each panel (i.e. the global mean): require("ggplot2") # setup some sample data N <- 1000 cat1 <- sample(c("a","b","c"), N, replace=T) cat2 <- sample(c("x","y","z"), N, ...

Matching strings across columns in R

I've got a data frame with 2 character columns. I'd like to find the rows in which one column contains the other; however, grepl is behaving strangely. Any ideas? > ( df <- data.frame(letter=c('a','b'),food = c('apple','pear','bun','beets')) ) letter food 1 a apple 2 b pear 3 a bun 4 b beets > grepl(df$letter,df$food)...

r
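grepl() is not vectorised over its pattern argument, which likely explains the behaviour in the excerpt. A minimal sketch of a row-wise match using mapply(), with the question's data rebuilt as character columns (stringsAsFactors = FALSE is an addition here):

```r
df <- data.frame(letter = c("a", "b"),
                 food = c("apple", "pear", "bun", "beets"),
                 stringsAsFactors = FALSE)

# Apply grepl() pairwise: does letter[i] occur in food[i]?
# fixed = TRUE treats each letter as a literal string, not a regex.
df$match <- mapply(grepl, df$letter, df$food,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
```

Rows where `match` is TRUE are those where the letter appears in the food name.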

Strange Problem with RPy2

Hello, After installing RPy2 from http://rpy.sourceforge.net/rpy2.html I'm trying to use it in Python 2.6 IDLE but I'm getting this error: >>> import rpy2.robjects as robjects >>> robjects.r['pi'] <RVector - Python:0x0121D8F0 / R:0x022A1760> What am I doing wrong? ...

python

r

idle-ide

rbind dataframes in a list of lists

I have a list of lists that looks like this: x[[state]][[year]]. Each element of this is a data frame, and accessing them individually is not a problem. However, I'd like to rbind data frames across multiple lists. More specifically, I'd like to have as output as many dataframes as I have years, that is rbind all the state data frames ...

list

r

data.frame
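One way this could be sketched, assuming every state has the same set of year names: pull out the data frame for a given year from each state, then rbind them with do.call(). The states, years, and columns below are made up for illustration.

```r
# Toy nested list shaped like x[[state]][[year]]; contents are hypothetical.
x <- list(
  NY = list("2001" = data.frame(state = "NY", value = 1),
            "2002" = data.frame(state = "NY", value = 2)),
  CA = list("2001" = data.frame(state = "CA", value = 3),
            "2002" = data.frame(state = "CA", value = 4))
)

years <- names(x[[1]])

# For each year, collect that year's data frame from every state and stack them.
by_year <- lapply(years, function(yr) {
  do.call(rbind, lapply(x, function(state) state[[yr]]))
})
names(by_year) <- years
```

`by_year[["2001"]]` then holds the rows for all states in 2001, and the list has one element per year.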

How do you find the median of 2 columns using R?

Hi, I'm trying to compute the median vector of a data set s with columns A1 and B1. The median vector is the median for each observation from both columns. I tried this and it didn't work: median(s[c("A1","B1")]) Is there another way to do it? ...

r

median
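A minimal sketch of a row-wise median using apply() over the rows of the two columns; the values in `s` are invented for illustration.

```r
# Toy data set with the two columns named in the question.
s <- data.frame(A1 = c(1, 4, 7), B1 = c(3, 10, 7))

# MARGIN = 1 applies median() to each row, giving one value per observation.
row_median <- apply(s[, c("A1", "B1")], 1, median)
```

With only two columns the row-wise median equals the row mean, so rowMeans() would give the same numbers here; apply() generalises to more columns.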

R analogue of SQL inner join selection

Suppose we have the contents of tables x and y in two dataframes in R. What is the suggested way to perform an operation like the following in SQL: Select x.X1, x.X2, y.X3 into z from x inner join y on x.X1 = y.X1 I tried the following in R. Is there a better way? Thank you. x<-data.frame(cbind('X1'=c(5,9,7,6,4,8,3,1,10,2),'X2'=c(5,...

sql

r
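A sketch of the inner join with base R's merge(), which keeps only rows whose key appears in both data frames by default. The small `x` and `y` below are stand-ins, not the question's truncated data.

```r
# Stand-in tables; values are invented.
x <- data.frame(X1 = c(1, 2, 3), X2 = c(10, 20, 30))
y <- data.frame(X1 = c(2, 3, 4), X3 = c("b", "c", "d"))

# merge() with default all = FALSE behaves like an inner join on X1;
# the column selection mirrors "Select x.X1, x.X2, y.X3".
z <- merge(x, y, by = "X1")[, c("X1", "X2", "X3")]
```

Only the keys 2 and 3 occur in both tables, so `z` has two rows.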

How do you make a new dataset given a set of vectors?

Is there a way in R to build a new dataset consisting of a given set of vectors -- median1, median2, median3, median4 -- which are median vectors from a previous dataset s? median1 = apply(s[,c("A1","B1","C1","D1","E1","F1","G1","H1","I1")],1,median) median2 = apply(s[,c("A2","B2","C2","D2","E2","F2","G2","H2","I2")],1,median) median3 ...

R lag over missing data

Is there a variant of lag somewhere that keeps NAs in position? I want to compute returns of price data where data could be missing. Col 1 is the price data Col 2 is the lag of price Col 3 shows p - lag(p) - the return from 99 to 104 is effectively missed, so the path length of the computed returns will differ from the true. Col 4 shows...

Why doesn't R add the title at the top of the page?

I'm trying to add a title at the top of a page of scatterplots; however, whenever I use the command title, it doesn't add the title at the top of the page and overwrites my plots. Is there a way to fix this? plot(median, pch = ".") title(main = "Scatterplot of the median vectors ",line = 0,font=2) ...

r

title

Draw hyperplane in R?

How does one go about drawing a hyperplane (given the equation) in 3D in R? (i.e. 3D equivalent to "abline") Thanks in advance, ...

r

plot

How to group columns by sum in R

Let's say I have two columns of data. The first contains categories such as "First", "Second", "Third", etc. The second has numbers which represent the number of times I saw "First". For example: Category Frequency First 10 First 15 First 5 Second 2 Third 14 Third 20 Second 3 I want ...

r

sorting
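A minimal sketch using aggregate() with the formula interface, rebuilt from the example rows in the excerpt:

```r
# Data taken from the example rows in the question.
df <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Sum Frequency within each Category; one output row per category.
totals <- aggregate(Frequency ~ Category, data = df, FUN = sum)
```

`totals` has one row per category: First 30, Second 5, Third 34.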

matplotlib for R user?

Hi, I regularly make figures (the exploratory data analysis type) in R. I also program in Python and was wondering if there are features or concepts in matplotlib that would be worth learning. For instance, I am quite happy with R - but its image() function will produce large files with pixelated output, whereas Matlab's equivalent figur...

Loop over string variables in R

When programming in Stata I often find myself using the loop index in the programming. For example, I'll loop over a list of the variables nominalprice and realprice: local list = "nominalprice realprice" foreach i of local list { summarize `i' twoway (scatter `i' time) graph export "C:\TimePlot-`i'.png" } This will plot the t...

r

statistics
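A rough R counterpart to the Stata loop, assuming the variables are columns of a data frame: loop over the column names as strings and index with `[[`. The data values are invented, and pdf() into a temporary directory is swapped in for the question's PNG export so the sketch runs on any R build.

```r
# Invented data with the two variables named in the question.
df <- data.frame(time = 1:5,
                 nominalprice = c(10, 11, 12, 13, 14),
                 realprice = c(10, 10.5, 10.9, 11.2, 11.4))

vars <- c("nominalprice", "realprice")
out_files <- character(0)

for (v in vars) {
  print(summary(df[[v]]))                      # like Stata's summarize
  out_file <- file.path(tempdir(), paste0("TimePlot-", v, ".pdf"))
  pdf(out_file)                                # open a file device
  plot(df$time, df[[v]], xlab = "time", ylab = v)
  dev.off()                                    # close it to write the file
  out_files[v] <- out_file
}
```

The loop variable `v` plays the role of Stata's `i': it is an ordinary string, so it can be used both to index the data frame and to build the file name.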

R: Summing by Categorical Variable?

I have a data set of comic book unit sales by volume (e.g. Naruto v10) that I need to reduce to sales by series (so all Naruto volume unit sales would be added together into a single observation). I have a variable "series" that identifies the series of each observation. The equivalent code in Stata would be: by series, sort:replace u...

r

statistics