r

Information criteria in exponential smoothing models

How many parameters should be penalized when using information criteria (BIC, AIC, etc.) to select the best model? Let's say that we have 3 models: 1. Simple exponential smoothing 2. Holt's method (level + trend) 3. Holt-Winters (L+T+S), where we have monthly seasonality. How many parameters for penalization does have e...

Selecting by observation in R table

I was working through the Rosetta Code example of the knapsack problem in R and I came up with four solutions. What is the best way to output only one of the solutions based on the observation number given in the table? > values[values$value==max(values$value),] I II III value weight volume 2067 9 0 11 54500 25 25 211...

Parse JSON with R

I am fairly new to R, but the more I use it, the more I see how powerful it really is compared with SAS or SPSS. One of the major benefits, as I see it, is the ability to get and analyze data from the web. I imagine this is possible (and maybe even straightforward), but I am looking to parse JSON data that is publicly available on the web....

Create lm object from data/coefficients

Does anyone know of a function that can create an lm object given a dataset and coefficients? I'm interested in this because I started playing with Bayesian model averaging (BMA) and I'd like to be able to create an lm object out of the results of bicreg. I'd like to have access to all of the nice generic lm functions like diagnostic p...

Do I always have to use data frames in ggplot2?

Dear All, I'm running a Monte Carlo simulation and the output is in the form: > d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4)) > d iter k1 k2 1 0.2 0.3 2 0.6 0.4 The plots I want to generate are: plot(d$iter, d$k1) plot(density(d$k1)) I know how to do equivalent plots using ggplot2, convert to data ...

How to transform XML data into a data.frame?

I'm trying to learn R's XML package. I'm trying to create a data.frame from the books.xml sample XML data file. Here's what I get: library(XML) books <- "http://www.w3schools.com/XQuery/books.xml" doc <- xmlTreeParse(books, useInternalNodes = TRUE) doc xpathApply(doc, "//book", function(x) do.call(paste, as.list(xmlValue(x)))) xpathSApply(d...

Reading and plotting an ESRI shapefile in R

I'm having difficulties reading a .shp file (ESRI shapefile) into R. I have tried several options in R, and tried to convert the shapefile in ArcMap to something that reads in correctly, but nothing has worked yet. (In ArcMap I corrected the geometry, converted from single to multipolygon, etc., which was probably not necessary o...

2 questions: 1) long to wide data in R, 2) followup re: rattle

1) long to wide question: I have a dataset with 3 columns: person, event, frequency. If the frequency is zero, the row is not in the table. Is there a simple way using basic R functions or libraries to convert this table to wide format, with one row per person and one column per event, with the frequency as the value in the table? 2) rattl...

Doing a plyr operation on every row of a data frame in R

I like the plyr syntax. Any time I have to use one of the *apply() commands I end up kicking the dog and going on a 3 day bender. So for the sake of my dog and my liver, what's concise syntax for doing a ddply operation on every row of a data frame? Here's an example that works well for a simple case: x <- rnorm(10) y <- rnorm(10) df <...

Getting more info from Rprof()

I've been trying to dig into what the time-hogs are in some R code I've written, so I'm using Rprof. The output isn't yet very helpful though: > summaryRprof() $by.self self.time self.pct total.time total.pct "$<-.data.frame" 2.38 23.2 2.38 23.2 "FUN" 2.04 19.9 ...

Most underused data visualization

Histograms and scatterplots are great methods of visualizing data and the relationship between variables, but recently I have been wondering about what visualization techniques I am missing. What do you think is the most underused type of plot? Answers should: Not be very commonly used in practice. Be understandable without a great de...

R ggplot2 question - working with factors

I've got a dataset that looks like this...
mine  tonnes  week
AA    112     41
AA    114     41
AA    119     41
BB    108     41
BB    112     41
AA    110     42
AA    109     42
AA    102     43
AA    101     43
And I want to create a boxplot in ggplot2 to show the distribution of tonnes for each week. But I only want results from mine AA. I thoug...

Plotting shapefiles on top of Google map tiles

I have some shapefiles I want to plot over Google Maps tiles. What's the most efficient way to do this? One path might be to use the pkg RgoogleMaps; however, it is still unclear to me how to do this. I assume using PlotOnStaticMap with some combination of reformatting the shapefile data ...

Parallel gsub: how does one remove a different string in each element of a vector

I have a guest list that has a last name in one column, and in another column I have the first names or the full names (first space last) of each person in the family. I want the other column to contain just the first names. gsub(guest.w$Last.Name,"",guest.w$Party.Name.s.) That would work perfectly if I just had one row...

Generating means from a bivariate gaussian distribution

I am reading Elements of Statistical Learning (ESLII), and in chapter 2 they use a Gaussian mixture data set to illustrate some learning algorithms. To generate this data set, they first generate 10 means from a bivariate Gaussian distribution N((1,0)', I). I am not sure what they mean. How can you generate 10 means from a bivariate dist...

Generating a vector of the number of items in each list item

I have a list containing 98 items. But each item contains 0, 1, 2, 3, 4 or 5 character strings. I know how to get the length of the list and in fact someone has asked the question before and got voted down for presumably asking such an easy question. But I want a vector that is 98 elements long with each element being an integer from 0...

Generating interaction variables in R dataframes

Is there a way - other than a for loop - to generate new variables in an R dataframe, which will be all the possible 2-way interactions between the existing ones? i.e. supposing a dataframe with three numeric variables V1, V2, V3, I would like to generate the following new variables: Inter.V1V2 (= V1 * V2) Inter.V1V3 (= V1 * V3) Inter....

Amelia Zelig question

Hello I see this question on the Zelig list, so I know it's not data dependent, but I haven't seen an answer or good workaround.... If I run library(Amelia) susanMI.out <- amelia(susan, m = 5, noms = "married", ts = 'time', cs = 'id', intercs = T, sqrt = "unprot_vag_sex", lags = "married", ...

cURL for webpage login

I want to submit my submissions to this competition automatically from my code. I need to log in on this page and then submit a file on this page. I'd like to use cURL since it integrates with both of the languages that I am using (R and Python). I am just wondering if this procedure is possible in cURL? My other question is if I...

How do I get a predictions list from running svm in e1071 package

Hi Q1: I have been trying to get the AUC value for a classification problem and have been trying to use the e1071 and ROCR packages in R for this. ROCR has a nice example "ROCR.simple" which has prediction values and label values. library(ROCR) data(ROCR.simple) pred<-prediction(ROCR.simple$predictions, ROCR.simple$labels) auc<-performance...