r

Is there a way to get a "subtree" from hclust? (R)

Hello all, I wish to create a "subtree" from an hclust object. For example, let's say I have the following object: a <- list() # initialize empty object a$merge <- matrix(c(-1, -2, -3, -4, 1, 2, -5,-6, 3,4), nc=2, byrow=TRUE ) a$height <- c(1, 1.5, 3,4,4.5) # def...

How to create a "Clustergram" plot? (in R)

Hi all, I came across this interesting website, with an idea of a way to visualize a clustering algorithm called "Clustergram": I am not sure how useful this really is, but in order to play with it I would like to reproduce it with R, but am not sure how to go about doing it. How would you create a line for each item so it would sta...

sapply and concurrency in R

Good afternoon, Somebody asked me a question today and neither did I know the answer nor could I find it in the documentation. This person simply asked me if the sapply function in R was making concurrent calls to the function you want to apply to the list, or if the computation is done sequentially. Does anybody know the answer? Wha...

how to use ggplot conditional on data

I asked this question and it seems ggplot2 currently has a bug with empty data.frames. Therefore I am trying to check if the dataframe is empty before I make the plot. But whatever I come up with, it gets really ugly, and doesn't work. So I am asking for your help. example data: SOdata <- structure(list(id = 10:55, one = c(7L, 8...

R: optimal way of computing the "product" of two vectors

Hi, Let's assume that I have a vector r <- rnorm(4) and a matrix W of dimension 20000*200 for example: W <- matrix(rnorm(20000*200),20000,200) I want to compute a new matrix M of dimension 5000*200 such that m11 <- r%*%W[1:4,1], m21 <- r%*%W[5:8,1], m12 <- r%*%W[1:4,2] etc. (i.e. grouping rows 4-by-4 and computing the product). W...

Changing text size on a ggplot bump plot

Hi, I'm fairly new to ggplot. I have made a bump plot using the code posted below. I got the code from someone's blog - I've lost the link.... I want to be able to increase the size of the labels (here letters, which are very small, on the left and right of the plot) without affecting the width of the lines (this will only really make sense ...

ggplot2 footnote

What is the best way to add a footnote to the bottom of a plot created with ggplot2? I've tried using a combination of the logic noted here http://www.r-bloggers.com/r-good-practice-%E2%80%93-adding-footnotes-to-graphics/ as well as the ggplot2 annotate function p + annotate("text",label="Footnote", x=unit(1,"npc") - unit(2, "mm"),y=...

k-means clustering in R on very large, sparse matrix?

Hello, I am trying to do some k-means clustering on a very large matrix. The matrix is approximately 500000 rows x 4000 cols yet very sparse (only a couple of "1" values per row). The whole thing does not fit into memory, so I converted it into a sparse ARFF file. But R obviously can't read the sparse ARFF file format. I also have th...

Merge two data frames together that have the same variable names and data types

I have tried the merge function to merge two csv files that I imported. They both have the same variable names and data types but each time I run merge all that I get is an object that contains the names of the two data frames. I have tried the following: # ex1 obj <- merge(obj1, obj2, by=obj) # ex2 obj <- merge(obj1, obj2, all) and s...

Screening (multi)collinearity in a regression model

I hope that this one is not going to be "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in the regression model. How to cure them... well, sometimes you don't need to "cure" collinearity, since it doesn't affect regression model itself, but interpretation of an effect o...

Is the FoldLeft function available in R?

Hi, I would like to know if there is an implementation of the foldLeft function (and foldRight?) in R. The language is supposed to be "rather" functional oriented and hence I think there should be something like this, but I could not find it in the documentation. To me, foldLeft function applies on a list and has the following signatu...

R: Print list to a text file

I have in R a list like this: > print(head(mylist,2)) [[1]] [1] 234984 10354 41175 932711 426928 [[2]] [1] 1693237 13462 Each element of the list has different number of its elements. I would like to print this list to a text file like this: mylist.txt 234984 10354 41175 932711 426928 1693237 13462 I know that I can use s...

subset a data.frame with multiple conditions

Suppose my data looks like this: 2372 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 1.3 05/07/2006 9104 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.34 07/23/2006 9212 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 0.33 02/11/2007 2094 Kansas KS2000111 HUMBOLDT, CITY OF ATRAZINE 1.4 05/06/2007 16763 Kansas KS200011...

How to skip extra lines before the header of a tab delimited file in R

The software I am using produces log files with a variable number of lines of summary information followed by lots of tab delimited data. I am trying to write a function that will read the data from these log files into a data frame ignoring the summary information. The summary information never contains a tab, so the following function ...

Using R to download zipped data file, extract, and import data

@EZGraphs on Twitter writes: "Lots of online csvs are zipped. Is there a way to download, unzip the archive, and load the data to a data.frame using R? #Rstats" I was also trying to do this today, but ended up just downloading the zip file manually. I tried something like: fileName <- "http://www.newcl.org/data/zipfiles/a1.zip" con1 <...

Using R to open grib files

I am using R to work with meteorological data. I proceed in two steps: 1- convert grib to netcdf using the command line function ncl_convert2nc from ncar command language 2- use package ncdf in R to import the netcdf data. I still have one problem: 2- For some particular grib files, the conversion with ncar tool does not work. Is ...

R strsplit and vectorization

When creating functions that use strsplit, vector inputs do not behave as desired, and sapply needs to be used. This is due to the list output that strsplit produces. Is there a way to vectorize the process - that is, the function produces the correct element in the list for each of the elements of the input? For example, to count the l...

Interpolation of time series data in R

I'm not sure what I'm missing here, but I'm basically trying to compute interpolated values for a time series; when I directly plot the series, constraining the interpolation points with "interpolation.date.vector", the plot is correct: plot(date.vector,fact.vector,ylab='Quantity') lines(spline(date.vector,fact.vector,xout=interpolation...

How to convert searchTwitter results (from library(twitteR)) into a data.frame?

I am working on saving Twitter search results into a database (SQL Server) and am getting an error when I pull the search results from twitteR. If I execute: library(twitteR) puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100)) I get an error of: Error in as.data.frame.default(x[[i]], optional = TRUE) : ...

What does BLAS DGEMV error code -6 mean?

I have a program that runs through R but uses the BLAS routines. It runs through correctly about 8 times but then throws an error: BLAS/LAPACK routine 'DGEMV ' gave error code -6 What does this error code mean? ...