ansaurus

Question

Subsetting nonconsecutive observations in R

Answer 1

+1 A:

As for the first question, I would use the quantile function to keep the rows of the data frame whose first-column value matches the 1st, 2nd, 3rd, ..., 100th percentile of that column (assuming column 1 holds integer values):

df[df[,1] %in% round(quantile(df[,1], probs = c(1:100)/100)),]

gd047 2010-06-04 10:03:26

@gd047: I agree that this is something that Roberto asked for, but I'm not sure it's a useful subset, since it could have very different properties to the original. There could be several rows that match some percentiles, and none matching others.
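
The concern about ties can be checked on a small example. This is only a sketch: the data frame below is made up (a Poisson-distributed integer column, chosen because it has many repeated values) and is not the data from the question.

```r
# Hypothetical data: an integer column with many ties.
set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(x = rpois(1000, lambda = 2), y = runif(1000))

# The percentile-matching subset from the answer above.
pct <- round(quantile(df[, 1], probs = (1:100) / 100))
sub <- df[df[, 1] %in% pct, ]

length(unique(pct))  # number of distinct percentile values
nrow(sub)            # number of rows kept; every tied row matches
```

With heavily tied data the 100 percentiles collapse to a handful of distinct values, and every row carrying one of those values is kept, so the result is not a 100-row summary.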

Richie Cotton 2010-06-04 10:40:32

George, I tried the following (I am interested in quantiles of row numbers): `df_small <- subset(df, row(df) %in% round(nrow(df)/100*(1:100),0))`. I get `Error: (subscript) logical subscript too long` and I can't figure out why this is. Ideas?

Richie: your observation is right, but I just want a summary Pareto plot with 100 data points, so this should be fine.

Roberto 2010-06-07 21:46:28

@Roberto I would suggest `df_small <- df[round(nrow(df)/100*(1:100),0),]`
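
A likely reason for the error, shown as a sketch on a made-up data frame (the column names and the size of 1000 rows are assumptions): `row(df)` returns one entry per cell rather than one per row, so the logical vector passed to `subset` is longer than the number of rows. Indexing by row position avoids the problem.

```r
# Hypothetical data frame with more than 100 rows.
df <- data.frame(x = 1:1000, y = runif(1000))

# row(df) has the same dimensions as df: one row index per cell.
dim(row(df))

# Row positions at the 1st, 2nd, ..., 100th percentile of the row count.
idx <- round(nrow(df) / 100 * (1:100))
df_small <- df[idx, ]

# A variant that keeps subset(): test one value per row instead.
df_small2 <- subset(df, seq_len(nrow(df)) %in% idx)
```

Note that if the data frame has fewer than 100 rows, `idx` contains repeated positions (and possibly 0), so the two variants would no longer select the same rows.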

gd047 2010-06-08 06:27:05

Answer 2

+1 A:

For a 'big' dataset

dfr <- data.frame(x = 1:1000, y = runif(1000))

You can take subsets of regularly spaced rows with

dfr[!(seq_len(nrow(dfr)) %% 50),]

Or random subsets with

dfr[sample(nrow(dfr), 20),]

As gd047 mentioned, use quantile to get quantiles/percentiles.
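
As a small sketch of the two subsetting approaches above, the regularly spaced rows can also be written as explicit row positions with `seq`, and the random subset can be made reproducible and kept in the original row order. The seed value and the object names `regular` and `random` are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only so the random subset is reproducible
dfr <- data.frame(x = 1:1000, y = runif(1000))

# Every 50th row, written as explicit row positions.
regular <- dfr[seq(50, nrow(dfr), by = 50), ]

# The modulo form selects the same rows.
same_rows <- identical(regular$x, dfr[!(seq_len(nrow(dfr)) %% 50), ]$x)

# 20 random rows, sorted so they stay in the original row order.
random <- dfr[sort(sample(nrow(dfr), 20)), ]
```

Sorting the sampled positions matters when the subset is plotted as a sequence, since `sample` returns them in random order.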

Richie Cotton 2010-06-04 10:49:16
