I have tried emailing the author of this package without success, so I am wondering whether anybody else has experienced this.

I am having an issue using rpart on 4000 rows of data with 13 attributes. I can run the same test on 300 rows of the same data with no issue. When I run it on 4000 rows, Rgui.exe sits consistently at 50% CPU and the UI hangs. It stays like this for at least 4-5 hours if I let it run, and never exits or becomes responsive.

Here is the code I am using on both the 300-row and the 4000-row subsets:

train<-read.csv("input.csv",header=T)
y<-train[,18]
x<-train[,3:17]
library(rpart)
fit<-rpart(y~.,x)

Is this a known limitation of rpart, or am I doing something wrong? Are there any potential workarounds?

Any assistance appreciated.

+2  A: 

Can you reproduce the hang when you feed rpart random data of similar dimensions, rather than your real data (from input.csv)? If not, it's probably a problem with your data (formatting, perhaps?). After importing your data using read.csv, check it for format issues by looking at the output from str(train).

# How to do an equivalent rpart fit on some random data of equivalent dimensions
dats<-data.frame(matrix(rnorm(4000*14), nrow=4000))

y<-dats[,1]
x<-dats[,-1]
library(rpart)
system.time(fit<-rpart(y~.,x))
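
If the random data fits quickly, the str(train) check suggested above could be followed by counting the distinct values in every non-numeric column. This is only a sketch: the data frame below is a made-up stand-in for input.csv (the column names id, num and grade are hypothetical), and it assumes, without confirmation from the question, that the slowdown might come from a column such as an ID or free-text field being read as a factor with a very large number of levels.

```r
# Hypothetical stand-in for train <- read.csv("input.csv", header = TRUE);
# the column names and contents here are invented for illustration only.
set.seed(1)
train <- data.frame(
  id    = sprintf("row%04d", 1:4000),                    # unique value per row
  num   = rnorm(4000),                                   # ordinary numeric column
  grade = sample(c("a", "b", "c"), 4000, replace = TRUE),
  stringsAsFactors = TRUE
)

# Inspect how each column was typed on import
str(train)

# Count the distinct values in each non-numeric column
is_cat   <- !vapply(train, is.numeric, logical(1))
n_levels <- vapply(train[is_cat], function(col) length(unique(col)), integer(1))
print(n_levels)
```

A column whose count is close to the number of rows is a candidate for being dropped or converted before calling rpart; whether that is the actual cause here can only be settled by running the check on the real data.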
mike