ansaurus

Question

Is there an equivalent of "anova" (for lm) for an rpart object?

Answer 1


Of course, an anova would be impossible here: anova calculates the total variation in the response variable and partitions it into informative components (SSA, SSE). I can't see how one could calculate a sum of squares for a categorical response variable like Kyphosis.

I think what you are actually talking about is attribute selection (or attribute evaluation). I would use, for example, the information gain measure. I think this is what is used to select the test attribute at each node of the tree: the attribute with the highest information gain (the greatest entropy reduction) is chosen as the test attribute for the current node, because it minimizes the information needed to classify the samples in the resulting partitions.
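To make the information gain idea concrete, here is a minimal base-R sketch, assuming a discrete attribute. The helpers entropy() and info_gain() are hypothetical, written only for this illustration; they are not functions from rpart or CORElearn.

```r
# Hypothetical helpers for illustration only (not part of rpart or CORElearn).

# Shannon entropy, in bits, of a vector of class labels
entropy <- function(y) {
  if (length(y) == 0) return(0)      # an empty partition contributes nothing
  p <- table(y) / length(y)          # class proportions
  p <- p[p > 0]                      # drop unused levels (treat 0 * log2(0) as 0)
  -sum(p * log2(p))
}

# Information gain of a discrete attribute x with respect to the class y:
# entropy of y minus the size-weighted entropy of y within each level of x
info_gain <- function(y, x) {
  x <- as.factor(x)
  weights <- table(x) / length(x)          # fraction of samples in each partition
  cond <- sapply(split(y, x), entropy)     # class entropy inside each partition
  entropy(y) - sum(weights[names(cond)] * cond)
}

# Toy example: x1 separates the classes perfectly, x2 carries no information
y  <- c("yes", "yes", "no", "no")
x1 <- c("a", "a", "b", "b")
x2 <- c("a", "b", "a", "b")
print(info_gain(y, x1))  # 1 bit
print(info_gain(y, x2))  # 0 bits
```

A numeric predictor such as Start in the kyphosis data would first have to be turned into a discrete split, e.g. info_gain(kyphosis$Kyphosis, kyphosis$Start >= 8.5), where the threshold is arbitrary and chosen only for illustration.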

I am not aware of a method in R for ranking attributes by their information gain, but I know there is one in WEKA, named InfoGainAttributeEval. It evaluates the worth of an attribute by measuring the information gain with respect to the class, and if you use Ranker as the search method, the attributes are ranked by their individual evaluations.

EDIT: I finally found a way to do this in R, using the CORElearn library:

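library(CORElearn)                  # provides attrEval()
data(kyphosis, package = "rpart")   # the kyphosis data set ships with rpart
estInfGain <- attrEval(Kyphosis ~ ., kyphosis, estimator = "InfGain")
print(estInfGain)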
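Building on the CORElearn call above, here is a sketch that ranks the predictors by their individual scores, roughly as WEKA's Ranker does, and for comparison grows an rpart tree that chooses its splits with the information (entropy) criterion. It assumes CORElearn is installed and that attrEval() returns a numeric vector with one score per predictor. Note that rpart's default splitting criterion for classification is the Gini index, not information gain.

```r
library(rpart)      # recommended package bundled with R; provides rpart() and kyphosis
library(CORElearn)  # provides attrEval()

# Information gain of each predictor with respect to the class
gains <- attrEval(Kyphosis ~ ., kyphosis, estimator = "InfGain")

# Rank predictors by their individual score, highest first
# (analogous to WEKA's Ranker search method)
ranked <- sort(gains, decreasing = TRUE)
print(ranked)

# For comparison: a classification tree that uses the information
# (entropy) splitting index instead of the default Gini index
fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
             parms = list(split = "information"))
print(fit)
```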

gd047 2010-03-07 19:59:17

Thanks gd047, this is a very helpful direction! I am looking forward to other ideas from people. Thanks! Tal

Tal Galili 2010-03-07 20:33:04

gd047, I just went looking for an R implementation of information gain measures, and I can't seem to find anyone talking about it. Maybe I will just connect R with WEKA for this. Thanks again for the lead! Tal

Tal Galili 2010-03-07 21:52:11
