I am working on a regression problem with three independent variables rather than one, which makes it hard to spot outliers on a scatter plot. Any suggestions?

A: 

Why not just create three univariate scatter plots? Or else use a robust regression model?

A similar question was asked about automated outlier detection in R, which is also worth reviewing.

Shane
Because you can have multivariate outliers that are not univariate outliers.
el chief
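
To illustrate the comment above, here is a minimal sketch in plain Python. It uses the Mahalanobis distance, which is not mentioned in the thread and is brought in here only as one common way to measure multivariate distance. The data are made up, and the two-variable case is used so the covariance matrix can be inverted in closed form; three variables would need a general matrix inverse.

```python
from statistics import mean, stdev

def z_scores(values):
    """Univariate z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form with the closed-form inverse of the 2x2 covariance matrix.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]

# Made-up data: ten points close to the line y = x, plus one point (3, 8)
# whose coordinates are each well inside the observed ranges.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3]
ys = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9, 9.1, 9.8, 8.0]

zx, zy = z_scores(xs), z_scores(ys)
d2 = mahalanobis_sq_2d(xs, ys)

for i, (a, b, d) in enumerate(zip(zx, zy, d2)):
    print(i, round(a, 2), round(b, 2), round(d, 2))
```

The last point has unremarkable z-scores on each axis taken separately, yet it has the largest Mahalanobis distance, because it breaks the joint pattern of the other points.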
A: 

The simplest thing, for starters, is to compute the leverage of every data point (see http://en.wikipedia.org/wiki/Partial%5Fleverage) and look for observations with high leverage, since those are the ones that can be especially influential. R produces a set of diagnostic plots when you plot a regression object, but you may need to consult a book on robust estimation to take full advantage of them.
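
As a rough sketch of what computing the leverage involves, the plain-Python snippet below computes the diagonal of the hat matrix H = X (X'X)^-1 X' for a model with an intercept and three predictors. The data are synthetic, the helper names `solve` and `leverages` are made up for this example, and the cutoff of 2p/n is only a common rule of thumb, not something stated in this answer.

```python
from itertools import product

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular; no check is made for a zero pivot.
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def leverages(rows):
    """Leverage h_ii = x_i' (X'X)^-1 x_i for each row, with an intercept added."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    return [sum(xi * si for xi, si in zip(x, solve(xtx, x))) for x in X]

# Synthetic data (an assumption for illustration): eight rows on a 2x2x2 grid
# plus one row that lies far from the rest.
rows = [list(r) for r in product((2.0, 4.0), repeat=3)] + [[10.0, 3.0, 9.0]]

h = leverages(rows)
n, p = len(rows), 4  # p counts the intercept plus three predictors
flagged = [i for i, hi in enumerate(h) if hi > 2 * p / n]

print([round(hi, 3) for hi in h])
print("rows above the 2p/n cutoff:", flagged)
```

The leverages always sum to p, so a row that takes a large share of that total is one the fit depends on heavily. On this toy data only the last row exceeds the cutoff. High leverage alone does not prove a point is a problem; it marks the point as worth a closer look alongside its residual.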

Alex
A: 

The proximities computed by random forests can be used to detect outliers. The basic idea is that we identify an outlier by how far away it is from all other observations belonging to its class in the learning set. Check the outlier function in the randomForest package.
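
The idea can be sketched in plain Python, assuming a proximity matrix is already available. The snippet does not fit a forest and is not the randomForest package's code: `prox` is a hand-written, hypothetical matrix, and the scoring (number of cases divided by the sum of squared within-class proximities, then centred by the median and scaled by the median absolute deviation) follows the general description of the measure as I understand it. Excluding a case's proximity to itself and the handling of zero sums are assumptions of this sketch.

```python
from statistics import median

def proximity_outlier_scores(prox, classes=None):
    """Outlier score per case from a symmetric proximity matrix.

    With classes=None all cases are treated as one class, which is how this
    sketch would be applied to a regression problem.
    """
    n = len(prox)
    if classes is None:
        classes = [0] * n
    scores = [0.0] * n
    for label in set(classes):
        idx = [i for i in range(n) if classes[i] == label]
        raw = {}
        for i in idx:
            # Sum of squared proximities to the other cases in the same class.
            s = sum(prox[i][k] ** 2 for k in idx if k != i)
            # Low total proximity gives a high raw score; fall back to n
            # when the sum is zero (an assumption of this sketch).
            raw[i] = n / s if s > 0 else float(n)
        med = median(raw.values())
        mad = median(abs(v - med) for v in raw.values())
        for i in idx:
            scores[i] = (raw[i] - med) / mad if mad > 0 else 0.0
    return scores

# Hypothetical proximities: cases 0-3 often share terminal nodes,
# case 4 rarely lands in the same node as anything else.
prox = [
    [1.0, 0.9, 0.8, 0.7, 0.1],
    [0.9, 1.0, 0.8, 0.6, 0.1],
    [0.8, 0.8, 1.0, 0.7, 0.2],
    [0.7, 0.6, 0.7, 1.0, 0.1],
    [0.1, 0.1, 0.2, 0.1, 1.0],
]

scores = proximity_outlier_scores(prox)
print([round(s, 1) for s in scores])
```

Case 4 gets by far the largest score because its proximities to every other case are small. In practice the matrix would come from a fitted forest rather than being typed in by hand.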

gd047