Hi,
I'm comparing some classifiers. My procedure is to compute the confusion matrix with the table command, and then calculate the false positive and true positive rates from the table. The routine I wrote requires that the table be square. There should be an easy way to do it.
My setup:
cm <- table(classifiers$teacher[which(classifiers$problem == 'problem27')],
classifiers$srAve[which(classifiers$problem == 'problem27')]) cm
1 2 3
0 23 0 0
1 2 4 0
2 2 10 0
3 0 1 0
4 0 0 1
> missingNames <- as.numeric( rownames(cm)[ !(rownames(cm) %in% as.numeric(colnames(cm))) ] )
> missingNames
[1] 0 4
And then the C like function I wrote to fix it:
padTable <- function( missingNames, cm ) {
rowLength <- dim(cm)[1]
for (i in missingNames) {
zeroes <- rep(0,rowLength)
cNames <- colnames(cm)
after <- which ( (i < as.numeric(cNames)) )[1]
before <- which ( (i > as.numeric(cNames)) )[1]
if ( is.na(before) ) { #The very begining
cm <- cbind(zeroes,cm)
colnames(cm) <- c(i,cNames)
} else {
if (is.na(after)) { #The very end
cm <- cbind(cm,zeroes)
colnames(cm) <- c(cNames,i)
} else { #somewhere in the middle
print('ERROR CANNOT INSERT INTO MIDDLE YET.')
cm = NULL
}
}
}
return(cm)
}
So, there has to be some dreadfully simple way to make this work. Anytime I find myself writing C code in R, I know that I'm doing it wrong.
Thanks for any help.
EDIT: Sample data as requested:
> classifiers$teacher[which(classifiers$problem == 'problem27')]
[1] 0 0 1 2 2 2 0 0 0 0 0 2 0 0 2 0 4 3 0 0 2 2 0 2 0 0 2 2 1 0 0 2 1 0 1 2 0 0
[39] 0 1 0 0 1
> classifiers$srAve[which(classifiers$problem == 'problem27')]
[1] 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 3 2 1 1 2 2 1 2 1 1 2 2 1 1 1 2 2 1 2 2 1 1
[39] 1 2 1 1 1