Hi,

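I'm comparing some classifiers. My procedure is to compute the confusion matrix with the table command, and then calculate the false positive and true positive rates from the table. The routine I wrote requires the table to be square, with the same labels on the rows and the columns, but table only includes the values that actually occur in each vector. There should be an easy way to pad the missing rows or columns with zeroes.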

My setup:

> cm <- table(classifiers$teacher[which(classifiers$problem == 'problem27')],
+             classifiers$srAve[which(classifiers$problem == 'problem27')])
> cm

     1  2  3
  0 23  0  0
  1  2  4  0
  2  2 10  0
  3  0  1  0
  4  0  0  1
> missingNames <- as.numeric( rownames(cm)[ !(rownames(cm) %in% as.numeric(colnames(cm)))  ] )
> missingNames
[1] 0 4

And then the C-like function I wrote to fix it:

padTable <- function( missingNames, cm ) {
    rowLength <- dim(cm)[1]
    for (i in missingNames) {
       zeroes <- rep(0,rowLength)
       cNames <- colnames(cm)
       after <- which ( (i < as.numeric(cNames)) )[1]
       before <- which ( (i > as.numeric(cNames)) )[1]
       if ( is.na(before) ) { #The very beginning
          cm <- cbind(zeroes,cm)
          colnames(cm) <- c(i,cNames)
       } else {
          if (is.na(after)) { #The very end
              cm <- cbind(cm,zeroes)
              colnames(cm) <- c(cNames,i)
          } else { #somewhere in the middle
               print('ERROR CANNOT INSERT INTO MIDDLE YET.')
               cm = NULL
          }
       }
    } 
    return(cm)
}

So, there has to be some dreadfully simple way to make this work. Anytime I find myself writing C code in R, I know that I'm doing it wrong.

Thanks for any help.

EDIT: Sample data as requested:

> classifiers$teacher[which(classifiers$problem == 'problem27')]
[1] 0 0 1 2 2 2 0 0 0 0 0 2 0 0 2 0 4 3 0 0 2 2 0 2 0 0 2 2 1 0 0 2 1 0 1 2 0 0
[39] 0 1 0 0 1
> classifiers$srAve[which(classifiers$problem == 'problem27')]
[1] 1 1 2 2 2 2 1 1 1 1 1 1 1 1 1 1 3 2 1 1 2 2 1 2 1 1 2 2 1 1 1 2 2 1 2 2 1 1
[39] 1 2 1 1 1
+3  A: 

You should simply be able to convert classifiers$teacher and classifiers$srAve to factors that share the same levels, but I'm just guessing, since I don't know what your data are like.

> x <- factor(sample(0:4,20,TRUE))
> y <- factor(sample(1:3,20,TRUE),levels=levels(x))
> z <- data.frame(x,y)
> table(z)
   y
x   0 1 2 3 4
  0 0 1 2 0 0
  1 0 1 0 1 0
  2 0 2 2 1 0
  3 0 3 3 2 0
  4 0 0 2 0 0
> z$y <- as.character(y)
> table(z)
   y
x   1 2 3
  0 1 2 0
  1 1 0 1
  2 2 2 1
  3 3 3 2
  4 0 2 0
Joshua Ulrich
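Not knowing which levels are in `x` and which are in `y`, you could take the union with `lv <- sort(union(levels(x), levels(y)))` and then rebuild both factors with `x <- factor(x, levels = lv)` and `y <- factor(y, levels = lv)`. Assigning to `levels(x)` directly would rename the existing levels by position instead of adding the missing ones.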
Greg
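Applied to the sample data posted in the question, the shared-levels idea might look like the sketch below. The standalone vectors `teacher` and `srAve` and the name `lv` are introduced here for illustration only; the values are copied from the question's EDIT. Both vectors are rebuilt with `factor(..., levels = lv)` so that `table` keeps the labels that never occur in one of them.

```r
# Sample vectors copied from the question's EDIT (43 observations each).
teacher <- c(0,0,1,2,2,2,0,0,0,0,0,2,0,0,2,0,4,3,0,0,2,2,0,2,0,0,2,2,1,0,0,2,1,0,1,2,0,0,
             0,1,0,0,1)
srAve   <- c(1,1,2,2,2,2,1,1,1,1,1,1,1,1,1,1,3,2,1,1,2,2,1,2,1,1,2,2,1,1,1,2,2,1,2,2,1,1,
             1,2,1,1,1)

# Every label seen in either vector, in sorted order.
lv <- sort(union(teacher, srAve))

# Rebuild both as factors over the same levels; table() keeps levels
# with zero counts, so the result is square.
cm <- table(teacher = factor(teacher, levels = lv),
            srAve   = factor(srAve,   levels = lv))
cm
```

The resulting table has rows and columns for every label from 0 to 4, with all-zero columns for the labels that srAve never produced, so no manual padding is needed.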
That makes sense. I had tried to just convert srAve and teacher to factors, but I didn't add the "extra" levels to srAve. Thanks everyone!
Nathan VanHoudnos
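For the rate calculation mentioned in the question, a minimal one-vs-rest sketch on a square table could look like this. It is not the routine from the question, which was not posted: `classRates` is a hypothetical helper, and it assumes that rows hold the reference labels (teacher) and columns hold the classifier output (srAve). The counts are the question's table with the zero columns for 0 and 4 filled in.

```r
# Padded 5x5 counts from the question: rows = teacher, columns = srAve.
labels <- as.character(0:4)
cm <- matrix(c(0, 23,  0, 0, 0,
               0,  2,  4, 0, 0,
               0,  2, 10, 0, 0,
               0,  0,  1, 0, 0,
               0,  0,  0, 1, 0),
             nrow = 5, byrow = TRUE,
             dimnames = list(teacher = labels, srAve = labels))

# Hypothetical helper: per-class true/false positive rates (one-vs-rest),
# treating rows as the reference labels.
classRates <- function(cm) {
  stopifnot(nrow(cm) == ncol(cm))   # requires a square table
  tp <- diag(cm)                    # correctly labelled as the class
  fn <- rowSums(cm) - tp            # in the class, labelled otherwise
  fp <- colSums(cm) - tp            # labelled as the class, but not in it
  tn <- sum(cm) - tp - fn - fp      # everything else
  data.frame(class = rownames(cm),
             tpr = tp / (tp + fn),
             fpr = fp / (fp + tn))
}

rates <- classRates(cm)
rates
```

A class with no reference observations would give a zero denominator and therefore NaN for its true positive rate; that does not happen with these counts.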