ansaurus

Question

How to write a confusion matrix in Python?

Answer 1

A:

You should map from classes to a row in your confusion matrix.

Here the mapping is trivial:

def row_of_class(classe):
    return {1: 0, 2: 1}[classe]

In your loop, compute expected_row and correct_row, then increment conf_arr[expected_row][correct_row]. You'll even end up with less code than you started with.
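A minimal sketch of such a loop, assuming classes labelled 1 and 2. The sketch names the two indices expected_row and predicted_row, reading the first as the true class and the second as the classifier's output; that reading, the helper build_conf_arr, and the sample data are assumptions made for illustration.

```python
def row_of_class(classe):
    # Map a class label (1 or 2) to a 0-based row/column index.
    return {1: 0, 2: 1}[classe]


def build_conf_arr(expected_labels, predicted_labels):
    # Hypothetical helper: rows are expected classes, columns are predicted.
    conf_arr = [[0, 0], [0, 0]]
    for expected, predicted in zip(expected_labels, predicted_labels):
        expected_row = row_of_class(expected)
        predicted_row = row_of_class(predicted)
        conf_arr[expected_row][predicted_row] += 1
    return conf_arr


# Made-up data for illustration only.
input_arr = [1, 1, 2, 2, 2]
predicted_arr = [1, 2, 2, 2, 1]
conf_arr = build_conf_arr(input_arr, predicted_arr)
print(conf_arr)
```

The dictionary lookup raises KeyError for any label other than 1 or 2, which makes unexpected classes fail loudly instead of being counted in the wrong cell.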

Tobu 2010-01-27 16:54:25

Answer 2

+1 A:

This function creates confusion matrices for any number of classes.

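def create_conf_matrix(expected, predicted, n_classes):
    # Classes must be 0-based integer indices in range(n_classes).
    # Layout: m[predicted][expected], i.e. rows are predicted classes.
    m = [[0] * n_classes for _ in range(n_classes)]
    for pred, exp in zip(predicted, expected):
        m[pred][exp] += 1
    return m

def calc_accuracy(conf_matrix):
    # Correct predictions sit on the diagonal; float() avoids integer division.
    t = sum(sum(l) for l in conf_matrix)
    return sum(conf_matrix[i][i] for i in range(len(conf_matrix))) / float(t)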

In contrast to your function above, you have to extract the predicted classes from your classification results before calling the function, e.g. with something like

[1 if p < .5 else 2 for p in classifications]
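Putting the pieces together might look like the sketch below. It assumes that classifications holds one score per instance, that classes are labelled 1 and 2 as in the thresholding line above, and that labels are shifted to 0-based indices before they index the matrix. The sample data and the names expected_labels, expected_idx, and predicted_idx are made up for illustration.

```python
def create_conf_matrix(expected, predicted, n_classes):
    # Same layout as the function above: m[predicted][expected].
    m = [[0] * n_classes for _ in range(n_classes)]
    for pred, exp in zip(predicted, expected):
        m[pred][exp] += 1
    return m


def calc_accuracy(conf_matrix):
    total = sum(sum(row) for row in conf_matrix)
    correct = sum(conf_matrix[i][i] for i in range(len(conf_matrix)))
    return correct / float(total)


# Hypothetical inputs: one score per instance and the true labels (1 or 2).
classifications = [0.1, 0.7, 0.4, 0.9]
expected_labels = [1, 2, 2, 2]

# Threshold the scores into class labels 1 and 2.
predicted_labels = [1 if p < .5 else 2 for p in classifications]

# Shift labels to 0-based indices so they can index the matrix.
expected_idx = [c - 1 for c in expected_labels]
predicted_idx = [c - 1 for c in predicted_labels]

matrix = create_conf_matrix(expected_idx, predicted_idx, 2)
accuracy = calc_accuracy(matrix)
print(matrix, accuracy)
```

Swapping the two indices in the increment gives the transposed layout (rows as expected classes); the accuracy is the same either way because it only reads the diagonal.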

Torsten Marek 2010-01-27 16:55:38

This line gives a syntax error, and I am not good enough at Python to fix it :) `m = [[0] * n_classes] for i in range(n_classes)]` fails with `SyntaxError: invalid syntax`.

Stephen T. 2010-01-27 17:34:15

I think you need one more `[`: `m = [[[0] * ...`

Tim Pietzcker 2010-01-27 17:54:47

Actually, it's one fewer :) Fixed.

Torsten Marek 2010-01-27 17:56:12

`s/observed/predicted/`

J.F. Sebastian 2010-01-27 18:22:08

You might have created a *transposed* confusion matrix.

J.F. Sebastian 2010-01-27 18:27:09

Yeah well, I should've run the code... ;)

Torsten Marek 2010-01-27 18:45:58

Answer 3

A:

In a general sense, you're going to need to change your probability array. Instead of having one number for each instance and classifying based on whether or not it is greater than 0.5, you're going to need a list of scores (one for each class), then take the largest of the scores as the class that was chosen (a.k.a. argmax).

You could use a dictionary to hold the probabilities for each classification:

prob_arr = [{classification_id: probability}, ...]

Choosing a classification would be something like:

for instance_scores in prob_arr:
    best_score = max(instance_scores.values())
    # every class tied for the highest score of this instance
    predicted_classes = [cls for (cls, score) in instance_scores.items() if score == best_score]

This handles the case where two classes have the same score. You can reduce the result to a single class by choosing the first one in that list, but how you handle ties depends on what you're classifying.

Once you have your list of predicted classes and a list of expected classes you can use code like Torsten Marek's to create the confusion array and calculate the accuracy.
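As a rough sketch of the whole path from per-class scores to a confusion array: the helper pick_class, the tie-breaking rule (smallest class id wins), the 0-based integer class ids, and the sample data are all assumptions chosen for illustration.

```python
def pick_class(instance_scores):
    # Collect every class tied for the highest score, then break ties
    # by taking the smallest class id (an arbitrary, illustrative rule).
    best = max(instance_scores.values())
    tied = [cls for cls, score in instance_scores.items() if score == best]
    return min(tied)


# Hypothetical scores: one dict per instance, mapping class id -> score.
prob_arr = [
    {0: 0.7, 1: 0.2, 2: 0.1},
    {0: 0.1, 1: 0.3, 2: 0.6},
    {0: 0.4, 1: 0.4, 2: 0.2},  # tie between classes 0 and 1
]
expected = [0, 2, 1]

predicted = [pick_class(scores) for scores in prob_arr]

# Count (expected, predicted) pairs; rows are expected classes here.
n_classes = 3
conf = [[0] * n_classes for _ in range(n_classes)]
for exp, pred in zip(expected, predicted):
    conf[exp][pred] += 1

print(predicted)
print(conf)
```

Using min() makes the tie-break deterministic regardless of dictionary ordering; a different rule (random choice, class priors) may suit other problems better.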

tgray 2010-01-27 17:08:26

Answer 4

A:

You can make your code more concise and (sometimes) faster using numpy. For example, in the two-class case your function can be rewritten as (see mply.acc()):

def accuracy(actual, predicted):
    """accuracy = (tp + tn) / ts

    where:

        ts - Total Samples
        tp - True Positives
        tn - True Negatives
    """
    return (actual == predicted).sum() / float(len(actual))

where:

import numpy

actual    = (numpy.array(input_arr) == 2)
predicted = (numpy.array(prob_arr) < 0.5)
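The same idea can be extended from accuracy to the full confusion matrix. The sketch below is one possible numpy version, assuming class labels are already 0-based integer indices; the function names and the sample data are illustrative and not part of the answer above.

```python
import numpy as np


def confusion_matrix(actual, predicted, n_classes):
    # actual and predicted are sequences of 0-based integer class indices.
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    m = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats,
    # which plain fancy-index assignment (m[a, p] += 1) would not.
    np.add.at(m, (actual, predicted), 1)
    return m


def accuracy_from_matrix(m):
    # Correct predictions lie on the diagonal.
    return np.trace(m) / float(m.sum())


# Made-up labels for illustration; rows are actual, columns are predicted.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
m = confusion_matrix(actual, predicted, 3)
print(m)
print(accuracy_from_matrix(m))
```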

J.F. Sebastian 2010-01-27 18:18:04

Answer 5

A:

Here's a confusion matrix class that supports pretty-printing, etc:

http://nltk.googlecode.com/svn/trunk/doc/api/nltk.metrics.confusionmatrix-pysrc.html

Edward Loper 2010-01-27 21:10:37
