Precision, recall, and F1-score
To assess the quality of an algorithm while taking both types of error into account, the accuracy metric is not sufficient. That is why different metrics were proposed.
Precision and recall are metrics used to evaluate prediction quality in information retrieval and binary classification. Precision is the proportion of true positives among all predicted positives; it shows how relevant the results are. Recall, also known as sensitivity, is the proportion of true positives among all truly positive samples. For example, if the task is to distinguish cat photos from non-cat photos, precision is the fraction of correctly predicted cats among all photos predicted to be cats, while recall is the fraction of correctly predicted cats among the total number of true cat photos.
If we denote the number of true positive cases as Tp and the number of false positive cases as Fp, then precision P is calculated as:

P = Tp / (Tp + Fp)
Recall R is calculated as:

R = Tp / (Tp + Fn)

where Fn is the number of false negative cases.
The F1 measure combines the two as their harmonic mean and is calculated as:

F1 = 2 × P × R / (P + R)
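To make the formulas concrete, here is a minimal sketch with made-up counts: suppose the cat detector from the earlier example flags 100 photos as cats, 90 of them correctly, while missing 30 real cat photos, so Tp = 90, Fp = 10, and Fn = 30:

def precision_recall_f1(tp, fp, fn):
    # Precision: fraction of predicted positives that are correct
    precision = tp / float(tp + fp)
    # Recall: fraction of actual positives that were found
    recall = tp / float(tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision_recall_f1(tp=90, fp=10, fn=30)
# -> (0.9, 0.75, 0.8181...)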
Now the same in Python:
In []: import numpy as np
       from sklearn.metrics import precision_score, recall_score, f1_score
       predictions = tree_model.predict(X_test)
       # Convert string class labels into binary 0/1 labels
       predictions = np.array([label == 'rabbosaurus' for label in predictions], dtype=int)
       true_labels = np.array([label == 'rabbosaurus' for label in y_test], dtype=int)
       precision_score(true_labels, predictions)
Out[]: 0.87096774193548387
In []: recall_score(true_labels, predictions)
Out[]: 0.88815789473684215
In []: f1_score(true_labels, predictions)
Out[]: 0.87947882736156346
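As a cross-check, the same scores can be reproduced from the raw counts. The sketch below assumes the true_labels and predictions arrays defined above and uses sklearn's confusion_matrix to recover Tp, Fp, and Fn, then applies the formulas directly:

from sklearn.metrics import confusion_matrix

# For binary 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(true_labels, predictions).ravel()
precision = tp / float(tp + fp)  # should match precision_score
recall = tp / float(tp + fn)     # should match recall_score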