Precision, recall, and F1-score
To assess the quality of an algorithm while taking both types of error into account, the accuracy metric is not sufficient. That is why different metrics were proposed.
Precision and recall are metrics used to evaluate prediction quality in information retrieval and binary classification. Precision is the proportion of true positives among all predicted positives; it shows how relevant the results are. Recall, also known as sensitivity, is the proportion of true positives among all truly positive samples. For example, if the task is to distinguish cat photos from non-cat photos, precision is the fraction of correctly predicted cats among all photos predicted to be cats, while recall is the fraction of correctly predicted cats among the total number of true cat photos.
If we denote the number of true positive cases as Tp and the number of false positive cases as Fp, then precision P is calculated as:

P = Tp / (Tp + Fp)
Recall R is calculated as:

R = Tp / (Tp + Fn)

where Fn is the number of false negative cases.
The F1 measure combines the two as their harmonic mean and is calculated as:

F1 = 2 × P × R / (P + R)
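To make the formulas concrete, here is a minimal sketch with made-up counts: suppose the cat detector from the earlier example flags 100 photos as cats, 90 of them correctly, while missing 30 real cat photos, so Tp = 90, Fp = 10, and Fn = 30:

def precision_recall_f1(tp, fp, fn):
    # Precision: fraction of predicted positives that are correct
    precision = tp / float(tp + fp)
    # Recall: fraction of actual positives that were found
    recall = tp / float(tp + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

precision_recall_f1(tp=90, fp=10, fn=30)
# -> (0.9, 0.75, 0.8181...)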
Now the same in Python:
In []: import numpy as np
       from sklearn.metrics import precision_score, recall_score, f1_score
       predictions = tree_model.predict(X_test)
       # Convert string class labels into binary 0/1 labels
       predictions = np.array([label == 'rabbosaurus' for label in predictions], dtype=int)
       true_labels = np.array([label == 'rabbosaurus' for label in y_test], dtype=int)
       precision_score(true_labels, predictions)
Out[]: 0.87096774193548387
In []: recall_score(true_labels, predictions)
Out[]: 0.88815789473684215
In []: f1_score(true_labels, predictions)
Out[]: 0.87947882736156346
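As a cross-check, the same scores can be reproduced from the raw counts. The sketch below assumes the true_labels and predictions arrays defined above and uses sklearn's confusion_matrix to recover Tp, Fp, and Fn, then applies the formulas directly:

from sklearn.metrics import confusion_matrix

# For binary 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(true_labels, predictions).ravel()
precision = tp / float(tp + fp)  # should match precision_score
recall = tp / float(tp + fn)     # should match recall_score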