Evaluating performance of the model with data
The ways to quantitatively assess the quality of a model's predictions are known as metrics. The simplest metric in classification is accuracy, the proportion of correctly classified cases. Accuracy can be misleading, however. Imagine that you have a training set of 1,000 samples: 999 of them belong to class A, and 1 to class B. Such a dataset is called imbalanced. The baseline (simplest) solution in this case would be to always predict class A. The accuracy of such a model would be 0.999, which looks impressive, but only if you don't know the ratio of classes in the training set. Now imagine that, in a medical diagnostic system, class A corresponds to a healthy outcome and class B to cancer. It is now clear that an accuracy of 0.999 is worth nothing here and is totally misleading: the model never detects a single cancer case.

Another thing to consider is that different errors can have different costs. What's worse: diagnosing a healthy person as ill, or an ill person as healthy? This leads to the notion of two types of error (Figure 2.10):
- Type I error, also known as a false positive: the algorithm predicts cancer when there is no cancer.
- Type II error, also known as a false negative: the algorithm predicts no cancer when there is cancer.
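
The imbalanced example above can be checked with a few lines of plain Python. This is only a minimal sketch: the label lists and the helper names `accuracy` and `error_counts` are made up for illustration, and class B (cancer) is assumed to be the positive class.

```python
# Hypothetical labels mirroring the example in the text:
# 999 samples of class A (healthy) and 1 sample of class B (cancer).
y_true = ["A"] * 999 + ["B"]

# Baseline model: always predict the majority class A.
y_pred = ["A"] * len(y_true)


def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def error_counts(y_true, y_pred, positive="B"):
    """Count Type I (false positive) and Type II (false negative) errors."""
    pairs = list(zip(y_true, y_pred))
    # False positive: predicted positive, but the true label is negative.
    false_positives = sum(t != positive and p == positive for t, p in pairs)
    # False negative: predicted negative, but the true label is positive.
    false_negatives = sum(t == positive and p != positive for t, p in pairs)
    return false_positives, false_negatives


baseline_accuracy = accuracy(y_true, y_pred)
false_positives, false_negatives = error_counts(y_true, y_pred)

print(f"accuracy: {baseline_accuracy}")
print(f"Type I errors (false positives): {false_positives}")
print(f"Type II errors (false negatives): {false_negatives}")
```

The baseline scores 0.999 on accuracy and makes no Type I errors, yet it makes one Type II error: it misses the only cancer case in the data, which is exactly the failure that accuracy alone hides.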