Hands-On Machine Learning with Microsoft Excel 2019

Entropy of the target variable

The definition of entropy when looking at a single attribute is as follows:

$$\mathrm{Entropy}(f) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Here, c is the total number of possible values of the feature f, p_i is the probability of the i-th value, and log2(p_i) is the base-2 logarithm of that probability. The calculation details are as follows:

  1. We need to count the number of Yes and No decisions in the dataset. In our simple example, they can be counted by hand, but if the dataset is larger, we can use Excel functions:

COUNTIF(F2:F15;"Yes") and COUNTIF(F2:F15;"No")

This gives Yes = 9 and No = 5.

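  2. When applying the entropy formula to the target variable, we get the following:

$$\mathrm{Entropy}(\text{target}) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940$$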

Here, the probabilities are calculated as the number of Yes (9) or No (5) over the total number (14).

This calculation can also be performed in the Excel sheet using -I3/(I3+J3)*LOG(I3/(I3+J3);2)-J3/(I3+J3)*LOG(J3/(I3+J3);2) with I3=9 and J3=5, which evaluates to approximately 0.940.
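For readers who want to cross-check the spreadsheet result outside Excel, the two steps above can be sketched in Python. This is only an illustrative sketch: the `target` list is a hypothetical stand-in for the column F2:F15 (only the counts, 9 Yes and 5 No, come from the text), and the function name `entropy` is our own choice.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sum p_i * log2(p_i) over the distinct label values, then negate.
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical stand-in for the target column F2:F15.
# Only the counts (9 Yes, 5 No) are taken from the text; the order is arbitrary.
target = ["Yes"] * 9 + ["No"] * 5

# Step 1: count the decisions (the COUNTIF step in the sheet).
counts = Counter(target)

# Step 2: apply the entropy formula to the target variable.
target_entropy = entropy(target)

print(counts["Yes"], counts["No"])
print(round(target_entropy, 3))
```

With these counts, the result should agree with the Excel calculation, at about 0.940 bits.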