Training Systems Using Python Statistical Modeling

Computing confidence intervals for proportions

The sample proportion is computed by counting the number of successes and dividing it by the total sample size:

$$\hat{p} = \frac{M}{N}$$

Here, N is the sample size and M is the number of successes; $\hat{p}$ is the sample proportion of successes.

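Now, we want to be able to make a statement about the population proportion, which is a fixed, yet unknown, quantity. We will construct a confidence interval for this proportion, using the following formula:

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}$$

Here, $\hat{p}$ is the sample proportion, N is the sample size, $1 - \alpha$ is the confidence level, and $z_p$ is the 100 × p-th percentile of the standard normal distribution.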

Now, let's suppose that, on a certain website, out of 1,126 visitors, 310 clicked on a certain ad. Let's construct a confidence interval for the population proportion of visitors who clicked on the ad. This will allow us to predict future clicks. We will use the following steps to do so:

Let's first import the statsmodels package and compute the sample proportion, which, in this case, is 310 out of 1,126:
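A minimal sketch of this step, assuming statsmodels is installed. The variable names `clicks`, `visitors`, and `p_hat` are illustrative and not taken from the text:

```python
# Imported here so that it is available for the next step.
from statsmodels.stats.proportion import proportion_confint

clicks = 310     # visitors who clicked on the ad (successes, M)
visitors = 1126  # total number of visitors (sample size, N)

# Sample proportion: successes divided by sample size.
p_hat = clicks / visitors
print(round(p_hat, 4))  # 0.2753
```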

You can see that approximately 28% of the visitors to the website clicked on the ad on that day.

Our next step is to construct a confidence interval using the proportion_confint() function. We pass the number of successes as the count argument, the number of trials as the nobs argument, and the significance level (1 minus the confidence level, so 0.05 for 95% confidence) as the alpha argument, as shown in the following code snippet:
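A sketch of the call, assuming statsmodels is installed and leaving the method argument at its default (the normal approximation). The names `lower` and `upper` are illustrative:

```python
from statsmodels.stats.proportion import proportion_confint

# count: number of successes (clicks); nobs: number of trials (visitors).
# alpha=0.05 is the significance level, giving a 95% confidence interval.
lower, upper = proportion_confint(count=310, nobs=1126, alpha=0.05)

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The function returns the lower and upper bounds as a pair, so they can be unpacked directly.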

As you can see here, with 95% confidence, the proportion is between approximately 25% and 30%.
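As a cross-check, the same interval can be computed by hand from the normal-approximation formula, $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/N}$, which is the default method of proportion_confint(). The sketch below uses only the Python standard library; the helper name `wald_interval` is hypothetical and chosen for illustration:

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Percentile of the standard normal distribution at 1 - alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = wald_interval(310, 1126, alpha=0.05)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.249, 0.301)
```

The bounds of roughly 0.249 and 0.301 agree with the "approximately 25% and 30%" reading above.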

If we wanted a higher confidence level, say 99%, which yields a wider interval, we could specify a smaller alpha (0.01), as follows:
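A sketch of the 99% case, again assuming statsmodels is installed and using illustrative variable names. The 95% interval is recomputed only to compare widths:

```python
from statsmodels.stats.proportion import proportion_confint

# alpha=0.01 is the significance level, giving a 99% confidence interval.
lower_99, upper_99 = proportion_confint(count=310, nobs=1126, alpha=0.01)

# The 95% interval, for comparison.
lower_95, upper_95 = proportion_confint(count=310, nobs=1126, alpha=0.05)

print(f"99% CI: ({lower_99:.3f}, {upper_99:.3f})")
print(f"99% width: {upper_99 - lower_99:.3f}")
print(f"95% width: {upper_95 - lower_95:.3f}")
```

The 99% interval is wider than the 95% one: being more confident that the interval contains the population proportion costs precision.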