In [40]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['lines.linewidth'] = 2
plt.rcParams['lines.color'] = 'r'
plt.rcParams['font.size'] = 15
plt.rcParams['axes.labelsize'] = 20
plt.rcParams['axes.titlesize'] = 10
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 10
plt.rcParams["axes.edgecolor"] = "black"
plt.rcParams['axes.titlesize'] = 24
plt.rcParams['figure.figsize'] = 13, 13
from IPython.display import Image


# Chapter 5: Bayes' Rule¶

Bayes' theorem describes the probability of an event based on the prior knowledge of another event that is related to the original event.

$$p(A|B) = \frac{p(B|A)p(A)}{p(B)}$$

## Example¶

In a population, the probability of getting a disease is 0.001. We designate a parameter $\theta$ as either D (diseased) or H (healthy). Thus, $p(\theta = D) = 0.001$. There is a test, T, such that if an individual has the disease, the test result is positive (+) 99% of the time. The test also has a false positive rate of 5%, meaning that the test indicates positive when the sample is healthy 5% of the times. Thus, $p(T = + | \theta = D) = 0.99$ and $p(T = + | \theta = H) = 0.05$.

We would like to figure out the probability of having the disease given a positive test result or the posterior probability of having the disease.

$$p(\theta = D | T = + ) = \frac{p( T = + |\theta = D)p(\theta = D)}{p(T = +)} = \frac{0.99 * 0.001}{0.99 * 0.001 + 0.05 * (1 - 0.001)} = 0.019$$

Given a positive test result, the probability of having the disease is 1.9%.

## Parameters and Data¶

Given a parameter $\theta$ and data D, $p(\theta|D) = \frac{p(D|\theta)p(\theta)}{p(D)}$

1. $p(\theta)$ = prior - creditibility of the parameter without the data
2. $p(\theta|D)$ = posterior - creditbility of the parameter with the data
3. $p(D|\theta)$ = likelihood - probability of generating the data D given the parameter
4. $p(D)$ = data or marginal likelihood - probability of the data without the parameter

All data is equally representative during the underlying process. The order of the data does not affect the final outcome.

## Estimating the Bias in a Coin¶

A fair coin is one in which the probability of heads is 50%. How do you find the probability of heads in a biased coin? Our sample space is $\Omega =$ [1,0]. y = 1 represents a flip of heads and y = 0 is tails.

### 1. Build a model¶

The first step is to build a model, or likelihood function, with respect to some parameter, $\theta$, which represents the probability of flipping a heads. The model must allow for$p(y=1\vert \theta) = \theta$ and $p(y=0\vert \theta) = 1 - \theta$ to be true. The bernoulli distribution is one such model:

$$p(y \vert \theta) = \theta^y(1-\theta)^{1-y}$$

Notice that if y = 0, $p(y = 0 \vert \theta) = 1 - \theta$ and if y = 1, $p(y = 1 \vert \theta) = \theta$.

### 2. Prior Probability¶

The next step is to define a function for the prior probability. Let's assume that the bias of the coin can only take the values $\theta = 0$, $\theta = 0.1$, $\theta = 0.2$ and so on. We most believe that the coin is fair and thus give it the largest prior probability. As the bias increases, we decrease the probability. The prior probability is shown below.

In [41]:
import numpy as np
from scipy.stats import triang

# possible values of theta - only 11 possible values
thetas = np.linspace(0,1,11)

# prior probability is a triangle distribution
prior = triang.pdf(thetas,0.5)

fig, ax = plt.subplots()

ax.bar(thetas,prior, width = 0.01, color = 'lightsteelblue')
ax.set_xlabel(r'$\theta$')
ax.set_ylabel(r'$p(\theta)$')
ax.set_title('Prior Probability')

fig.tight_layout()


### 3. Data Collection and Bayes' Rule¶

The next step is to create data and then to re-allocate credibility among the parameter values based on the data. Assume we flip the coin once and y = 1, notated as z = 1 and N = 1. Using the likelihood function above, we plot the reallocations below

In [42]:
from scipy.stats import binom

# likelihood function is a binomial distribution
likelihood = binom.pmf(k = 1, n = 1, p = thetas)

fig, ax = plt.subplots()

ax.bar(thetas,likelihood, width = 0.01, color = 'lightsteelblue')

annot = 'Data: z = 1, N = 1'
ax.annotate(annot, xy=(0.05,0.85), xycoords="axes fraction")

ax.set_xlabel(r'$\theta$')
ax.set_ylabel(r'$p(D \vert \theta)$')
ax.set_title('Likelihood')

fig.tight_layout()


### 4. Posterior Probability¶

The next step is to calculate the posterior probability using Bayes Rule.

In [43]:
# Calculate posterior probability using Bayes' rule
prob_data = sum(np.multiply(prior,likelihood))
posterior = np.multiply(prior,likelihood) / prob_data

fig, ax = plt.subplots()

ax.bar(thetas,posterior, width = 0.01, color = 'lightsteelblue')

ax.set_xlabel(r'$\theta$')
ax.set_ylabel(r'$p(\theta \vert D)$')
ax.set_title('Posterior Probability')

fig.tight_layout()


### Influence of Sample Size¶

As expected, larger sample sizes yield a greater certainty in the estimation. This is true even if the ratio between coin flips is the same. An example is shown below with 4 flips, 40 flips, and 400 flips. We also test 1000 values of $\theta$ and use a beta distribution as our prior.

In [44]:
import pymc3 as pm
# possible values of theta - 1001 possible values
# shape parameters for beta distribution
a = 2
b = 2

# 3 experiments of different sample size
Ns = [4,40,400]
traces = []
# get samples with beta prior and a binomial likelihood
for x in range(3):
with pm.Model() as model: # context management
# number of samples
samples = 1000
# define priors
p = pm.Beta('p', a, b)

# define likelihood
y = pm.Binomial('y', n=Ns[x], p=p, observed=0.25 * Ns[x])

traces.append(pm.sample(samples, progressbar=False));

Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p_logodds__]
Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p_logodds__]
Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p_logodds__]

In [45]:
from scipy.stats import beta
thetas = np.linspace(0,1,1001)

fig, axs = plt.subplots(3,3)
annot = 'mode = 0.5'
axs[0,1].annotate(annot, xy=(0.63,0.9), xycoords="axes fraction")

for grid in range(3):
axs[0,grid].bar(thetas, beta.pdf(thetas, a = a,b =  b), label='prior', width = 0.01, color = 'lightsteelblue');
axs[1,grid].bar(thetas,binom.pmf(k = 0.25 * Ns[grid],n = Ns[grid], p = thetas), width = 0.01, color = 'lightsteelblue')
annot = 'Data: z = ' + str(int(Ns[grid]*0.25)) + '\nN = ' + str(Ns[grid])
axs[1,grid].annotate(annot, xy=(0.5,0.8), xycoords="axes fraction")
pm.plots.plot_posterior(traces[grid], label='post', color = 'lightsteelblue', ax = axs[2,grid])

axs[0,1].set_title('Prior')
axs[1,1].set_title('Likelihood')
axs[2,1].set_title('Posterior')
fig.tight_layout()


Notice how the highest posterior density shrinks as more data is included in the model. The estimated mean also moves closer to the actual value of 0.25.

### Influence of Prior¶

One of the biggest factors in finding a credible parameter value is the prior information about the parameter. Below is an example of two posterior probabilities with a uniform and beta prior.

In [46]:
# number of trials in experiment
N = 4
z = 1
# shape parameters for beta distribution
a = 0.5
b = 2

# get samples of posterior from beta prior
with pm.Model() as model: # context management
# number of samples
samples = 1000
# define priors
p = pm.Beta('p', a,b)

# define likelihood
y = pm.Binomial('y', n=N, p=p, observed=z)

trace_beta = pm.sample(samples, progressbar=False);

# get samples of posterior from uniform prior
with pm.Model() as model: # context management
# number of samples
samples = 1000
# define priors
p = pm.Uniform('p', 0, 1)

# define likelihood
y = pm.Binomial('y', n=N, p=p, observed=z)

trace_uniform = pm.sample(samples, progressbar=False);

Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p_logodds__]
The acceptance probability does not match the target. It is 0.896323658389, but should be close to 0.8. Try to increase the number of tuning steps.
Auto-assigning NUTS sampler...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p_interval__]

In [47]:
from scipy.stats import uniform

fig, axs = plt.subplots(3,2)

axs[0,0].bar(thetas, beta.pdf(thetas,a,b), label='prior', width = 0.01, color = 'lightsteelblue');
axs[1,0].bar(thetas,binom.pmf(k = z,n = N, p = thetas), width = 0.01, color = 'lightsteelblue')
axs[0,0].annotate('Beta Distribution', xy=(0.35,0.5), xycoords="axes fraction")
axs[1,0].annotate('Data: z = ' + str(z) + '\nN = ' + str(N), xy=(0.5,0.8), xycoords="axes fraction")
pm.plots.plot_posterior(trace_beta, label='post', color = 'lightsteelblue', ax = axs[2,0])

axs[0,1].bar(thetas, uniform.pdf(thetas,0,1), label='prior', width = 0.01, color = 'lightsteelblue');
axs[0,1].annotate('Uniform Distribution', xy=(0.35,0.5), xycoords="axes fraction")
axs[1,1].bar(thetas,binom.pmf(k = z,n = N, p = thetas), width = 0.01, color = 'lightsteelblue')
pm.plots.plot_posterior(trace_uniform, label='post', color = 'lightsteelblue', ax = axs[2,1])

axs[0,0].set_title('Prior')
axs[1,0].set_title('Likelihood')
axs[2,0].set_title('Posterior')
fig.tight_layout()


# Chapter 5: Exercises¶

## Excercise 5.1¶

[Purpose: Iterative application of Bayes’ rule, and seeing how posterior probabilities change with inclusion of more data.] This exercise extends the ideas of Table 5.4, so at this time, please review Table 5.4 and its discussion in the text. Suppose that the same randomly selected person as in Table 5.4 gets re-tested after the first test result was positive, and on the re-test, the result is negative. When taking into account the results of both tests, what is the probability that the person has the disease? Hint: For the prior probability of the re-test, use the posterior computed from the Table 5.4. Retain as many decimal places as possible, as rounding can have a surprisingly big effect on the results. One way to avoid unnecessary rounding is to do the calculations in R.

In [48]:
p_disease = 0.001 # P(D)
p_clean = (1 - p_disease) # P(H)
p_true_pos = 0.99 # P(+|D)
p_false_neg = (1 - p_true_pos) # P(-|D)
p_false_pos = 0.05 # P(+|H)
p_true_neg = (1 - p_false_pos) # P(-|H)

In [49]:
# P(D|+) = P(+|D)*P(D) / P(+)

liklihood = p_true_pos
prior = p_disease
marginal_likelihood = (p_true_pos * p_disease) + (p_false_pos * p_clean)
p_disease_when_pos = liklihood * prior / marginal_likelihood
print('P(D|+):', p_disease_when_pos)

P(D|+): 0.019434628975265017

In [50]:
# P(D|+, -)

new_liklihood = p_false_neg
new_prior = p_disease_when_pos
new_marginal_likelihood = (p_true_neg * p_clean) + (p_false_neg * p_disease)
p_disease_when_pos_then_neg = new_liklihood * new_prior / new_marginal_likelihood
print('P(D|+, -):', p_disease_when_pos_then_neg)

P(D|+, -): 0.0002047776639544922


## Exercies 5.2¶

[Purpose: Getting an intuition for the previous results by using “natural frequency” and “Markov” representations]

(A) Suppose that the population consists of 100,000 people. Compute how many people would be expected to fall into each cell of Table 5.4. To compute the expected frequency of people in a cell, just multiply the cell probability by the size of the population. To get you started, a few of the cells of the frequency table are filled in here:

In [51]:
import pandas as pd

empty_array = [[0, 0], [0, 0]]
df = pd.DataFrame(empty_array)
df.columns = ['Disease', 'Healthy']
df.index = ['T = +', 'T = -']

N = 100000
df.loc['T = +','Disease'] = int(p_true_pos * p_disease * N)
df.loc['T = +','Healthy'] = int(p_false_pos * p_clean * N)
df.loc['T = -','Disease'] = int(p_false_neg * p_disease * N)
df.loc['T = -','Healthy'] = int(p_true_neg * p_clean * N)
df

Out[51]:
Disease Healthy
T = + 99 4995
T = - 1 94905

In [52]:
# P(D|+)
p_disease_when_pos = df.loc['T = +', 'Disease']/df.loc['T = +'].sum()
print('P(D|+):',p_disease_when_pos)

P(D|+): 0.0194346289753


(C) Now we’ll consider a related representation of the probabilities in terms of natural frequencies, which is especially useful when we accumulate more data. This type of representation is called a “Markov” representation by Krauss, Martignon, and Hoffrage (1999). Suppose now we start with a population of N = 10, 000, 000 people. We expect 99.9% of them (i.e., 9,990,000) not to have the disease, and just 0.1% (i.e., 10,000) to have the disease. Now consider how many people we expect to test positive. Of the 10,000 people who have the disease, 99%, (i.e., 9,900) will be expected to test positive. Of the 9,990,000 people who do not have the disease, 5% (i.e., 499,500) will be expected to test positive. Now consider re-testing everyone who has tested positive on the first test. How many of them are expected to show a negative result on the retest? Use this diagram to compute your answer:

In [53]:
N = 10000000
p_disease = 0.001
p_clean = (1 - p_disease)
n_disease = N * p_disease
n_clean = N * p_clean
p_true_pos = 0.99
p_false_neg = (1 - p_true_pos)
p_clean = (1 - p_disease)
p_false_pos = 0.05
p_true_neg = (1 - p_false_pos)

In [54]:
print('Number of people who have the disease and a postive and a negative test results in two tests.')
print('E(+ | D) is', int(n_disease * p_true_pos))
print('E(- | D) is', int(n_disease * p_false_neg))

n_diff_results_when_disease = n_disease * p_true_pos * p_false_neg
print('E(- | +, D) is', int(n_diff_results_when_disease))
print()
print('Number of people who do not have the disease and a postive and a negative test results in two tests.')
print('E(+ | H) is', int(n_clean * p_false_pos))
print('E(- | H) is', int(n_clean * p_true_neg))

n_diff_results_when_clean = n_clean * p_false_pos * p_true_neg
print('E(- | +, H) is', int(n_diff_results_when_clean))

Number of people who have the disease and a postive and a negative test results in two tests.
E(+ | D) is 9900
E(- | D) is 100
E(- | +, D) is 99

Number of people who do not have the disease and a postive and a negative test results in two tests.
E(+ | H) is 499500
E(- | H) is 9490500
E(- | +, H) is 474525


(D) Use the diagram in the previous part to answer this: What proportion of people, who test positive at first and then negative on retest, actually have the disease? In other words, of the total number of people at the bottom of the diagram in the previous part (those are the people who tested positive then negative), what proportion of them are in the left branch of the tree? How does the result compare with your answer to Exercise 5.1?

In [55]:
n_disease_when_diff_results = n_diff_results_when_disease / \
(n_diff_results_when_clean + n_diff_results_when_disease)
print(n_disease_when_diff_results)

0.00020858616504854387


## Exercise 5.3¶

[Purpose: To see a hands-on example of data-order invariance.]

Consider again the disease and diagnostic test of the previous two exercises.

(A) Suppose that a person selected at random from the population gets the test and it comes back negative. Compute the probability that the person has the disease.

In [56]:
# P(D|-)
likelihood = p_false_neg
prior = p_disease
marginal_likelihood = (p_false_neg * p_disease) + (p_true_neg * p_clean)
p_disease_when_neg = likelihood * prior / marginal_likelihood
print("P(D|-):",p_disease_when_neg)

P(D|-): 1.0536741618022054e-05


(B) The person then gets re-tested, and on the second test the result is positive. Compute the probability that the person has the disease. How does the result compare with your answer to Exercise 5.1?

In [57]:
new_likelihood = p_true_pos
new_prior = p_disease_when_neg
new_marginal_likelihood = (p_true_pos * p_disease) + (p_false_pos * p_clean)
p_disease_when_neg_and_pos = new_likelihood * new_prior / new_marginal_likelihood
print("P(D|-,+):", p_disease_when_neg_and_pos)
print("Result is same as Exercies 5.1")

P(D|-,+): 0.00020477766395449222
Result is same as Exercies 5.1