02 - The Garden of Forking Data
Homework - solutions
Book question 2H1
Suppose there are two species of panda bear. Both are equally common in the wild and live in the same places. They look exactly alike and eat the same food, and there is yet no genetic assay capable of telling them apart. They differ however in their family sizes. Species A gives birth to twins 10% of the time, otherwise birthing a single infant. Species B births twins 20% of the time, otherwise birthing singleton infants. Assume these numbers are known with certainty, from many years of field research.
Now suppose you are managing a captive panda breeding program. You have a new female panda of unknown species, and she has just given birth to twins. What is the probability that her next birth will also be twins?
Solution
# both are equally common...
P(A) = 0.50
P(B) = 0.50
P(twin|A) = 0.10
P(twin|B) = 0.20
P(twin1) = P(twin|A) * P(A) + P(twin|B) * P(B)
P(twin1) = 0.10 * 0.50 + 0.20 * 0.50 = 0.15
P(twin2 & twin1) = P(twin|A) * P(twin|A) * P(A) + P(twin|B) * P(twin|B) * P(B)
P(twin2 & twin1) = 0.01 * 0.50 + 0.04 * 0.50 = 0.025
# definition of conditional probability (X and Y are generic events, not the species)
# P(X and Y) = P(X | Y) * P(Y)
P(twin2|twin1) = P(twin2 & twin1) / P(twin1)
P(twin2|twin1) = 0.025 / 0.15 = 0.167
The probability of having twins again is 16.7%.
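The arithmetic above can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the names `prior` and `p_twin` are illustrative, and it assumes, as the solution does, that successive births are independent once the species is known.

```python
# Prior and likelihoods as given in the problem statement.
prior = {"A": 0.5, "B": 0.5}
p_twin = {"A": 0.1, "B": 0.2}

# P(twin1): marginal probability of twins at the first birth.
p_twin1 = sum(p_twin[s] * prior[s] for s in prior)

# P(twin2 & twin1): assumes births are independent given the species.
p_twin1_twin2 = sum(p_twin[s] ** 2 * prior[s] for s in prior)

# Definition of conditional probability.
p_twin2_given_twin1 = p_twin1_twin2 / p_twin1
print(round(p_twin2_given_twin1, 3))  # 0.167
```

The exact value is 1/6, which is the 16.7% reported above after rounding.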
Book question 2H2
Recall all the facts from the problem above. Now compute the probability that the panda we have is from species A, assuming we have observed only the first birth and that it was twins.
Solution
P(A) = 0.50
P(B) = 0.50
P(twin|A) = 0.10
P(twin|B) = 0.20
P(twin1) = P(twin|A) * P(A) + P(twin|B) * P(B)
P(twin1) = 0.10 * 0.50 + 0.20 * 0.50 = 0.15
# Bayes' theorem (X and Y are generic events, not the species)
# P(X | Y) = P(Y | X) * P(X) / P(Y)
P(A|twin1) = P(twin|A) * P(A) / P(twin1)
P(A|twin1) = 0.10 * 0.50 / 0.15 = 0.333
The probability of the panda being species A is 33.3% (exactly 1/3).
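The same Bayes update can be written as a small reusable function. This is a hedged sketch rather than the book's code: `update` is a hypothetical helper name, and the dictionaries simply restate the numbers from the problem.

```python
def update(prior, likelihood):
    """Return the normalized posterior given a prior and a likelihood (both dicts)."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())  # this is P(data), here P(twin1)
    return {s: v / total for s, v in unnormalized.items()}

prior = {"A": 0.5, "B": 0.5}
p_twin = {"A": 0.1, "B": 0.2}

post_twin1 = update(prior, p_twin)
print(round(post_twin1["A"], 3))  # 0.333
```

The normalizing constant computed inside `update` is the same P(twin1) = 0.15 that appears in the denominator of the hand calculation.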
Book question 2H3
Continuing on from the previous problem, suppose the same panda mother has a second birth and that it is not twins, but a singleton infant. Compute the posterior probability that this panda is species A.
Solution
# Originally (before any births)
P(A) = 0.50
P(B) = 0.50
P(twin|A) = 0.10
P(twin|B) = 0.20
P(twin1) = P(twin|A) * P(A) + P(twin|B) * P(B)
P(twin1) = 0.10 * 0.50 + 0.20 * 0.50 = 0.15
# After twin1 we update: the posterior from 2H2 becomes the new prior
P(A) = 0.10 * 0.50 / 0.15 = 1/3
P(B) = 1 - P(A) = 2/3
P(A|singleton) = P(singleton|A) * P(A) / P(singleton)
P(singleton|A) = 1 - P(twin|A) = 0.90
P(singleton|B) = 1 - P(twin|B) = 0.80
P(singleton) = P(singleton|A) * P(A) + P(singleton|B) * P(B)
P(singleton) = 0.90 * 1/3 + 0.80 * 2/3 = 5/6
P(A|singleton) = 0.90 * (1/3) / (5/6) = 0.36
The probability of the panda being species A is 36% (rising from 33.3% after the first birth).
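The sequential update can be sketched in Python as a cross-check. This is an illustrative sketch under the same assumptions as the solution (births independent given the species); `update`, `p_single`, and the other names are hypothetical and not from the original. It also shows that conditioning on both births at once gives the same answer as updating one birth at a time.

```python
def update(prior, likelihood):
    """Return the normalized posterior given a prior and a likelihood (both dicts)."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

prior = {"A": 0.5, "B": 0.5}
p_twin = {"A": 0.1, "B": 0.2}
p_single = {s: 1 - p for s, p in p_twin.items()}

# Sequential: the posterior after the twin birth is the prior for the singleton birth.
post_twin1 = update(prior, p_twin)
post_final = update(post_twin1, p_single)
print(round(post_final["A"], 2))  # 0.36

# One shot: likelihood of (twins, then singleton) under each species.
p_both = {s: p_twin[s] * p_single[s] for s in prior}
post_batch = update(prior, p_both)
```

Both routes give P(A) = 0.36, since the order in which independent observations are folded into the posterior does not matter.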