07 - Fitting Over & Under

Homework

Revisit the urban fox data, data(foxes) from the last class. Use WAIC or PSIS based model comparison on five different models, each using weight as the outcome, and containing these sets of predictor variables

avgfood + groupsize + area
avgfood + groupsize
groupsize + area
avgfood
area

Can you explain the relative differences in WAIC scores, using the fox DAG from the previous class? Be sure to pay attention to the standard error of the score differences.

data(foxes, package = "rethinking") # foxes dataset
foxes <- as_tibble(foxes)
# stardardize
foxes <- foxes %>% 
  mutate(F = as.numeric(scale(avgfood)),
         G = as.numeric(scale(groupsize)),
         A = as.numeric(scale(area)),
         W = as.numeric(scale(weight)))

mfox1 <- brm(
  data = foxes, family = gaussian,
  W ~ F + G + A, 
  prior = c(
    prior(normal(0, 0.2), class = Intercept),
    prior(normal(0, 0.5), class = b),
    prior(exponential(1), class = sigma)),
  file = here::here(models, "model07_foxhw1"))
mfox2 <- brm(
  data = foxes, family = gaussian,
  W ~ F + G, 
  prior = c(
    prior(normal(0, 0.2), class = Intercept),
    prior(normal(0, 0.5), class = b),
    prior(exponential(1), class = sigma)),
  file = here::here(models, "model07_foxhw2"))
mfox3 <- brm(
  data = foxes, family = gaussian,
  W ~ G + A, 
  prior = c(
    prior(normal(0, 0.2), class = Intercept),
    prior(normal(0, 0.5), class = b),
    prior(exponential(1), class = sigma)),
  file = here::here(models, "model07_foxhw3"))
mfox4 <- brm(
  data = foxes, family = gaussian,
  W ~ F, 
  prior = c(
    prior(normal(0, 0.2), class = Intercept),
    prior(normal(0, 0.5), class = b),
    prior(exponential(1), class = sigma)),
  file = here::here(models, "model07_foxhw4"))
mfox5 <- brm(
  data = foxes, family = gaussian,
  W ~ A, 
  prior = c(
    prior(normal(0, 0.2), class = Intercept),
    prior(normal(0, 0.5), class = b),
    prior(exponential(1), class = sigma)),
  file = here::here(models, "model07_foxhw5"))

mfox1 <- add_criterion(mfox1, c("waic", "loo"))
mfox2 <- add_criterion(mfox2, c("waic", "loo"))
mfox3 <- add_criterion(mfox3, c("waic", "loo"))
mfox4 <- add_criterion(mfox4, c("waic", "loo"))
mfox5 <- add_criterion(mfox5, c("waic", "loo"))

w <- loo_compare(
  mfox1, mfox2, mfox3, mfox4, mfox5, 
  criterion = "waic")
print(w, simplify = F)

      elpd_diff se_diff elpd_waic se_elpd_waic p_waic se_p_waic waic   se_waic
mfox1    0.0       0.0  -161.4       7.8          4.4    0.8     322.8   15.5 
mfox3   -0.5       1.3  -161.9       7.6          3.5    0.6     323.7   15.2 
mfox2   -0.5       1.7  -161.9       7.7          3.5    0.6     323.7   15.4 
mfox4   -5.2       3.4  -166.6       6.7          2.3    0.3     333.3   13.4 
mfox5   -5.3       3.4  -166.7       6.7          2.4    0.4     333.4   13.3

Comment from the textbook

Notice that the top three models are m1, m3, and m2. They have very similar WAIC values. The dif- ferences are small and smaller in all cases than the standard error of the difference. WAIC sees these models are tied. This makes sense, given the DAG, because as long as a model has groupsize in it, we can include either avgfood or area or both and get the same inferences. Another way to think of this is that the influence of good, adjusting for group size, is (according to the DAG) the same as the influence of area, adjusting for group size, because the influence of area is routed entirely through food and group size. There are no backdoor paths.

What about the other two models, m4 and m5? These models are tied with one another, and both omit group size. Again, the influence of area passes entirely through food. So including only food or only area should produce the same inference—the total causal influence of area (or food) is just about zero.

library(dagitty)
library(ggdag)


Attaching package: 'ggdag'

The following object is masked from 'package:stats':

    filter

fox_dag <- dagitty(
  "dag {
    area -> avgfood 
    avgfood -> groupsize 
    avgfood -> weight 
    groupsize -> weight
}")
coordinates(fox_dag) <- 
  list(x=c(area=2, avgfood=0, groupsize=4, weight=2),
       y=c(area=0, avgfood=1, groupsize=1, weight=2))
tidy_dagitty(fox_dag) %>% ggdag()