06 - Good & Bad Controls

Homework about foxes

All three problems below are based on the same data. The data in data(foxes) are 116 foxes from 30 different urban groups in England. These foxes are like street gangs. Group size varies from 2 to 8 individuals. Each group maintains its own (almost exclusive) urban territory. Some territories are larger than others. The area variable encodes this information. Some territories also have more avgfood than others. We want to model the weight of each fox. For the problems below, assume this DAG:

data(foxes, package = "rethinking") # foxes dataset
foxes <- as_tibble(foxes)
?foxes

No documentation for 'foxes' in specified packages and libraries:
you could try '??foxes'

fox_dag <- dagitty(
  "dag {
    area -> avgfood 
    avgfood -> groupsize 
    avgfood -> weight 
    groupsize -> weight
}")
coordinates(fox_dag) <- 
  list(x=c(area=2, avgfood=0, groupsize=4, weight=2),
       y=c(area=0, avgfood=1, groupsize=1, weight=2))
tidy_dagitty(fox_dag) %>% ggdag()

impliedConditionalIndependencies(fox_dag)

area _||_ grps | avgf
area _||_ wght | avgf

# stardardize
foxes <- foxes %>% 
  mutate(avgfood_s = as.numeric(scale(avgfood)),
         groupsize_s = as.numeric(scale(groupsize)),
         area_s = as.numeric(scale(area)),
         weight_s = as.numeric(scale(weight)))

1

Use a model to infer the total causal influence of area on weight. Would increasing the area available to each fox make it heavier (healthier)? You might want to standardize the variables. Regardless, use prior predictive simulation to show that your model’s prior predictions stay within the possible outcome range.

# area vs weight
bfox1 <- 
  brm(data = foxes, family = gaussian, 
      weight_s ~ area_s, 
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 0.5), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = here::here(models_folder, "model_b6f1"),
      save_pars = save_pars(all = T), sample_prior = T,       
      seed = 6)

print(bfox1)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: weight_s ~ area_s 
   Data: foxes (Number of observations: 116) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.00      0.09    -0.17     0.17 1.00     3762     2904
area_s        0.02      0.09    -0.17     0.19 1.00     3291     2802

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.01      0.07     0.88     1.15 1.00     3540     2458

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# Territory size seems to have no total causal influence on weight

adjustmentSets(fox_dag, exposure = "area", outcome = "weight")

{}

2

Now infer the causal impact of adding food to a territory. Would this make foxes heavier? Which covariates do you need to adjust for to estimate the total causal influence of food?

# avgfood vs weight
bfox2 <- 
  brm(data = foxes, family = gaussian, 
      weight_s ~ avgfood_s, 
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 0.5), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = here::here(models_folder, "model_b6f2"),
      save_pars = save_pars(all = T), sample_prior = T,       
      seed = 6)
print(bfox2)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: weight_s ~ avgfood_s 
   Data: foxes (Number of observations: 116) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.00      0.08    -0.16     0.17 1.00     3888     2750
avgfood_s    -0.02      0.09    -0.21     0.16 1.00     3781     2790

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.01      0.07     0.89     1.15 1.00     4068     3162

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# If you include groupsize, to block the 1 indirect path, then you won’t get the total causal influence of food. You’ll just get the direct influence. But I asked for the effect of adding food, and that would mean through all forward paths.

adjustmentSets(fox_dag, exposure = "avgfood", outcome = "weight")

{}

3

Now infer the causal impact of group size. Which covariates do you need to adjust for? Looking at the posterior distribution of the resulting model, what do you think explains these data? That is, can you explain the estimates for all three problems? How do they go together?

# groupsize vs weight
bfox3 <- 
  brm(data = foxes, family = gaussian, 
      weight_s ~ groupsize_s + avgfood_s, 
      prior = c(prior(normal(0, 0.2), class = Intercept),
                prior(normal(0, 0.5), class = b),
                prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      file = here::here(models_folder, "model_b6f3"),
      save_pars = save_pars(all = T), sample_prior = T,       
      seed = 6)
print(bfox3)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: weight_s ~ groupsize_s + avgfood_s 
   Data: foxes (Number of observations: 116) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Population-Level Effects: 
            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept       0.00      0.08    -0.16     0.17 1.00     3067     2254
groupsize_s    -0.57      0.18    -0.92    -0.22 1.00     2099     2189
avgfood_s       0.47      0.18     0.12     0.83 1.00     2083     2149

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.96      0.06     0.84     1.10 1.00     2729     2566

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

# But the causal explanation here is that more foxes move into a territory until the food available to each is no better than the food in a neighboring territory. Every territory ends up equally good/bad on average. This is known in behavioral ecology as an *ideal free distribution*

adjustmentSets(fox_dag, exposure = "groupsize", outcome = "weight")

{ avgfood }