In an earlier blog, I summarized my learning about why and how we are able to identify the average treatment effect from randomized controlled trials (RCT). However, in real-world applicationbs, it is not always feasible to carry out RCT due to various possible reasons (e.g., legal, ethical, etc.), and we often need to rely on observational studies.

In observational studies, the treatment assignment is not purely random, and may depend on covariates \(X\). As a result, the observed covariate distributions of treated and control can be different. We call a third variable that is related to both the treatment and the response as a confounder.

In RCT, the distribution of \(Y|W=0\) is the same as the distribution of \(Y(0)\), but this relationship does not hold for observational studies. In order to conduct causal inference under observational data, one fundamental assumption that we need to make is the

\[ \{ Y_{i}(0), Y_{i}(1) \perp W_{i} | X_{i} \} \]

The implication of this assumption is that we can measure all confounders, which is critical to ensure the identifiability of the ATE (note that the presence of unobserved confounders makes it impossible to separate correlation and causality). Potential solutions for unmeasured confounding include (a variable that is related to the treatment assignment but not the outcome), and which can help us how much unmeasured confounding to flip our conclusions, which are beyond the scope of this blog.

In this blog, I shall focus on two solutions to estimate ATE with observational data

  • Regression adjustment: difference between conditional expectation.
  • Inverse-propensity weighting: to adjust for biases in the treatment assignment. Weighting groups so that control look like treated in terms of distribution of \(X\).

Regression adjustment

We can leverage the assumption of unconfoundedness

\[\begin{align} \tau \equiv \mathbb{E}[\Delta_{i}] & = \mathbb{E}[Y_{i}(1) - Y_{i}(0)] \nonumber \\ & = \mathbb{E}[\mathbb{E}[Y_{i}(1) - Y_{i}(0)] | X_{i}] \ \ \text{(conditional expectation)} \nonumber \\ & = \mathbb{E}[\mathbb{E}[Y_{i}(1) | W_{i} = 1, X_{i}] - \mathbb{E}[\mathbb{E}[Y_{i}(0) | W_{i} = 0, X_{i}] \ \ \text{(Unconfoundedness)} \nonumber \\ & = \mathbb{E}[\mathbb{E}[Y_{i} | W_{i} = 1, X_{i}] - \mathbb{E}[\mathbb{E}[Y_{i} | W_{i} = 0, X_{i}] \ \ \text{(Consistency)} \end{align}\]

where the conditional expectations in the last line can be estimated via regression.

Inverse-propensity weighting

The propensity score is defined as the probability of treatment given observed covariates

\[ e(x) \equiv P(W_{i} = 1 | X_{i} = x) \ \ \forall x \in \mathcal{X} \]

and the overlap assumption gives \(\eta < e(x) < 1-\eta\), \(\forall x \in \mathcal{X}\).

The Inverse-propensity weighting (IPW) estimator is indeed a type of Horvitz-Thompson estimator

\[ \hat{\tau}_{IPW} \equiv \frac{1}{n} \sum_{i=1}^{n} [\frac{W_{i}Y_{i}}{\hat{e}(X_{i})} - \frac{ (1- W_{i})Y_{i}}{1 - \hat{e}(X_{i})} ] \] which weights subjects by the inverse probability of treatment received creates a synthetic sample in which treatment assignment is independent of baseline covariates (i.e., turning observation study into a pseudo-randomized trial by re-weighting samples).