Causal Inference
Treatment Effect
A Motivating Example

Consider the following example from Athey (2019): $\ldots$ Imagine that you have a data set that contains data about prices and occupancy rates of hotels. Prices are easy to obtain through price-comparison sites, but occupancy rates are typically not made public by hotels. Imagine first that a hotel chain wishes to form an estimate of the occupancy rates of competitors, based on publicly available prices.
Regression Methods
Under the unconfoundedness assumption (UA), \begin{align*}\mu^{(1)}(x)=&~\mathbb{E}[Y^{(1)}|X=x]\\\stackrel{\text{UA}}{=}&~\mathbb{E}[Y^{(1)}|X=x,D=1]\\=&~\mathbb{E}[Y|X=x, D=1]\end{align*} and, similarly, \begin{align*}\mu^{(0)}(x)=&~\mathbb{E}[Y^{(0)}|X=x]\\\stackrel{\text{UA}}{=}&~\mathbb{E}[Y|X=x, D=0].\end{align*} Hence, we can divide the full sample into two subsamples, one for the treatment group and one for the control group, given by, respectively, $$S_1=\{Y_i,X_i: D_i=1\},\\ S_0=\{Y_i,X_i: D_i=0\}.$$ Estimating the regression functions on the two subsamples separately, we obtain the estimators $\widehat{\mu}^{(1)}(x)$ and $\widehat{\mu}^{(0)}(x)$ using machine learning techniques (such as deep neural networks, least-squares boosting, or random forests).
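A minimal sketch of this two-subsample regression approach, using random forests as the ML regressor (as mentioned above). The data-generating process here is an illustrative simulated randomized design, not from the text:

```python
# Two-subsample regression approach under unconfoundedness:
# fit mu^(1) on S_1 = {D_i = 1} and mu^(0) on S_0 = {D_i = 0},
# then estimate the CATE as tau_hat(x) = mu1_hat(x) - mu0_hat(x).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 2))
D = rng.binomial(1, 0.5, size=n)                 # randomized treatment
tau_true = 1.0                                   # constant treatment effect
Y = tau_true * D + X[:, 0] + rng.normal(scale=0.1, size=n)

# Estimate the regression functions separately on S_1 and S_0
mu1 = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[D == 1], Y[D == 1])
mu0 = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[D == 0], Y[D == 0])

# CATE estimate at each observed x
tau_hat = mu1.predict(X) - mu0.predict(X)
print(tau_hat.mean())
```

Averaging $\widehat{\tau}(X_i)$ over the sample gives a plug-in estimate of the ATE.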
Classification Methods
Propensity Scores

The propensity score is the conditional probability that an individual is assigned to the treatment group, viewed as a function of the confounders, namely, \begin{align*}p(x)=&\mathbb{E}[D|X=x]\\=&\mathbb{P}[D=1|X=x].\end{align*} Under the unconfoundedness assumption, $$\mathbb{E}[DY|X=x]=p(x)\cdot \mu^{(1)}(x)$$ and, similarly, $$\mathbb{E}[(1-D)Y|X=x]=(1-p(x))\cdot \mu^{(0)}(x).$$ Define the estimand functions $$\phi_1(y,t,x)=\frac{t}{p(x)}\cdot y$$ and $$\phi_0(y,t,x)=\frac{1-t}{1-p(x)}\cdot y.$$ It follows that $$\mathbb{E}[\phi_t(Y,D,X)|X=x]=\mu^{(t)}(x),~t\in\{0,1\}.$$ Then the CATE equals \begin{align*}\tau(x)=&\mu^{(1)}(x)-\mu^{(0)}(x)\\=&\mathbb{E}[\phi_1(Y,D,X)|X=x]\\&-\mathbb{E}[\phi_0(Y,D,X)|X=x].\end{align*} Using the law of iterated expectations, the ATE equals \begin{align*}\tau=&\mathbb{E}[\tau(X)]\\=&\mathbb{E}[\phi_1(Y,D,X)]-\mathbb{E}[\phi_0(Y,D,X)].\end{align*}
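The sample analogue of the last display is the inverse-propensity-weighting (IPW) estimator of the ATE. A minimal numpy sketch, assuming for simplicity that the propensity is a known constant (a randomized design); all numbers are illustrative:

```python
# IPW estimator of the ATE via the estimand functions phi_1 and phi_0:
# tau_hat = mean(phi_1(Y, D, X)) - mean(phi_0(Y, D, X)).
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
p = 0.4                                           # known propensity (randomized)
X = rng.uniform(-1, 1, size=n)
D = rng.binomial(1, p, size=n)
Y = 2.0 * D + np.sin(X) + rng.normal(scale=0.5, size=n)   # true ATE = 2

phi1 = D * Y / p                                  # phi_1(Y, D, X)
phi0 = (1 - D) * Y / (1 - p)                      # phi_0(Y, D, X)
tau_ipw = phi1.mean() - phi0.mean()
print(tau_ipw)
```

In observational data $p(x)$ is unknown and must itself be estimated, e.g. by a classification method; the weights $1/\widehat{p}(X_i)$ then replace $1/p$ above.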
Regression + Classification Methods
Influence Functions

The regression methods compare only the outcomes across groups, while the classification methods focus on the treatment assignments; they emphasize two different sources of information in the potential outcome model. One may want to integrate both the outcome and the treatment information to improve estimation efficiency. First, let us introduce the (uncentered) influence functions given by \begin{align*}&\psi_1(y,t,x)\\=&\phi_1(y,t,x)+\left(\frac{t}{p(x)}-1\right)\cdot \mu^{(1)}(x)\\=&\frac{t}{p(x)}\cdot y+\left(\frac{t}{p(x)}-1\right)\cdot \mathbb{E}[Y^{(1)}|X=x]\end{align*} and, similarly, \begin{align*}&\psi_0(y,t,x)\\=&\phi_0(y,t,x)+\left(\frac{1-t}{1-p(x)}-1\right)\cdot \mu^{(0)}(x)\\=&\frac{1-t}{1-p(x)}\cdot y+\left(\frac{1-t}{1-p(x)}-1\right)\cdot \mathbb{E}[Y^{(0)}|X=x].\end{align*}
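Averaging $\psi_1-\psi_0$ over the sample gives the augmented-IPW (doubly robust) ATE estimator. A minimal sketch with a known propensity and deliberately crude outcome regressions, to illustrate that the influence-function estimator still centers on the true ATE when the propensity side is correct (all data-generating choices are illustrative):

```python
# AIPW / doubly robust ATE estimator built from psi_1 and psi_0:
# psi_t augments phi_t with an outcome-regression correction term.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
p = 0.5                                           # known propensity
X = rng.uniform(-1, 1, size=n)
D = rng.binomial(1, p, size=n)
Y = 1.5 * D + X**2 + rng.normal(scale=0.3, size=n)        # true ATE = 1.5

# Crude (misspecified) outcome regressions: constants, ignoring X**2
mu1_hat = np.full(n, 1.5)
mu0_hat = np.zeros(n)

psi1 = D * Y / p + (D / p - 1) * mu1_hat
psi0 = (1 - D) * Y / (1 - p) + ((1 - D) / (1 - p) - 1) * mu0_hat
tau_aipw = psi1.mean() - psi0.mean()
print(tau_aipw)
```

The correction terms have conditional mean zero given $X$, which is what makes the estimator robust to misspecifying either the outcome regressions or the propensity (but not both).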
Selecting Confounders
Structural Equations

Suppose that the individual treatment effect is a fixed constant given by $$\delta_i=Y_i^{(1)}-Y_i^{(0)}\equiv\delta.$$ In this case, both the ATE and the CATE are constant: $\tau=\delta$ and $\tau(x)\equiv\delta$. Moreover, assume that the potential outcome in the control group, $Y_i^{(0)}$, obeys a linear regression model on the confounders $X_i=(X_{i1},\ldots, X_{id})$ given by $$Y_i^{(0)}=\sum_{j=1}^{d}X_{ij}\beta_j+\varepsilon_i,$$ where we omit the intercept for simplicity and the $\varepsilon_i$ are zero-mean errors independent of the confounders $X_i$ and the treatment dummy $D_i$.
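Under this specification the observed outcome satisfies $Y_i=\delta D_i+\sum_j X_{ij}\beta_j+\varepsilon_i$, so OLS of $Y$ on $(D, X)$ recovers $\delta$ even when assignment depends on the confounders. A minimal numpy sketch with an illustrative data-generating process:

```python
# Constant-effect linear structural model: Y_i = delta*D_i + X_i'beta + eps_i.
# Including the confounders X in the regression identifies delta by OLS.
import numpy as np

rng = np.random.default_rng(3)
n, d = 10_000, 3
X = rng.normal(size=(n, d))
beta = np.array([1.0, -0.5, 2.0])
delta = 0.7
# Confounded assignment: D depends on X (but not on eps)
D = (X @ beta + rng.normal(size=n) > 0).astype(float)
Y = delta * D + X @ beta + rng.normal(scale=0.2, size=n)

Z = np.column_stack([D, X])                       # regressors (D, X_1, ..., X_d)
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
print(coef[0])                                    # estimate of delta
```

Omitting a confounder $X_{ij}$ from $Z$ would reintroduce omitted-variable bias in $\widehat{\delta}$, which motivates the confounder-selection problem.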
Debiased Machine Learning
Consider the structural equations again, but allow the potential outcomes to be non-linear in the confounders: $$Y_i=\delta D_i+ \mu^{(0)}(X_i)+\varepsilon_i,$$ where $\mu^{(0)}$ is the regression function for the control group given by $$\mu^{(0)}(x)=\mathbb{E}[Y^{(0)}|X=x].$$

Learning Bias

Suppose that we have a machine learning estimator $\widehat{\mu}^{(0)}$ from an auxiliary sample (e.g., from another country), independent of our observations, such that $\mathbb{E}\,b(X_i)=0$ but with a nonzero conditional bias given by $$b(x)=\mathbb{E}[\widehat{\mu}^{(0)}(x)]-\mu^{(0)}(x)\neq 0,~x\in\mathcal{X}.$$
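A small simulation can illustrate why such a learning bias is a problem: even though $b(X)$ averages to zero over the population, a naive plug-in estimate of $\delta$ averages $Y_i-\widehat{\mu}^{(0)}(X_i)$ over the treated, and when treatment assignment is correlated with $b(X)$ the bias does not average out. All functional forms below are illustrative assumptions, not from the text:

```python
# Learning bias: a first-stage estimator mu0_hat with mean-zero bias
# b(x) = E[mu0_hat(x)] - mu0(x) still contaminates the naive plug-in
# estimate of delta when D is correlated with b(X).
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
delta = 1.0
X = rng.uniform(-1, 1, size=n)
mu0 = X**2                                        # true mu^(0)(x)
b = X                                             # bias: E[b(X)] = 0, b(x) != 0
mu0_hat = mu0 + b                                 # biased first-stage fit
D = rng.binomial(1, 0.5 + 0.4 * X)                # assignment correlated with b(X)
Y = delta * D + mu0 + rng.normal(scale=0.1, size=n)

# Naive plug-in: average Y - mu0_hat over the treated
naive = (Y - mu0_hat)[D == 1].mean()
print(naive - delta)                              # nonzero learning bias
```

The residual bias here is $-\mathbb{E}[b(X)\,|\,D=1]$, which is nonzero precisely because assignment and the bias function are correlated; debiased machine learning constructs estimators that neutralize this first-order effect.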