Classification Methods

#### Propensity Scores

The propensity score is the conditional probability that an individual is assigned to the treatment group, as a function of the confounders, namely, \begin{align*}p(x)=&\mathbb{E}[D|X=x]\\=&\mathbb{P}[D=1|X=x].\end{align*}

Under the unconfoundedness assumption, $$\mathbb{E}[DY|X=x]=p(x)\cdot \mu^{(1)}(x)$$ and, similarly, $$\mathbb{E}[(1-D)Y|X=x]=(1-p(x))\cdot \mu^{(0)}(x).$$
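To see where the first identity comes from, write $Y^{(1)}$ for the treated potential outcome, so that $\mu^{(1)}(x)=\mathbb{E}[Y^{(1)}|X=x]$ (this follows the standard potential-outcome notation). Since $D$ is binary, $DY=DY^{(1)}$, and unconfoundedness lets the conditional expectation factor:

```latex
\begin{align*}
\mathbb{E}[DY|X=x] &= \mathbb{E}[DY^{(1)}|X=x] && (DY=DY^{(1)} \text{ since } D\in\{0,1\})\\
&= \mathbb{E}[D|X=x]\cdot\mathbb{E}[Y^{(1)}|X=x] && (Y^{(1)}\perp D \mid X)\\
&= p(x)\cdot\mu^{(1)}(x).
\end{align*}
```

The second identity follows by the same argument applied to $(1-D)Y=(1-D)Y^{(0)}$.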

Define the inverse-propensity-weighted (IPW) transformations $$\phi_1(y,t,x)=\frac{t}{p(x)}\cdot y$$ and $$\phi_0(y,t,x)=\frac{1-t}{1-p(x)}\cdot y.$$ It follows that $$\mathbb{E}[\phi_t(Y,D,X)|X=x]=\mu^{(t)}(x),\quad t\in\{0,1\}.$$ Then the CATE equals \begin{align*}\tau(x)=&\mu^{(1)}(x)-\mu^{(0)}(x)\\=&\mathbb{E}[\phi_1(Y,D,X)|X=x]\\&-\mathbb{E}[\phi_0(Y,D,X)|X=x].\end{align*} By the law of iterated expectations, the ATE equals \begin{align*}\tau=&\mathbb{E}[\tau(X)]\\=&\mathbb{E}[\phi_1(Y,D,X)]-\mathbb{E}[\phi_0(Y,D,X)].\end{align*}

#### Estimating Propensity Scores and the ATE

One may use the AdaBoost algorithm, for instance, to obtain an estimator of the log odds, say $\widehat{a}(x)$, and then estimate the posterior probability $p(x)$ via the sigmoid transformation, $$\widehat{p}(x)=\frac{1}{1+\exp(-\widehat{a}(x))}.$$ Alternatively, we can fit a deep neural network or a random forest to estimate the posterior probability $p(x)$ directly.
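A minimal sketch of this pipeline using scikit-learn and synthetic data (both are assumptions, not part of the original text). Here `GradientBoostingClassifier` stands in for AdaBoost, because its `decision_function` returns scores on the log-odds scale, which the sigmoid then maps to $\widehat{p}(x)$:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic confounders X and treatment D with true propensity expit(X[:, 0]).
n = 2000
X = rng.normal(size=(n, 2))
true_p = 1.0 / (1.0 + np.exp(-X[:, 0]))
D = rng.binomial(1, true_p)

# Fit a boosted classifier for D given X; decision_function returns
# scores a_hat(x) on the log-odds scale.
clf = GradientBoostingClassifier(random_state=0).fit(X, D)
a_hat = clf.decision_function(X)

# Sigmoid transformation: p_hat(x) = 1 / (1 + exp(-a_hat(x))).
p_hat = 1.0 / (1.0 + np.exp(-a_hat))
```

Here `p_hat` agrees with `clf.predict_proba(X)[:, 1]`, since the classifier applies the same sigmoid to its raw scores internally.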

Given any estimator $\widehat{p}(x)$ of $p(x)$, we can estimate the functions $\phi_1$ and $\phi_0$ respectively by $$\widehat{\phi}_1(y,t,x)=\frac{t}{\widehat{p}(x)}\cdot y$$ and $$\widehat{\phi}_0(y,t,x)=\frac{1-t}{1-\widehat{p}(x)}\cdot y.$$

Then we can estimate the ATE via the sample analogue given by \begin{align*}\widehat{\tau}=&\frac{1}{n}\sum_{i=1}^{n}\widehat{\phi}_1(Y_i,D_i,X_i)\\&-\frac{1}{n}\sum_{i=1}^{n}\widehat{\phi}_0(Y_i,D_i,X_i).\end{align*}
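The full procedure can be sketched end to end on simulated data (scikit-learn, the data-generating process, and the choice of logistic regression for $\widehat{p}$ are all illustrative assumptions; the true ATE is set to 2 by construction):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n, tau_true = 20_000, 2.0

# Simulate confounded data: X drives both treatment assignment and outcome.
X = rng.normal(size=(n, 1))
p = 1.0 / (1.0 + np.exp(-X[:, 0]))  # true propensity score
D = rng.binomial(1, p)
Y = 1.0 + X[:, 0] + tau_true * D + rng.normal(size=n)

# Step 1: estimate the propensity score p(x) with a classifier.
p_hat = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Step 2: form phi_hat_1 and phi_hat_0 and average (the sample analogue).
phi1 = D * Y / p_hat
phi0 = (1 - D) * Y / (1 - p_hat)
tau_hat = phi1.mean() - phi0.mean()
```

With this sample size, `tau_hat` lands close to the true value of 2. In practice one should also check overlap: the estimator divides by $\widehat{p}(x)$ and $1-\widehat{p}(x)$, so propensity estimates near 0 or 1 inflate its variance.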

Previous Section: Regression Methods
Next Section: Regression + Classification Methods