Under the unconfoundedness assumption (UA), $$\begin{align*}\mu^{(1)}(x)=&~\mathbb{E}[Y^{(1)}|X=x]\\\stackrel{\text{UA}}{=}&~\mathbb{E}[Y^{(1)}|X=x,D=1]\\=&~\mathbb{E}[Y|X=x, D=1]\end{align*}$$ and, similarly, $$\begin{align*}\mu^{(0)}(x)=&~\mathbb{E}[Y^{(0)}|X=x]\\\stackrel{\text{UA}}{=}&~\mathbb{E}[Y^{(0)}|X=x,D=0]\\=&~\mathbb{E}[Y|X=x, D=0].\end{align*}$$
Hence, we can split the full sample into two subsamples, one for the treatment group and one for the control group, given respectively by
$$S_1=\{(Y_i,X_i): D_i=1\},\\ S_0=\{(Y_i,X_i): D_i=0\}.$$
Fitting the regression function separately on each subsample with a machine learning method (such as deep neural networks, least-squares boosting, or random forests), we obtain the estimators $\widehat{\mu}^{(1)}(x)$ and $\widehat{\mu}^{(0)}(x)$. We can then estimate the CATE by $$\widehat{\tau}(x)=\widehat{\mu}^{(1)}(x)-\widehat{\mu}^{(0)}(x).$$
Provided that the overlap condition holds, one may estimate the ATE by $$\widehat{\tau}=\frac{1}{n}\sum_{i=1}^{n}\widehat{\tau}(X_i).$$
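The two-sample procedure can be illustrated with a minimal sketch. To keep it dependency-free, it assumes a single covariate and uses ordinary least squares as a stand-in for the machine learning regressor; the names `fit_line`, `two_sample_cate`, and `simulate`, as well as the simulated design with true CATE $1+x$, are hypothetical choices made for this illustration only.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns a prediction function.

    Stand-in for any regressor (neural network, boosting, random forest).
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def two_sample_cate(y, x, d, fit=fit_line):
    """Fit mu1 on S_1 and mu0 on S_0, then return x -> mu1(x) - mu0(x)."""
    s1 = [(xi, yi) for yi, xi, di in zip(y, x, d) if di == 1]  # treated
    s0 = [(xi, yi) for yi, xi, di in zip(y, x, d) if di == 0]  # control
    mu1 = fit([xi for xi, _ in s1], [yi for _, yi in s1])
    mu0 = fit([xi for xi, _ in s0], [yi for _, yi in s0])
    return lambda v: mu1(v) - mu0(v)


def simulate(n, seed=0):
    """Hypothetical design: randomized D, true CATE tau(x) = 1 + x."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    d = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [1 + 2 * xi + di * (1 + xi) + rng.gauss(0, 0.1)
         for xi, di in zip(x, d)]
    return y, x, d


y, x, d = simulate(2000)
tau_hat = two_sample_cate(y, x, d)          # CATE estimator
ate_hat = fmean([tau_hat(xi) for xi in x])  # average over the FULL sample
print(tau_hat(0.5), ate_hat)
```

Note that the ATE estimate averages $\widehat{\tau}(X_i)$ over all $n$ observations, not only over one group. In this simulated design the true ATE is $\mathbb{E}[1+X]=1.5$, so both printed values should be close to 1.5.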
Causal Forest
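If we estimate the group-wise regression functions by random forests, we obtain the causal forest estimator $\widehat{\tau}(x)$ developed by Wager and Athey (2016). They suggest growing honest trees in each forest, that is, trees that use separate observations for choosing the splits and for estimating the leaf values, and using small-order subagging (aggregation over small subsamples drawn without replacement) to avoid bias issues in inference.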
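To make honesty and subagging concrete, here is a toy sketch for a single group's regression function. It is not the Wager and Athey implementation: it assumes one covariate, uses depth-one trees (stumps) instead of full trees, and the names `honest_stump` and `subagged_forest` and all tuning values are hypothetical. Each stump splits its subsample in two halves, chooses the split point on one half, and estimates the leaf means on the other; the forest averages stumps fitted on small subsamples drawn without replacement.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def honest_stump(sample, rng):
    """Depth-one tree: one half picks the split, the other fills the leaves."""
    sample = list(sample)
    rng.shuffle(sample)
    half = len(sample) // 2
    split_half, leaf_half = sample[:half], sample[half:]

    # Choose the threshold using the splitting half only.
    best_t, best_loss = None, float("inf")
    for t in sorted({x for x, _ in split_half})[1:]:
        left = [y for x, y in split_half if x < t]
        right = [y for x, y in split_half if x >= t]
        loss = sse(left) + sse(right)
        if loss < best_loss:
            best_t, best_loss = t, loss

    # Honesty: leaf means come from the held-out half only.
    overall = fmean([y for _, y in leaf_half])
    if best_t is None:
        return lambda x: overall
    left = [y for x, y in leaf_half if x < best_t]
    right = [y for x, y in leaf_half if x >= best_t]
    left_mean = fmean(left) if left else overall
    right_mean = fmean(right) if right else overall
    return lambda x: left_mean if x < best_t else right_mean


def subagged_forest(data, n_trees=200, subsample_size=40, seed=0):
    """Average honest stumps fitted on subsamples drawn WITHOUT replacement."""
    rng = random.Random(seed)
    stumps = [honest_stump(rng.sample(data, subsample_size), rng)
              for _ in range(n_trees)]
    return lambda x: fmean([stump(x) for stump in stumps])


# Hypothetical data: a step function plus noise.
rng = random.Random(1)
xs = [rng.random() for _ in range(500)]
data = [(x, (1.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.1)) for x in xs]
mu_hat = subagged_forest(data)
print(mu_hat(0.1), mu_hat(0.9))
```

Applying such a forest separately to $S_1$ and $S_0$ and taking the difference of the two fits would give the corresponding estimate of $\widehat{\tau}(x)$. The subsample size (40 out of 500 here) is deliberately small relative to the sample, in the spirit of small-order subagging.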