Random Forest
Unstable Estimator
Consider an estimator $\widetilde{f}(x)$ of some ideal prediction rule $f^*(x)$. Intuitively, the data-driven prediction rule $\widetilde{f}(x)$ is unstable if small changes in the data can cause large changes in the predicted value(s).

A Toy Example #

Consider the estimator
$$\widetilde{f}(x;\bar{X})=\mathbf{1}[\bar{X}\leq x],$$
where $\bar{X}$ is the sample mean of normal features,
$$\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i,\quad X_i\sim N(\mu,1),$$
with an unknown mean $\mu\in\mathbb{R}$. Suppose we observe $\bar{X}=0.5$, so $\widetilde{f}(0.5;0.5)=1$. An arbitrarily small increase in $\bar{X}$, say to $\bar{X}=0.51$, flips the prediction to $\widetilde{f}(0.5;0.51)=0$: the hard indicator makes the estimator unstable at the threshold.

Bagging

Bagging Reduces Estimation Variance #

Bootstrap aggregating (or, in short, bagging) is a computationally intensive method for reducing the variance of unstable estimators. The idea is to aggregate the predictions of a committee of unstable estimators rather than relying on a single one. By bootstrapping we mean drawing $n^*$ data points at random from the original dataset
$$S=\{(Y_i,X_i): i=1,\ldots,n\}.$$
We draw with replacement, allowing the same data point to occur more than once in the resampled dataset.
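A minimal sketch of bagging applied to the toy indicator estimator, assuming NumPy; the function names `unstable_predict` and `bagged_predict` are illustrative, not from the original text. Averaging the hard 0/1 predictions over bootstrap resamples smooths the jump at the threshold:

```python
import numpy as np

def unstable_predict(x, xbar):
    # The toy estimator: a hard indicator that jumps when xbar crosses x.
    return float(xbar <= x)

def bagged_predict(x, sample, n_boot=500, rng=None):
    # Bagging: average the unstable estimator over bootstrap resamples
    # drawn WITH replacement from the original sample.
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(sample)
    preds = [
        unstable_predict(x, rng.choice(sample, size=n, replace=True).mean())
        for _ in range(n_boot)
    ]
    return float(np.mean(preds))

# With the sample mean near the threshold x = 0.5, a single estimator returns
# a hard 0 or 1, while the bagged prediction is a smooth average in [0, 1].
sample = np.random.default_rng(1).normal(0.5, 1.0, size=100)
p = bagged_predict(0.5, sample)
```

Each bootstrap mean sits slightly above or below the threshold, so the committee's vote varies continuously with the data instead of jumping.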
Decision Tree
Decision Tree #

A decision tree grows from a root representing a YES/NO condition. The root branches into two child nodes, one for each state of the condition. We travel from the root to its left child when the condition is true, and otherwise to its right child. Each child node is either a leaf (also called a terminal node) or the root of a subtree. In the latter case, we continue growing and traveling until we reach a terminal node.
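The root-to-leaf travel described above can be sketched in a few lines of Python. This is a hand-built illustration of the traversal only (not the CART growing procedure); the `Node` class and `predict` function are assumptions for the example:

```python
class Node:
    # One node of a binary decision tree: either an internal YES/NO
    # condition "x[feature] <= threshold" or a leaf holding a prediction.
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # index of the feature tested at this node
        self.threshold = threshold  # split point for the YES/NO condition
        self.left = left            # child followed when the condition is true
        self.right = right          # child followed otherwise
        self.value = value          # prediction stored at a leaf (terminal node)

    def is_leaf(self):
        return self.value is not None

def predict(node, x):
    # Travel from the root to a terminal node, then return its value.
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# Example: the root splits on feature 0; its left child is a leaf, while its
# right child is itself the root of a one-split subtree on feature 1.
tree = Node(feature=0, threshold=0.5,
            left=Node(value=1.0),
            right=Node(feature=1, threshold=2.0,
                       left=Node(value=2.0), right=Node(value=3.0)))
```

For instance, `predict(tree, [0.9, 1.0])` goes right at the root, then left in the subtree, and returns the value stored at that terminal node.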
Random Forest
Subagging with Trees #

As in our toy example of the unstable estimator, instability also arises in decision trees grown by CART because of the 'hard' decisions made by the indicator splits. To improve stability, one can smooth the prediction by aggregating the predictions of unstable CART trees via bagging. For computational (and sometimes technical) reasons, one might instead draw data points without replacement from the original data. This requires drawing fewer data points, $n^{*}<n$; otherwise every resampled set would simply duplicate the original data, and no smoothing effect could emerge.
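A sketch of subagging with NumPy, using a one-split regression stump as a stand-in for a full CART tree (its hard indicator split is exactly the source of instability discussed above). The names `fit_stump` and `subagged_predict`, and the choice of subsample size, are assumptions for illustration:

```python
import numpy as np

def fit_stump(x, y):
    # A one-split regression "tree": choose the threshold minimizing the
    # sum of squared errors; a stand-in for CART with a single hard split.
    best = (np.inf, None, None, None)
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]

def subagged_predict(x_new, x, y, n_sub, n_trees=200, rng=None):
    # Subagging: average stump predictions fit on subsamples drawn
    # WITHOUT replacement, with n_sub < n so the subsamples differ.
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(y)
    preds = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=n_sub, replace=False)  # no duplicates
        t, left_mean, right_mean = fit_stump(x[idx], y[idx])
        preds.append(left_mean if x_new <= t else right_mean)
    return float(np.mean(preds))

# Toy data with a step at 0.5: each subsample places its split slightly
# differently, and averaging smooths the resulting hard predictions.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=80)
y = (x > 0.5).astype(float) + rng.normal(0.0, 0.1, size=80)
p_low = subagged_predict(0.1, x, y, n_sub=40)
p_high = subagged_predict(0.9, x, y, n_sub=40)
```

Because each of the 200 subsamples is drawn without replacement with $n^* = 40 < n = 80$, the fitted split points vary across the committee, which is what produces the smoothing.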