Statistical Learning Framework
The Ideal Prediction Rule
We are interested in forecasting a target variable $Y$ using $d$ features $X_1,\ldots,X_d$. Let $Y$ belong to a label set $\mathcal{Y}$, and let the feature vector $X=(X_1,\ldots,X_d)$ belong to a domain set $\mathcal{X}$. Examples of label sets include: (i) a set of only two elements, such as $\{0,1\}$ or $\{-1,+1\}$, where the binary labels usually represent a YES/NO decision; (ii) a bounded interval on the real line that contains infinitely many elements, such as $[-M,M]$ for some $M>0$.
Empirical Risk Minimization
The fundamental problem in machine learning is that the population joint distribution $\mathcal{D}$ of the target and features is unknown. Instead, we are given only a sample of data $$S=\left\lbrace (Y_i,X_i): i=1,\ldots,n\right\rbrace.$$ For any candidate prediction rule $f$ of interest, we can compute the loss $\ell(f(X_i),Y_i)$ on each observation $(Y_i,X_i)$. We call the average loss over the sample $S$ the empirical risk function.
Empirical Risk Function
$$L_S(f)=\frac{1}{n}\sum_{i=1}^{n}\ell(f(X_i),Y_i),\quad f\in\mathcal{F}.$$
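As a rough illustration of the empirical risk $L_S(f)$, the following Python sketch averages a loss over a small sample stored as pairs $(Y_i,X_i)$. The names `empirical_risk`, `squared_loss`, and `zero_one_loss`, the two toy prediction rules, and the data values are all assumptions made up for this example; the text does not commit to a particular loss $\ell$ or class $\mathcal{F}$.

```python
from typing import Callable, Sequence, Tuple

Features = Tuple[float, ...]
Observation = Tuple[float, Features]  # stored as (Y_i, X_i), matching the sample S


def squared_loss(prediction: float, target: float) -> float:
    """One possible loss for real-valued labels (an assumption of this sketch)."""
    return (prediction - target) ** 2


def zero_one_loss(prediction: float, target: float) -> float:
    """One possible loss for binary labels: 1 for a mistake, 0 otherwise."""
    return 0.0 if prediction == target else 1.0


def empirical_risk(
    f: Callable[[Features], float],
    sample: Sequence[Observation],
    loss: Callable[[float, float], float],
) -> float:
    """Average of loss(f(X_i), Y_i) over the n observations in the sample."""
    if len(sample) == 0:
        raise ValueError("the sample must contain at least one observation")
    return sum(loss(f(x), y) for y, x in sample) / len(sample)


# Hypothetical real-valued example with d = 2 features.
real_sample = [(1.0, (1.0, 1.0)), (2.0, (1.0, 3.0)), (0.0, (2.0, 0.0))]


def average_rule(x: Features) -> float:
    # Toy prediction rule: the mean of the two features.
    return 0.5 * x[0] + 0.5 * x[1]


# Hypothetical binary example with d = 1 feature and labels in {0, 1}.
binary_sample = [(1, (0.2,)), (0, (0.7,)), (1, (0.9,)), (0, (0.1,))]


def threshold_rule(x: Features) -> float:
    # Toy prediction rule: predict 1 when the feature exceeds 0.5.
    return 1 if x[0] > 0.5 else 0


risk_real = empirical_risk(average_rule, real_sample, squared_loss)
risk_binary = empirical_risk(threshold_rule, binary_sample, zero_one_loss)
```

Passing the loss as an argument keeps the averaging step separate from the choice of $\ell$, so the same function covers both the binary and the bounded-interval label sets.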