Deep Learning
Deep Neural Networks
We can construct feedforward neural networks $f(x):\mathbb{R}^d\rightarrow\mathbb{R}^K$ with more hidden layers as follows. Denote the input of size $M_0=d$ to the neural net by $$z^{(0)}=\begin{pmatrix}x_1\\\vdots\\x_d\end{pmatrix}\in\mathbb{R}^{d}.$$ We compute the $M_1$ activations making up the first hidden layer by $$z^{(1)}=\sigma(a^{(1)}),\quad a^{(1)}=W^{(1)}z^{(0)}+b^{(1)},$$ where $W^{(1)}\in\mathbb{R}^{M_1\times M_0}$ is a matrix of weights and $b^{(1)}\in\mathbb{R}^{M_1}$ is a vector of biases. The activation function $\sigma$ is applied element-wise to the vector $a^{(1)}$.
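As a minimal sketch of this recursion in NumPy (the helper names `relu` and `forward` are illustrative, not part of the text), the hidden activations $z^{(1)}, z^{(2)}, \dots$ can be computed layer by layer:

```python
import numpy as np

def relu(a):
    """Element-wise activation sigma(a) = max(0, a)."""
    return np.maximum(0.0, a)

def forward(x, weights, biases, sigma=relu):
    """Compute z^(l) = sigma(W^(l) z^(l-1) + b^(l)) through the hidden layers."""
    z = x
    for W, b in zip(weights, biases):
        z = sigma(W @ z + b)
    return z

# Toy dimensions: d = M0 = 4 inputs, M1 = 3 and M2 = 4 hidden units.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(4)
x = rng.normal(size=4)
print(forward(x, [W1, W2], [b1, b2]))  # hidden activations z^(2) in R^4
```

An output layer mapping $z^{(D-1)}$ to $\mathbb{R}^K$ would be stacked on top in the same way.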
Designing the Architecture
Geometric Pyramid Rule
The architecture of a neural network is determined not only by its depth but also by the number of units in each layer. The earlier example (link here) uses an architecture of depth $D=3$, with $M_1=3$ and $M_2=4$ hidden units in the first and second hidden layers, respectively. These choices are somewhat arbitrary. A more systematic approach, proposed in Masters (1993), is the geometric pyramid rule, which halves the number of hidden units each time a new layer is added.
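A back-of-the-envelope sketch of the halving rule follows; the closed form $M_\ell = M_1 / 2^{\ell-1}$ is our reading of "reducing by half," and Masters (1993) states the rule more generally:

```python
def pyramid_widths(m1, n_hidden):
    """Hidden-layer widths under a halving rule: M_l = M_1 / 2^(l-1) (assumed form)."""
    return [max(1, m1 // 2 ** l) for l in range(n_hidden)]

print(pyramid_widths(64, 4))  # [64, 32, 16, 8]
```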
Why the ReLU Function
In deep learning, the ReLU activation function $\sigma(x)=\max\{0,x\}$ is much more common than other choices.
Thresholding Effects
One reason is that its derivative $\sigma'(x)=\mathbf{1}[x>0]$ is binary, passing through only the signals beyond a certain threshold. Strictly speaking, $\operatorname{ReLU}$ is not differentiable at the origin, but we can artificially set $\sigma'(0)=0$ in gradient descent algorithms. The zero derivatives generate an effect similar to dropout by shutting down some neurons in the gradient calculation, though not randomly.
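The thresholding effect is easy to see numerically. In the NumPy sketch below, the convention $\sigma'(0)=0$ from the text is hard-coded:

```python
import numpy as np

def relu(a):
    """ReLU activation: max(0, a), element-wise."""
    return np.maximum(0.0, a)

def relu_grad(a):
    """Derivative 1[a > 0]; the non-differentiable point a = 0 is assigned 0."""
    return (a > 0).astype(float)

a = np.array([-2.0, 0.0, 3.0])
print(relu(a))       # [0. 0. 3.]
print(relu_grad(a))  # [0. 0. 1.] -- negative pre-activations pass no gradient
```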
Image Data
Deep learning is an important technology for image recognition. Here we mention two interesting applications.
DeepMind's Computer Is the World's Best Go Player
Go is a board game invented in China more than 2,500 years ago and still played today. The following figure shows the first 50 moves of the first match between AlphaGo and the world's number one human player, Ke Jie, in May 2017.
Spatial Information
Images Are Matrices
A digital image is nothing but a matrix: each pixel is an entry of the matrix. For example, we can represent the following 20-day OHLC image from Jiang, Kelly and Xiu (2022+) by a $64\times 60$ matrix in which 255 means a white pixel and 0 means a black one. In general, we can represent any $I\times J$ grayscale image as a matrix, say $X=(x_{ij})_{I\times J}$, with entries ranging from 0 (black) to 255 (white).
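To make the matrix view concrete, here is a toy NumPy example; the $4\times 5$ size and the drawn bar are invented for illustration, whereas an actual OHLC image would be $64\times 60$:

```python
import numpy as np

# A tiny grayscale "image" as an I x J matrix: 255 = white pixel, 0 = black.
img = np.full((4, 5), 255, dtype=np.uint8)  # start all white
img[1:3, 2] = 0                             # draw a short black vertical bar
print(img)
print(img.shape)  # (I, J) = (4, 5)
```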
Convolutional Neural Network
Architecture of a Traditional CNN
A convolutional neural network is composed of at least three types of layers:
- a convolution layer, which performs convolution operations to generate many feature maps from one image;
- a pooling layer, which denoises the feature maps by shrinking non-overlapping submatrices into summary statistics (such as maximums);
- a dense layer, which is a usual (shallow or deep) neural network taking the flattened feature maps as input.
In general, one may create different combinations of the convolution and pooling layers, as in the sketch below.
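A minimal sketch of this convolution-pooling-dense pattern, written here in PyTorch (a library choice we assume; the channel counts and the 2-class output are arbitrary illustrative values). The input is a 1-channel $64\times 60$ grayscale image, as in the OHLC example:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolution: 8 feature maps, each 64 x 60
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                # pooling: max over non-overlapping 2 x 2 blocks
    nn.Flatten(),                               # flatten the 8 x 32 x 30 feature maps into a vector
    nn.Linear(8 * 32 * 30, 2),                  # dense layer; 2 output classes chosen arbitrarily
)

x = torch.zeros(1, 1, 64, 60)  # (batch, channels, height, width)
print(cnn(x).shape)            # torch.Size([1, 2])
```

Stacking further convolution and pooling pairs before the flattening step gives the "different combinations" mentioned above.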