#### Images Are Matrices #

A digital image is nothing but a matrix: each pixel is an entry of the matrix. For example, the 20-day OHLC image from Jiang, Kelly, and Xiu (2022+) can be represented by a $64\times 60$ matrix, where 255 denotes a white pixel and 0 a black one.

In general, we can represent any $\mathrm{I}\times \mathrm{J}$ grayscale image as a matrix, say

$$\boldsymbol{V}=\{V_{\mathrm{i},\mathrm{j}}:\mathrm{i}=1,\ldots,\mathrm{I}, ~\mathrm{j}=1,\ldots,\mathrm{J}\}$$

where $V_{\mathrm{i},\mathrm{j}}$ is the grayscale value of the pixel at location $(\mathrm{i},\mathrm{j})$.
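As a quick illustration, here is a minimal NumPy sketch of a tiny grayscale "image" stored as a matrix (the array values are invented for illustration, not taken from the paper):

```python
import numpy as np

# A tiny 4x5 grayscale "image": 255 = white, 0 = black.
# NumPy indices start at 0, while the text indexes pixels from 1.
V = np.full((4, 5), 255, dtype=np.uint8)  # start with an all-white image
V[1:3, 2] = 0                             # paint two black pixels in one column

print(V.shape)   # (4, 5): I = 4 rows, J = 5 columns
print(V[1, 2])   # the pixel at (row 2, column 3) is now black (0)
```

Image libraries such as Pillow or OpenCV return exactly this kind of array when loading a grayscale image.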

#### Flattening #

One can convert a matrix into a vector and, given the dimensions of the matrix, vice versa. This process of converting a grid of values into a vector is called *flattening* in machine learning or *vectorization* in mathematics.

Vectorizing $\boldsymbol{V}$ yields an $\mathrm{I}\mathrm{J}$-dimensional vector

$$\operatorname{vec}(\boldsymbol{V})=\begin{pmatrix}V_{1,1}\\\vdots\\V_{\mathrm{I},1}\\V_{1,2}\\\vdots\\V_{\mathrm{I},2}\\\vdots\\V_{1,\mathrm{J}}\\\vdots\\V_{\mathrm{I},\mathrm{J}}\end{pmatrix}.$$
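Since $\operatorname{vec}(\boldsymbol{V})$ stacks the columns of $\boldsymbol{V}$ on top of each other, it corresponds to column-major (Fortran) ordering in NumPy. A small sketch:

```python
import numpy as np

I, J = 3, 2
V = np.arange(1, I * J + 1).reshape(I, J)  # [[1, 2], [3, 4], [5, 6]]

# vec(V) stacks the columns of V, i.e. column-major ("F") order.
vec_V = V.flatten(order="F")
print(vec_V)  # [1 3 5 2 4 6]

# The inverse direction: reshape back, given the original size (I, J).
V_back = vec_V.reshape(I, J, order="F")
print(np.array_equal(V, V_back))  # True
```

Note that the default `order="C"` would instead stack the rows, giving `[1 2 3 4 5 6]`, which is a different (row-major) flattening.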

We may input the flattened feature vector directly to a neural network, treating every entry as a feature for prediction. This approach is, however, *not* favorable because it *ignores* the **spatial structure** of the original matrix: the location information of the pixels is lost during flattening. To exploit the spatial information, we should use convolutional neural networks instead.
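One way to see the loss of spatial structure: two horizontally adjacent pixels, which a convolution would process together, end up $\mathrm{I}$ entries apart in the column-stacked vector. A small sketch using the $64\times 60$ image dimensions above:

```python
I, J = 64, 60  # dimensions of the OHLC image discussed above

def vec_index(i, j, I):
    """0-based position of pixel (i, j) (1-based) in the column-stacked vec."""
    return (j - 1) * I + (i - 1)

# Pixels (1, 1) and (1, 2) are immediate horizontal neighbors in the image,
# yet their entries in vec(V) are separated by I = 64 positions.
gap = vec_index(1, 2, I) - vec_index(1, 1, I)
print(gap)  # 64
```

A fully connected layer sees no difference between this pair and two pixels on opposite corners of the image, which is exactly the information a convolutional layer preserves.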