Linear algebra and calculus
Throughout this class we adopt the following conventions:
(135)\[\begin{align}
\mathbf{x} &= \begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n \\
\end{bmatrix}
\end{align}\]
(136)\[\begin{align}
\mathbf{A} &= \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
a_{21} & a_{22} & \cdots & a_{2p}\\
\vdots & \vdots & & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{np}\\
\end{bmatrix}
\end{align}\]
The scalar product between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is given by
(137)\[\begin{equation}
\mathbf{x}\cdot \mathbf{y} = \mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i
\end{equation}\]
The matrix-vector product of \(\mathbf{A}\) with a vector \(\mathbf{x}\) of matching length \(p\) is defined componentwise by
(138)\[\begin{align}
\begin{bmatrix}
\mathbf{A}\mathbf{x}
\end{bmatrix}_i = \sum_{j} a_{ij} x_j
\end{align}\]
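As a quick sanity check, the two index formulas above can be compared with NumPy's built-in `@` operator. This is only a minimal sketch; the numerical values of `x`, `y` and `A` are arbitrary and chosen for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Scalar product written as the explicit sum over i of x_i * y_i
dot_sum = sum(x[i] * y[i] for i in range(len(x)))
dot_np = x @ y  # same quantity via NumPy

# Matrix-vector product: [A x]_i = sum over j of a_ij * x_j
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])  # 2 rows, 3 columns
Ax_sum = np.array([sum(A[i, j] * x[j] for j in range(A.shape[1]))
                   for i in range(A.shape[0])])
Ax_np = A @ x

print(dot_sum, dot_np)  # both equal 32.0
print(Ax_sum, Ax_np)    # both equal [7. 9.]
```

Note that the number of columns of `A` must match the length of `x`, and the result has as many entries as `A` has rows.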
We adopt the “Jacobian” convention, also called numerator layout, in which the gradient of a scalar function \(f(\mathbf{x})\) is a row vector:
(139)\[\begin{align}
\frac{\partial f}{\partial \mathbf{x}} = \nabla_\mathbf{x} f &= \begin{bmatrix}
\frac{\partial f}{\partial x_1} &
\frac{\partial f}{\partial x_2} &
\cdots &
\frac{\partial f}{\partial x_n}
\end{bmatrix}.
\end{align}\]
With this convention, the partial derivative of a vector with respect to a scalar is a column vector:
(140)\[\begin{align}
\frac{\partial \mathbf{x}}{\partial y} &= \begin{bmatrix}
\frac{\partial x_1}{\partial y} \\
\frac{\partial x_2}{\partial y} \\
\vdots \\
\frac{\partial x_n}{\partial y}
\end{bmatrix},
\end{align}\]
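and the derivative of a vector \(\mathbf{y}\) with \(p\) components with respect to a vector \(\mathbf{x}\) with \(n\) components is the \(p \times n\) Jacobian matrix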
(141)\[\begin{align}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_p}{\partial x_1} & \frac{\partial y_p}{\partial x_2} & \cdots & \frac{\partial y_p}{\partial x_n}\\
\end{bmatrix}.
\end{align}\]
Its \((i,j)\)th element is
(142)\[\begin{align}
\begin{bmatrix}
\frac{\partial \mathbf{y}}{\partial \mathbf{x}}
\end{bmatrix}_{ij} = \frac{\partial y_i}{\partial x_j}
\end{align}\]
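The layout convention can be made concrete with a small numerical experiment. The sketch below approximates the Jacobian by central finite differences and stores \(\partial y_i / \partial x_j\) in row \(i\), column \(j\). The helper `jacobian_fd` and the test functions `y_of_x` and `f_of_x` are hypothetical names introduced here for illustration only.

```python
import numpy as np

def jacobian_fd(func, x, eps=1e-6):
    """Central-difference Jacobian with entry (i, j) = d y_i / d x_j."""
    x = np.asarray(x, dtype=float)
    n_out = np.atleast_1d(func(x)).size
    J = np.zeros((n_out, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        y_plus = np.atleast_1d(func(x + step))
        y_minus = np.atleast_1d(func(x - step))
        J[:, j] = (y_plus - y_minus) / (2 * eps)  # column j: derivative w.r.t. x_j
    return J

def y_of_x(x):
    # Illustrative vector function from R^3 to R^2
    return np.array([x[0] * x[1], np.sin(x[2])])

def f_of_x(x):
    # Illustrative scalar function of a vector in R^3
    return x[0] ** 2 + 3.0 * x[1]

x0 = np.array([1.0, 2.0, 0.5])
J = jacobian_fd(y_of_x, x0)     # shape (2, 3): p rows, n columns
grad = jacobian_fd(f_of_x, x0)  # shape (1, 3): a row vector

print(J.shape, grad.shape)
```

With this layout the gradient of a scalar function is simply the special case of a Jacobian with a single row.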
With these conventions, we recall the following rules (where \(\mathbf{A}\) and \(\mathbf{a}\) do not depend on \(\mathbf{x}\), and \(\mathbf{u}\) and \(\mathbf{v}\) are functions of \(\mathbf{x}\)):
(143)\[\begin{equation}
\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}
\end{equation}\]
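This rule is easy to verify numerically. Because \(\mathbf{x} \mapsto \mathbf{A}\mathbf{x}\) is linear, the change in the output when \(x_j\) is increased by one is exactly the \(j\)th column of the Jacobian. The sketch below uses an arbitrary random matrix and an arbitrary evaluation point, both of which are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))  # arbitrary constant matrix
x0 = rng.normal(size=3)      # arbitrary evaluation point

# Column j of the Jacobian: change in A x when x_j increases by one unit.
# For a linear map this difference is exact, not an approximation.
J = np.column_stack([A @ (x0 + e) - A @ x0 for e in np.eye(3)])

print(np.allclose(J, A))  # True up to floating-point rounding
```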
Product:
(144)\[\begin{equation}
\frac{\partial \mathbf{u}^\top \mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^\top \frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^\top\frac{\partial \mathbf{u}}{\partial \mathbf{x}}
\end{equation}\]
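The product rule can be checked in the same spirit. In the sketch below we assume, for illustration only, that \(\mathbf{u} = \mathbf{B}\mathbf{x}\) and \(\mathbf{v} = \mathbf{C}\mathbf{x}\), so that their Jacobians are \(\mathbf{B}\) and \(\mathbf{C}\). The matrices `B` and `C` are hypothetical and not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 3))
C = rng.normal(size=(4, 3))
x0 = rng.normal(size=3)

def u(x):
    return B @ x  # Jacobian du/dx = B

def v(x):
    return C @ x  # Jacobian dv/dx = C

def f(x):
    return u(x) @ v(x)  # scalar u^T v

# Product rule in numerator layout: u^T dv/dx + v^T du/dx (a row with 3 entries)
grad_rule = u(x0) @ C + v(x0) @ B

# Central finite differences along each coordinate direction
eps = 1e-6
grad_fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(3)])

print(np.allclose(grad_rule, grad_fd, atol=1e-6))
```

Both terms of the rule are row vectors of length \(n\), which is consistent with the gradient of a scalar being a row vector.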
Chain rule:
(145)\[\begin{equation}
\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{x}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{x})}{\partial \mathbf{x}}
\end{equation}\]
Note that the order of the factors matters, since matrix multiplication is not commutative.
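To see why the order matters, consider the following sketch. We assume, purely for illustration, \(\mathbf{g}(\mathbf{x}) = \tanh(\mathbf{B}\mathbf{x})\) mapping \(\mathbb{R}^3\) to \(\mathbb{R}^4\) and \(\mathbf{f}(\mathbf{g}) = \mathbf{A}\mathbf{g}\) mapping \(\mathbb{R}^4\) to \(\mathbb{R}^2\). The helper `jacobian_fd` and the matrices `A` and `B` are hypothetical names introduced for this example.

```python
import numpy as np

def jacobian_fd(func, x, eps=1e-6):
    """Central-difference Jacobian with entry (i, j) = d y_i / d x_j."""
    x = np.asarray(x, dtype=float)
    cols = []
    for e in np.eye(x.size):
        cols.append((func(x + eps * e) - func(x - eps * e)) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 4))
B = rng.normal(size=(4, 3))
x0 = rng.normal(size=3)

def g(x):
    return np.tanh(B @ x)  # R^3 -> R^4

def f(gv):
    return A @ gv          # R^4 -> R^2

df_dg = A                                        # shape (2, 4)
dg_dx = np.diag(1.0 - np.tanh(B @ x0) ** 2) @ B  # shape (4, 3)

J_chain = df_dg @ dg_dx                          # shape (2, 3)
J_fd = jacobian_fd(lambda x: f(g(x)), x0)
print(np.allclose(J_chain, J_fd, atol=1e-6))

# Swapping the factors is not even defined here: (4, 3) times (2, 4)
try:
    dg_dx @ df_dg
    reversed_ok = True
except ValueError:
    reversed_ok = False
print(reversed_ok)
```

The outer Jacobian goes on the left and the inner one on the right, so that the inner dimensions (here 4) match.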
Question: Can you derive the following two formulas?
You can check your answers here and here (Chap 5)