Appendix: Matrix calculus


Linear algebra and calculus

Throughout this class we adopt the following conventions:

Vectors are denoted by bold lowercase letters and are represented as columns:

\[\begin{align} \mathbf{x} &= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{134} \end{align}\]

Matrices are denoted by bold capital letters:

\[\begin{align} \mathbf{A} &= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix} \tag{135} \end{align}\]

The scalar product between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is given by

\[\begin{equation} \mathbf{x}\cdot \mathbf{y} = \mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i \tag{136} \end{equation}\]

The matrix-vector product is given componentwise by

\[\begin{align} \begin{bmatrix} \mathbf{A}\mathbf{x} \end{bmatrix}_i = \sum_{j} A_{ij} x_j \tag{137} \end{align}\]
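The two index formulas above can be checked numerically. The following is a minimal sketch using NumPy; the values of `x`, `y` and `A` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])  # 2 rows, 3 columns

# Scalar product: x . y = x^T y = sum_i x_i y_i
dot_sum = sum(x[i] * y[i] for i in range(x.size))
dot_np = x @ y  # same quantity using NumPy's matrix-multiplication operator

# Matrix-vector product: [A x]_i = sum_j A_ij x_j
Ax_sum = np.array([sum(A[i, j] * x[j] for j in range(A.shape[1]))
                   for i in range(A.shape[0])])
Ax_np = A @ x

print(dot_sum, dot_np)
print(Ax_sum, Ax_np)
```

The explicit sums mirror the index notation, while the `@` operator is what one would normally write in practice.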

We adopt the “Jacobian” convention, also called the numerator layout, for which the gradient of a scalar function \(f(\mathbf{x})\) is a row vector:

\[\begin{align} \frac{\partial f}{\partial \mathbf{x}} = \nabla_\mathbf{x} f &= \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}. \tag{138} \end{align}\]

With this convention, the partial derivative of a vector with respect to a scalar is a column vector:

\[\begin{align} \frac{\partial \mathbf{x}}{\partial y} &= \begin{bmatrix} \frac{\partial x_1}{\partial y} \\ \frac{\partial x_2}{\partial y} \\ \vdots \\ \frac{\partial x_n}{\partial y} \end{bmatrix}, \tag{139} \end{align}\]

and the derivative of a vector \(\mathbf{y}\) with respect to a vector \(\mathbf{x}\) is the Jacobian matrix

\[\begin{align} \frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_p}{\partial x_1} & \frac{\partial y_p}{\partial x_2} & \cdots & \frac{\partial y_p}{\partial x_n} \end{bmatrix}, \tag{140} \end{align}\]

whose \((i,j)\)th element is

\[\begin{align} \begin{bmatrix} \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \end{bmatrix}_{ij} = \frac{\partial y_i}{\partial x_j}. \tag{141} \end{align}\]
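To make the layout concrete, the sketch below approximates the Jacobian by central finite differences and stores \(\partial y_i / \partial x_j\) in row \(i\), column \(j\). The helper `numerical_jacobian` and the test functions `f` and `y` are hypothetical names introduced for illustration only; they are not part of the course material.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout: J[i, j] = d y_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(func(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        y_plus = np.atleast_1d(func(x + step))
        y_minus = np.atleast_1d(func(x - step))
        J[:, j] = (y_plus - y_minus) / (2.0 * eps)  # column j: derivative w.r.t. x_j
    return J

# Hypothetical scalar function f(x) = x_1^2 + 3 x_2
def f(x):
    return x[0] ** 2 + 3.0 * x[1]

# Hypothetical vector function y: R^2 -> R^3
def y(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

x0 = np.array([1.0, 2.0])
grad_f = numerical_jacobian(f, x0)  # shape (1, 2): the gradient is a row vector
jac_y = numerical_jacobian(y, x0)   # shape (3, 2): p rows, n columns

print(grad_f.shape, jac_y.shape)
```

With this layout a scalar function yields a \(1 \times n\) array, and a function from \(\mathbb{R}^n\) to \(\mathbb{R}^p\) yields a \(p \times n\) array.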

With these conventions, we recall the following rules (where \(\mathbf{A}\) and \(\mathbf{a}\) do not depend on \(\mathbf{x}\), and \(\mathbf{u}\) and \(\mathbf{v}\) are functions of \(\mathbf{x}\)):

\[\begin{equation} \frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A} \tag{142} \end{equation}\]
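As a sanity check, this rule follows componentwise from the definitions of the matrix-vector product and of the Jacobian. The index \(k\) and the Kronecker delta \(\delta_{jk}\) are introduced here only for this derivation:

\[\begin{align} \begin{bmatrix} \frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} \end{bmatrix}_{ik} = \frac{\partial}{\partial x_k} \sum_{j} A_{ij} x_j = \sum_{j} A_{ij} \delta_{jk} = A_{ik} \end{align}\]

Here \(\delta_{jk} = \partial x_j / \partial x_k\) equals 1 if \(j = k\) and 0 otherwise, so every element of the Jacobian of \(\mathbf{A}\mathbf{x}\) coincides with the corresponding element of \(\mathbf{A}\).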

Product rule:

\[\begin{equation} \frac{\partial \mathbf{u}^\top \mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^\top \frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^\top\frac{\partial \mathbf{u}}{\partial \mathbf{x}} \tag{143} \end{equation}\]
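The product rule can be verified numerically on a small example. The sketch below assumes a finite-difference helper `numerical_jacobian` and two test functions `u` and `v`; all three are hypothetical and serve only as an illustration.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout: J[i, j] = d y_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(func(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(func(x + step))
                   - np.atleast_1d(func(x - step))) / (2.0 * eps)
    return J

# Hypothetical vector functions u, v: R^2 -> R^3
def u(x):
    return np.array([x[0], x[0] * x[1], x[1] ** 2])

def v(x):
    return np.array([np.sin(x[0]), x[1], x[0] + x[1]])

x0 = np.array([0.5, -1.0])

# Left-hand side: derivative of the scalar u^T v, a (1, 2) row vector
lhs = numerical_jacobian(lambda x: u(x) @ v(x), x0)

# Right-hand side: u^T dv/dx + v^T du/dx, with u^T and v^T as (1, 3) row vectors
u_row = u(x0)[None, :]
v_row = v(x0)[None, :]
rhs = u_row @ numerical_jacobian(v, x0) + v_row @ numerical_jacobian(u, x0)

print(np.allclose(lhs, rhs, atol=1e-5))
```

Both sides are \(1 \times n\) row vectors, consistent with the gradient of a scalar being a row vector in this convention.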

Chain rule:

\[\begin{equation} \frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{x}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{x})}{\partial \mathbf{x}} \tag{144} \end{equation}\]

Note that the order of the factors matters, because matrix multiplication is not commutative.
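A small numerical experiment illustrates both the chain rule and why the order matters. This is a sketch under stated assumptions: `numerical_jacobian`, `f` and `g` are hypothetical names chosen for illustration, with \(\mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^3\) and \(\mathbf{f}: \mathbb{R}^3 \to \mathbb{R}^2\).

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout: J[i, j] = d y_i / d x_j."""
    x = np.asarray(x, dtype=float)
    y0 = np.atleast_1d(func(x))
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(func(x + step))
                   - np.atleast_1d(func(x - step))) / (2.0 * eps)
    return J

# Hypothetical inner function g: R^2 -> R^3
def g(x):
    return np.array([x[0] + x[1], x[0] * x[1], x[1] ** 2])

# Hypothetical outer function f: R^3 -> R^2
def f(g_val):
    return np.array([g_val[0] * g_val[1], g_val[2] + np.sin(g_val[0])])

x0 = np.array([0.3, 1.2])

J_f = numerical_jacobian(f, g(x0))                    # shape (2, 3), evaluated at g(x0)
J_g = numerical_jacobian(g, x0)                       # shape (3, 2)
J_direct = numerical_jacobian(lambda x: f(g(x)), x0)  # shape (2, 2)

J_chain = J_f @ J_g    # chain rule: (2, 3) @ (3, 2) -> (2, 2)
J_swapped = J_g @ J_f  # wrong order: (3, 2) @ (2, 3) -> (3, 3)

print(np.allclose(J_direct, J_chain, atol=1e-5))
print(J_chain.shape, J_swapped.shape)
```

In this example the swapped product does not even have the shape of the Jacobian of the composition, which makes the ordering error easy to spot.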

Question: Can you derive the following two formulas, where \(u\) is a scalar function of \(\mathbf{x}\)?

\[\begin{equation} \frac{\partial \mathbf{a}u}{\partial \mathbf{x}} = ? \tag{145} \end{equation}\]

\[\begin{equation} \frac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}} = ? \tag{146} \end{equation}\]

You can check your answers here and here (Chap 5)