Appendix: Matrix calculus


Linear algebra and calculus

Throughout this class we adopt the following conventions:

  • Vectors are denoted by bold lowercase letters and are represented as columns:

\[\begin{align} \mathbf{x} &= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \\ \end{bmatrix} \end{align}\]
  • Matrices are denoted by bold capital letters:

\[\begin{align} \mathbf{A} &= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p}\\ a_{21} & a_{22} & \cdots & a_{2p}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{np}\\ \end{bmatrix} \end{align}\]

The scalar product between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is given by

\[\begin{equation} \mathbf{x}\cdot \mathbf{y} = \mathbf{x}^\top \mathbf{y} = \sum_i x_i y_i \end{equation}\]
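As a quick numerical illustration of the scalar product, here is a minimal pure-Python sketch. It assumes vectors are stored as plain lists of floats; the function name `dot` is our own choice for illustration, not something defined in the course.

```python
def dot(x, y):
    """Scalar product x . y = sum_i x_i y_i of two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [4.0, -1.0, 0.5]
print(dot(x, y))  # 1*4 + 2*(-1) + 3*0.5 = 3.5
```

The result does not depend on the order of the arguments, since \(\mathbf{x}^\top \mathbf{y} = \mathbf{y}^\top \mathbf{x}\).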

The matrix-vector product is defined component-wise as

\[\begin{align} \begin{bmatrix} \mathbf{A}\mathbf{x} \end{bmatrix}_i = \sum_{j} A_{ij} x_j \end{align}\]
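The component formula translates directly into code. The sketch below is illustrative only: it assumes \(\mathbf{A}\) is stored as a list of rows (so `A[i][j]` is \(A_{ij}\)) and \(\mathbf{x}\) as a list with one entry per column of \(\mathbf{A}\); the helper name `matvec` is hypothetical.

```python
def matvec(A, x):
    """Return A x, computed as [A x]_i = sum_j A[i][j] * x[j].

    A is a list of rows, each row having len(x) entries.
    """
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have len(x) entries")
    # One scalar product per row of A.
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]   # a 2 x 3 matrix
x = [1.0, 1.0, 2.0]
print(matvec(A, x))  # [3.0, 5.0]
```

Each entry of \(\mathbf{A}\mathbf{x}\) is the scalar product of one row of \(\mathbf{A}\) with \(\mathbf{x}\).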

We then adopt the “Jacobian” convention, also known as numerator layout, in which the gradient of a scalar function \(f(\mathbf{x})\) with respect to \(\mathbf{x}\) is a row vector:

\[\begin{align} \frac{\partial f}{\partial \mathbf{x}} = \nabla_\mathbf{x} f &= \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}. \end{align}\]
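To make the row-vector convention concrete, the sketch below approximates \(\partial f/\partial \mathbf{x}\) with central finite differences and returns it as a single row, i.e. a \(1 \times n\) nested list. Finite differences are a numerical check we chose for illustration, not a method prescribed by the course; the helper name `gradient_row`, the example function `f` and the step `h` are assumptions.

```python
def gradient_row(f, x, h=1e-6):
    """Approximate df/dx in numerator layout, returned as a 1 x n row [[...]]."""
    row = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        # Central difference for the partial derivative with respect to x_j.
        row.append((f(x_plus) - f(x_minus)) / (2 * h))
    return [row]  # a single row: shape 1 x n


def f(x):
    # Exact derivative: [2*x_1 + 3*x_2, 3*x_1]
    return x[0] ** 2 + 3.0 * x[0] * x[1]


print(gradient_row(f, [1.0, 2.0]))  # approximately [[8.0, 3.0]]
```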

With this convention, partial derivatives of vectors with respect to scalars are column vectors:

\[\begin{align} \frac{\partial \mathbf{x}}{\partial y} &= \begin{bmatrix} \frac{\partial x_1}{\partial y} \\ \frac{\partial x_2}{\partial y} \\ \vdots \\ \frac{\partial x_n}{\partial y} \end{bmatrix}, \end{align}\]

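and the derivative of a vector \(\mathbf{y} \in \mathbb{R}^p\) with respect to a vector \(\mathbf{x} \in \mathbb{R}^n\) is the \(p \times n\) Jacobian matrix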

\[\begin{align} \frac{\partial \mathbf{y}}{\partial \mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_p}{\partial x_1} & \frac{\partial y_p}{\partial x_2} & \cdots & \frac{\partial y_p}{\partial x_n}\\ \end{bmatrix}. \end{align}\]

In other words, the \((i,j)\)th element is

\[\begin{align} \begin{bmatrix} \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \end{bmatrix}_{ij} = \frac{\partial y_i}{\partial x_j} \end{align}\]
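The index convention \([\partial \mathbf{y}/\partial \mathbf{x}]_{ij} = \partial y_i/\partial x_j\) can be checked numerically. The following is a minimal sketch, assuming central finite differences and plain nested lists; the names `jacobian` and `y_of_x` are hypothetical helpers for this illustration. Row \(i\) corresponds to the output \(y_i\) and column \(j\) to the input \(x_j\), so the result has \(p\) rows and \(n\) columns.

```python
def jacobian(fun, x, h=1e-6):
    """Approximate the Jacobian of fun: R^n -> R^p by central differences.

    Numerator layout: returns a p x n nested list J with J[i][j] ~ d fun_i / d x_j.
    """
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        # Column j: derivatives of every output y_i with respect to x_j.
        columns.append([(fp - fm) / (2 * h)
                        for fp, fm in zip(fun(x_plus), fun(x_minus))])
    # Transpose the list of columns into a list of rows (row index i = output).
    return [list(row) for row in zip(*columns)]


def y_of_x(x):
    """Example map from R^2 to R^3."""
    return [x[0] * x[1], x[0] ** 2, 3.0 * x[1]]


J = jacobian(y_of_x, [2.0, -1.0])
print(len(J), len(J[0]))  # 3 2  (p = 3 rows, n = 2 columns)
# Exact Jacobian at this point: [[-1, 2], [4, 0], [0, 3]]
```

With a single input (\(n = 1\)) the same helper returns a \(p \times 1\) column, and with a single output (\(p = 1\)) a \(1 \times n\) row, consistent with the two special cases above.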

With these conventions, we recall the following rules, where \(\mathbf{A}\) and \(\mathbf{a}\) do not depend on \(\mathbf{x}\), and \(\mathbf{u}\) and \(\mathbf{v}\) are functions of \(\mathbf{x}\):

\[\begin{equation} \frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A} \end{equation}\]


\[\begin{equation} \frac{\partial \mathbf{u}^\top \mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^\top \frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^\top\frac{\partial \mathbf{u}}{\partial \mathbf{x}} \end{equation}\]
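The two rules above can be spot-checked numerically. The sketch below assumes plain Python lists and a central-difference `jacobian` helper of our own (repeated so the block runs on its own); the matrix `A`, the functions `u`, `v` and the evaluation point are arbitrary choices for illustration. It checks that the Jacobian of \(\mathbf{x} \mapsto \mathbf{A}\mathbf{x}\) reproduces \(\mathbf{A}\), and that both sides of the product rule agree at one point. A check at a single point is of course not a proof.

```python
def jacobian(fun, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] ~ d fun_i / d x_j."""
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        columns.append([(fp - fm) / (2 * h)
                        for fp, fm in zip(fun(x_plus), fun(x_minus))])
    return [list(row) for row in zip(*columns)]


def row_times_matrix(r, M):
    """Row vector r^T (length p) times a p x n matrix M; returns a length-n row."""
    return [sum(r[i] * M[i][j] for i in range(len(r))) for j in range(len(M[0]))]


# Rule d(Ax)/dx = A, with A a constant 3 x 2 matrix.
A = [[1.0, 2.0], [0.0, -3.0], [4.0, 5.0]]

def A_times(x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

J_Ax = jacobian(A_times, [0.3, -1.2])   # should reproduce A


# Product rule d(u^T v)/dx = u^T dv/dx + v^T du/dx, with u, v: R^2 -> R^2.
def u(x):
    return [x[0] ** 2, x[0] * x[1]]

def v(x):
    return [x[1], 2.0 * x[0] + x[1] ** 3]

def u_dot_v(x):
    # Wrapped in a list so that jacobian() returns a 1 x n row.
    return [sum(u_i * v_i for u_i, v_i in zip(u(x), v(x)))]

x0 = [0.7, -0.4]
lhs = jacobian(u_dot_v, x0)[0]
rhs = [a + b for a, b in zip(row_times_matrix(u(x0), jacobian(v, x0)),
                             row_times_matrix(v(x0), jacobian(u, x0)))]
print(J_Ax)
print(lhs, rhs)  # equal up to finite-difference error
```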

The chain rule reads

\[\begin{equation} \frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{x}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{x})}{\partial \mathbf{x}} \end{equation}\]

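Note that the order of the factors matters: matrix multiplication is not commutative, and in numerator layout the Jacobian of the outer function \(\partial \mathbf{f}/\partial \mathbf{g}\) appears on the left.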
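A small numerical experiment illustrates both the chain rule and the importance of the ordering. In the sketch below (plain Python lists; the helpers `jacobian`, `matmul` and the example maps `g`: \(\mathbb{R}^2 \to \mathbb{R}^3\), `f`: \(\mathbb{R}^3 \to \mathbb{R}^2\) are our own hypothetical choices), the Jacobian of \(\mathbf{f}(\mathbf{g}(\mathbf{x}))\) computed directly matches the product \((\partial \mathbf{f}/\partial \mathbf{g})(\partial \mathbf{g}/\partial \mathbf{x})\), a \((2 \times 3)(3 \times 2) = 2 \times 2\) matrix. Multiplying in the other order gives a \(3 \times 3\) matrix, which does not even have the right shape.

```python
import math


def jacobian(fun, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: J[i][j] ~ d fun_i / d x_j."""
    columns = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        columns.append([(fp - fm) / (2 * h)
                        for fp, fm in zip(fun(x_plus), fun(x_minus))])
    return [list(row) for row in zip(*columns)]


def matmul(P, Q):
    """Product of an r x s matrix P and an s x t matrix Q (nested lists)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def g(x):   # g: R^2 -> R^3
    return [x[0] + x[1], x[0] * x[1], math.sin(x[0])]


def f(z):   # f: R^3 -> R^2
    return [z[0] * z[2], z[1] ** 2 - z[0]]


def f_of_g(x):
    return f(g(x))


x0 = [0.5, 1.5]
J_direct = jacobian(f_of_g, x0)                           # 2 x 2
J_chain = matmul(jacobian(f, g(x0)), jacobian(g, x0))     # (2 x 3)(3 x 2) -> 2 x 2
J_swapped = matmul(jacobian(g, x0), jacobian(f, g(x0)))   # (3 x 2)(2 x 3) -> 3 x 3

print(J_direct)
print(J_chain)
print(len(J_swapped), len(J_swapped[0]))  # 3 3
```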

Question: Can you derive the following two formulas?

  • \[\begin{equation} \frac{\partial \mathbf{a}u}{\partial \mathbf{x}} = ? \end{equation}\]
  • \[\begin{equation} \frac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}} = ? \end{equation}\]

You can check your answers here and here (Chap. 5).


Contributors include Bruno Deremble and Alexis Tantet.
