{
"cells": [
{
"cell_type": "markdown",
"id": "5eb8c31b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Appendix: Matrix calculus"
]
},
{
"cell_type": "markdown",
"id": "8632bbd3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Linear algebra and calculus"
]
},
{
"cell_type": "markdown",
"id": "0b5d2c9b",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Throughout this class we adopt the following conventions:\n",
"\n",
"- Vectors are denoted by bold lowercase letters and are represented as column vectors:\n",
"\n",
"\\begin{align}\n",
" \\mathbf{x} &= \\begin{bmatrix}\n",
" x_1 \\\\\n",
" x_2 \\\\\n",
" \\vdots \\\\\n",
" x_n \\\\\n",
" \\end{bmatrix}\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"id": "6227bfea",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"- Matrices are denoted by bold uppercase letters; an $n \\times p$ matrix is written\n",
"\n",
"\\begin{align}\n",
" \\mathbf{A} &= \\begin{bmatrix}\n",
" a_{11} & a_{12} & \\cdots & a_{1p}\\\\\n",
" a_{21} & a_{22} & \\cdots & a_{2p}\\\\\n",
" \\vdots & \\vdots & & \\vdots\\\\\n",
" a_{n1} & a_{n2} & \\cdots & a_{np}\\\\\n",
" \\end{bmatrix}\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"id": "91917904",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The scalar product between two vectors $\\mathbf{x}$ and $\\mathbf{y}$ is given by\n",
"\\begin{equation}\n",
"\\mathbf{x}\\cdot \\mathbf{y} = \\mathbf{x}^\\top \\mathbf{y} = \\sum_i x_i y_i\n",
"\\end{equation}\n",
"\n",
"The matrix-vector product is defined componentwise by\n",
"\\begin{align}\n",
"\\begin{bmatrix}\n",
"\\mathbf{A}\\mathbf{x}\n",
"\\end{bmatrix}_i = \\sum_{j} A_{ij} x_j\n",
"\\end{align}\n",
"\n",
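"As a small worked example, with values chosen only for illustration, take $\\mathbf{x} = [1, 2, 3]^\\top$, $\\mathbf{y} = [4, -1, 2]^\\top$ and a $2 \\times 3$ matrix $\\mathbf{A}$:\n",
"\\begin{align}\n",
"\\mathbf{x}^\\top \\mathbf{y} &= 1 \\cdot 4 + 2 \\cdot (-1) + 3 \\cdot 2 = 8, \\\\\n",
"\\mathbf{A}\\mathbf{x} &= \\begin{bmatrix} 1 & 0 & 2 \\\\ -1 & 3 & 1 \\end{bmatrix} \\begin{bmatrix} 1 \\\\ 2 \\\\ 3 \\end{bmatrix} = \\begin{bmatrix} 1 + 0 + 6 \\\\ -1 + 6 + 3 \\end{bmatrix} = \\begin{bmatrix} 7 \\\\ 8 \\end{bmatrix}.\n",
"\\end{align}\n",
"A $2 \\times 3$ matrix thus maps $\\mathbf{x} \\in \\mathbb{R}^3$ to $\\mathbf{A}\\mathbf{x} \\in \\mathbb{R}^2$.\n",
"\n",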
"We adopt the \"Jacobian\" convention, also known as the [numerator layout](https://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions), in which the gradient of a scalar function $f(\\mathbf{x})$ is a row vector:\n",
"\n",
"\\begin{align}\n",
" \\frac{\\partial f}{\\partial \\mathbf{x}} = \\nabla_{\\mathbf{x}} f &= \\begin{bmatrix}\n",
" \\frac{\\partial f}{\\partial x_1} &\n",
" \\frac{\\partial f}{\\partial x_2} &\n",
" \\cdots &\n",
" \\frac{\\partial f}{\\partial x_n}\n",
" \\end{bmatrix}.\n",
"\\end{align}"
]
},
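{
"cell_type": "markdown",
"id": "worked-example-gradient",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a worked example, take the function $f(\\mathbf{x}) = x_1^2 + 3 x_1 x_2$ with $\\mathbf{x} \\in \\mathbb{R}^2$ (chosen here only for illustration). In the numerator layout its gradient is the row vector\n",
"\\begin{align}\n",
"\\frac{\\partial f}{\\partial \\mathbf{x}} = \\begin{bmatrix} 2 x_1 + 3 x_2 & 3 x_1 \\end{bmatrix},\n",
"\\end{align}\n",
"which, for instance, evaluates to $\\begin{bmatrix} 8 & 3 \\end{bmatrix}$ at $\\mathbf{x} = [1, 2]^\\top$. In the denominator layout, the same quantity would instead be written as the column vector $\\left(\\frac{\\partial f}{\\partial \\mathbf{x}}\\right)^\\top$."
]
},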
{
"cell_type": "markdown",
"id": "1dfc749d",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"With this convention, partial derivatives of vectors with respect to scalars are column vectors:\n",
"\\begin{align}\n",
" \\frac{\\partial \\mathbf{x}}{\\partial y} &= \\begin{bmatrix}\n",
" \\frac{\\partial x_1}{\\partial y} \\\\\n",
" \\frac{\\partial x_2}{\\partial y} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial x_n}{\\partial y}\n",
" \\end{bmatrix},\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"id": "5bae8e8f",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
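"and the derivative of a vector $\\mathbf{y} \\in \\mathbb{R}^p$ with respect to a vector $\\mathbf{x} \\in \\mathbb{R}^n$ is the $p \\times n$ Jacobian matrix\n",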
"\n",
"\\begin{align}\n",
" \\frac{\\partial \\mathbf{y}}{\\partial \\mathbf{x}} &= \\begin{bmatrix}\n",
" \\frac{\\partial y_1}{\\partial x_1} & \\frac{\\partial y_1}{\\partial x_2} & \\cdots & \\frac{\\partial y_1}{\\partial x_n} \\\\\n",
" \\frac{\\partial y_2}{\\partial x_1} & \\frac{\\partial y_2}{\\partial x_2} & \\cdots & \\frac{\\partial y_2}{\\partial x_n} \\\\\n",
" \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
" \\frac{\\partial y_p}{\\partial x_1} & \\frac{\\partial y_p}{\\partial x_2} & \\cdots & \\frac{\\partial y_p}{\\partial x_n}\\\\\n",
" \\end{bmatrix}.\n",
"\\end{align}\n",
"\n",
"so that its $(i,j)$th element is\n",
"\\begin{align}\n",
"\\begin{bmatrix}\n",
"\\frac{\\partial \\mathbf{y}}{\\partial \\mathbf{x}} \n",
"\\end{bmatrix}_{ij} = \\frac{\\partial y_i}{\\partial x_j} \n",
"\\end{align}"
]
},
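{
"cell_type": "markdown",
"id": "worked-example-jacobian",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For example, consider the map $\\mathbf{y}: \\mathbb{R}^3 \\to \\mathbb{R}^2$ defined by $y_1 = x_1 x_2$ and $y_2 = x_2 + x_3^2$ (an arbitrary choice for illustration). Each row of the Jacobian is the row-vector gradient of one output, so that\n",
"\\begin{align}\n",
"\\frac{\\partial \\mathbf{y}}{\\partial \\mathbf{x}} = \\begin{bmatrix}\n",
"x_2 & x_1 & 0 \\\\\n",
"0 & 1 & 2 x_3\n",
"\\end{bmatrix},\n",
"\\end{align}\n",
"a $2 \\times 3$ matrix with one row per output $y_i$ and one column per input $x_j$. A scalar function is the special case $p = 1$, for which the Jacobian reduces to the row-vector gradient defined earlier."
]
},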
{
"cell_type": "markdown",
"id": "6b72df4e",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"With these conventions, we recall the following rules, where $\\mathbf{A}$ and $\\mathbf{a}$ do not depend on $\\mathbf{x}$, while $\\mathbf{u}$ and $\\mathbf{v}$ are functions of $\\mathbf{x}$:\n",
"\n",
"Linear map:\n",
"\n",
"\\begin{equation}\n",
"\\frac{\\partial \\mathbf{A}\\mathbf{x}}{\\partial \\mathbf{x}} = \\mathbf{A}\n",
"\\end{equation}\n",
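"\n",
"This first rule can be checked componentwise: since $[\\mathbf{A}\\mathbf{x}]_i = \\sum_k A_{ik} x_k$,\n",
"\\begin{equation}\n",
"\\left[\\frac{\\partial \\mathbf{A}\\mathbf{x}}{\\partial \\mathbf{x}}\\right]_{ij} = \\frac{\\partial}{\\partial x_j} \\sum_k A_{ik} x_k = A_{ij},\n",
"\\end{equation}\n",
"so the numerator layout yields $\\mathbf{A}$ itself (the denominator layout would instead give $\\mathbf{A}^\\top$).\n",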
"\n",
"Product rule:\n",
"\n",
"\\begin{equation}\n",
"\\frac{\\partial \\mathbf{u}^\\top \\mathbf{v}}{\\partial \\mathbf{x}} = \\mathbf{u}^\\top \\frac{\\partial \\mathbf{v}}{\\partial \\mathbf{x}} + \\mathbf{v}^\\top\\frac{\\partial \\mathbf{u}}{\\partial \\mathbf{x}}\n",
"\\end{equation}\n",
"\n",
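"For example, taking $\\mathbf{u} = \\mathbf{x}$ and $\\mathbf{v} = \\mathbf{A}\\mathbf{x}$ (with $\\mathbf{A}$ square) in the product rule gives the derivative of the quadratic form $\\mathbf{x}^\\top \\mathbf{A}\\mathbf{x}$:\n",
"\\begin{equation}\n",
"\\frac{\\partial \\mathbf{x}^\\top \\mathbf{A}\\mathbf{x}}{\\partial \\mathbf{x}} = \\mathbf{x}^\\top \\mathbf{A} + (\\mathbf{A}\\mathbf{x})^\\top \\mathbf{I} = \\mathbf{x}^\\top \\left(\\mathbf{A} + \\mathbf{A}^\\top\\right),\n",
"\\end{equation}\n",
"which reduces to $2\\mathbf{x}^\\top \\mathbf{A}$ when $\\mathbf{A}$ is symmetric. Here $\\mathbf{I}$ denotes the identity matrix, i.e. $\\partial \\mathbf{x} / \\partial \\mathbf{x}$.\n",
"\n",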
"Chain rule:\n",
"\n",
"\\begin{equation}\n",
"\\frac{\\partial \\mathbf{f}(\\mathbf{g}(\\mathbf{x}))}{\\partial \\mathbf{x}} = \\frac{\\partial \\mathbf{f}(\\mathbf{g})}{\\partial \\mathbf{g}}\\frac{\\partial \\mathbf{g}(\\mathbf{x})}{\\partial \\mathbf{x}}\n",
"\\end{equation}\n",
"\n",
"*Note that the order of the factors matters, since matrix multiplication is not commutative.*"
]
},
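{
"cell_type": "markdown",
"id": "worked-example-chain-rule",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a worked example of the chain rule, consider the squared error $f(\\mathbf{x}) = \\|\\mathbf{A}\\mathbf{x} - \\mathbf{b}\\|^2$, where the vector $\\mathbf{b}$ is an arbitrary constant introduced for illustration. Writing $f = \\mathbf{g}^\\top \\mathbf{g}$ with $\\mathbf{g}(\\mathbf{x}) = \\mathbf{A}\\mathbf{x} - \\mathbf{b}$, the product rule gives $\\partial f / \\partial \\mathbf{g} = 2\\mathbf{g}^\\top$ and the first rule above gives $\\partial \\mathbf{g} / \\partial \\mathbf{x} = \\mathbf{A}$, so that\n",
"\\begin{equation}\n",
"\\frac{\\partial f}{\\partial \\mathbf{x}} = \\frac{\\partial f}{\\partial \\mathbf{g}}\\frac{\\partial \\mathbf{g}}{\\partial \\mathbf{x}} = 2\\left(\\mathbf{A}\\mathbf{x} - \\mathbf{b}\\right)^\\top \\mathbf{A}.\n",
"\\end{equation}\n",
"\n",
"The factors appear in the same order as in the chain rule above; reversing them would not even be a valid product in general. Setting this row vector to zero recovers the normal equations $\\mathbf{A}^\\top \\mathbf{A}\\mathbf{x} = \\mathbf{A}^\\top \\mathbf{b}$ of linear least squares."
]
},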
{
"cell_type": "markdown",
"id": "a4d70e94",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"> ***Question***: Can you derive the following two formulas, where $u$ is a scalar function of $\\mathbf{x}$?\n",
"> - $\\displaystyle \\frac{\\partial \\mathbf{a}u}{\\partial \\mathbf{x}} = ?$\n",
"> - $\\displaystyle \\frac{\\partial \\mathbf{A}\\mathbf{u}}{\\partial \\mathbf{x}} = ?$\n",
"\n",
"You can check your answers on the [Wikipedia page on matrix calculus](https://en.wikipedia.org/wiki/Matrix_calculus) and in [Chapter 5 of *Mathematics for Machine Learning*](https://mml-book.github.io/book/mml-book.pdf)."
]
},
{
"cell_type": "markdown",
"id": "5e186998",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"***\n",
"## Credit\n",
"\n",
"[//]: # \"This notebook is part of [E4C Interdisciplinary Center - Education](https://gitlab.in2p3.fr/energy4climate/public/education).\"\n",
"Contributors include Bruno Deremble and Alexis Tantet.\n",
"\n",
"\n",
"