{ "cells": [ { "cell_type": "markdown", "id": "ef7bc077", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Ordinary Least Squares\n", "\n", "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.in2p3.fr%2Fenergy4climate%2Fpublic%2Feducation%2Fmachine_learning_for_climate_and_energy/master?filepath=book%2Fnotebooks%2F03_ordinary_least_squares.ipynb)" ] }, { "cell_type": "markdown", "id": "ccdfea6e-5bff-4e31-b35a-6ba7a1554a52", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", " Prerequisites\n", " \n", "- Define a supervised learning problem.\n", "
\n", "\n", "
\n", " Learning Outcomes\n", " \n", "- Apply the supervised learning methodology to a multiple linear regression\n", "
" ] }, { "cell_type": "markdown", "id": "eae3c60c", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Strengths of the OLS\n", "\n", "- Simple to use;\n", "- Easily interpretable in terms of variances and covariances;\n", "- Can outperform fancier nonlinear models for prediction, especially in situations with:\n", " - small training samples,\n", " - low signal-to-noise ratio,\n", " - sparse data.\n", "- Expandable to nonlinear transformations of the inputs;\n", "- Can be used as a simple reference to learn about machine learning methodologies (supervised learning, in particular)." ] }, { "cell_type": "markdown", "id": "94862dfd", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Linear Model\n", "\n", "\\begin{equation}\n", "f_{\\boldsymbol{\\beta}}(X) = \\underbrace{\\beta_0}_{\\mathrm{intercept}} + \\sum_{j = 1}^p X_j \\beta_j\n", "\\end{equation}" ] }, { "cell_type": "markdown", "id": "ad85fa16", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$X_j$ can come from :\n", "- quantitative inputs;\n", "- transformations of quantitative inputs, such as log or square;\n", "- basis expansions, such as $X_2 = X_1^2$, $X_3 = X_1^3$;\n", "- interactions between variables, for example, $X_3 = X_1 \\cdot X_2$;\n", "- numeric or \"dummy\" coding of the levels of qualitative inputs. For example, $X_j, j = 1, \\ldots, 5$, such that $X_j = I(G = j)$." ] }, { "cell_type": "markdown", "id": "9b9392d1", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Residual Sum of Squares\n", "\n", "The sample-mean estimate of the Expected Training Error with Squared Error Loss gives the *Residual Sum of Squares* (RSS) depending on the parameters:\n", "\n", "\\begin{equation}\n", "\\mathrm{RSS}(\\beta)\n", "= \\sum_{i = 1}^N \\left(y_i - f(x_i)\\right)^2\n", " = \\sum_{i = 1}^N \\left(y_i - \\beta_0 - \\sum_{j = 1}^p x_{ij} \\beta_j\\right)^2.\n", "\\end{equation}\n", "\n", "\"Linear\n", "\"Linear" ] }, { "cell_type": "markdown", "id": "4b8487bb", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> ***Question***\n", "> - Assume that $f(x) = \\bar{y}$ (the sample mean of the target).\n", "The corresponding RSS is called the Total Sum of Squares (TSS).\n", "How does the TSS relate to the sample variance $s_Y^2$ of $Y$?" 
] }, { "cell_type": "markdown", "id": "267951b5", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The *coefficient of determination* $R^2$ relates to the RSS as follows:\n", "\n", "\\begin{equation}\n", "R^2(\\boldsymbol{\\beta}) = 1 - \\frac{\\mathrm{RSS}(\\boldsymbol{\\beta})}{\\mathrm{TSS}}.\n", "\\end{equation}" ] }, { "cell_type": "markdown", "id": "0f429234", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## How to Minimize the RSS?\n", "\n", "Denote by $\\mathbf{X}$ the $N \\times (p + 1)$ input-data matrix.\n", "\n", "The first column of $\\mathbf{X}$ is associated with the intercept and is given by the $N$-dimensional vector $\\mathbf{1}$ with all elements equal to 1.\n", "\n", "Then,\n", "\\begin{equation}\n", "\\mathrm{RSS}(\\boldsymbol{\\beta}) = \\left(\\mathbf{y} - \\mathbf{X} \\boldsymbol{\\beta}\\right)^\\top \\left(\\mathbf{y} - \\mathbf{X} \\boldsymbol{\\beta}\\right).\n", "\\end{equation}" ] }, { "cell_type": "markdown", "id": "a38d8c9c", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> ***Question (optional)***\n", "> - Show that the following parameter estimate minimizes the RSS.\n", "> - Show that this solution is unique if and only if $\\mathbf{X}^\\top \\mathbf{X}$ is positive definite.\n", "> - When could this condition not be fulfilled?\n", "\n", "\\begin{equation}\n", "\\hat{\\boldsymbol{\\beta}} = \\left(\\mathbf{X}^\\top \\mathbf{X}\\right)^{-1} \\left(\\mathbf{X}^\\top \\mathbf{y}\\right)\n", "\\end{equation}" ] }, { "cell_type": "markdown", "id": "ce6d258f", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> ***Question (optional)***\n", "> - Express $R^2(\\hat{\\boldsymbol{\\beta}})$ (above) in terms of explained variance.\n", "> - Show that $R^2(\\hat{\\boldsymbol{\\beta}})$ is invariant under linear transformations of the target." ] }, { "cell_type": "markdown", "id": "be577b50", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", " Remark\n", " \n", "The formula for the optimal coefficients is in closed form, meaning that it can be directly computed using a finite number of standard operations.\n", " \n", "Nonlinear models (e.g. neural networks) will instead require solving numerical problems iteratively with a finite precision.\n", "
" ] }, { "cell_type": "markdown", "id": "397f6d19", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Suppose that the inputs $\\mathbf{x}_1, \\ldots, \\mathbf{x}_p$ (the columns of the data matrix $\\mathbf{X}$) are orthogonal; that is $\\mathbf{x}_j^\\top \\mathbf{x}_k = 0$ for all $j \\ne k$.\n", "\n", "> ***Question***\n", "> - Show that $\\hat{\\beta} = \\mathbf{x}_j^\\top \\mathbf{y} / (\\mathbf{x}_j^\\top \\mathbf{x}_j)$ for all $j$.\n", "> - Interpret these coefficients in terms of correlations and variances.\n", "> - How do the inputs influence each other's parameter estimates in the model?\n", "> - Find a simple expression of $R^2(\\hat{\\beta})$ in that case." ] }, { "cell_type": "markdown", "id": "2f4d0f27", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We now assume that the target is generated by this model $Y = \\boldsymbol{X}^\\top \\boldsymbol{\\beta} + \\epsilon$, where the observations of $\\epsilon$ are *uncorrelated* and with *mean zero* and *constant variance* $\\sigma^2$.\n", "\n", "> ***Question (optional)***\n", "> - Knowing that $\\boldsymbol{X} = \\boldsymbol{x}$, show that the observations of $y$ are uncorrelated, with mean $\\boldsymbol{x}^\\top \\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n", "> - Show that $\\mathbb{E}(\\hat{\\boldsymbol{\\beta}} | \\mathbf{X}) = \\boldsymbol{\\beta}$ and $\\mathrm{Var}(\\hat{\\boldsymbol{\\beta}} | \\mathbf{X}) = \\sigma^2 (\\mathbf{X}^\\top \\mathbf{X})^{-1}$.\n", "> - Show that $\\hat{\\sigma}^2 = \\sum_{i = 1}^N (y_i - \\hat{y}_i)^2 / (N - p - 1)$ is an unbiased estimate of $\\sigma^2$, i.e $\\mathbb{E}(\\hat{\\sigma}^2) = \\sigma^2$." ] }, { "cell_type": "markdown", "id": "bc2ef5ab", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Confidence Intervals\n", "\n", "We now assume that the error $\\epsilon$ is a Gaussian random variable, i.e $\\epsilon \\sim N(0, \\sigma^2)$ and would like to test the null hypothesis that $\\beta_j = 0$.\n", "\n", "> ***Question (optional)***\n", "> - Show that the $1 - 2 \\alpha$ confidence interval for $\\beta_j$ is\n", ">\n", "> $(\\hat{\\beta}_j - z^{(1 - \\alpha)}_{N - p - 1} \\hat{\\sigma} \\sqrt{v_j}, \\ \\ \\ \\ \\hat{\\beta}_j + z^{(1 - \\alpha)}_{N - p - 1} \\hat{\\sigma} \\sqrt{v_j})$,\n", ">\n", "> where $v_j = [(\\mathbf{X}^\\top \\mathbf{X})^{-1}]_{jj}$ and $z^{(1 - \\alpha)}_{N - p - 1}$ is the $(1 - \\alpha)$ percentile of $t_{N - p - 1}$ (see [Supplementary Material](appendix_supplementary_matrial.ipynb))." ] }, { "cell_type": "markdown", "id": "db5011ed", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## To go further\n", "\n", "- Basis expansion models : polynomials, splines, etc. (Chap. 5 in Hastie *et al.* 2009)" ] }, { "cell_type": "markdown", "id": "7005fab6", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## References\n", "\n", "- [James, G., Witten, D., Hastie, T., Tibshirani, R., n.d. *An Introduction to Statistical Learning*, 2st ed. Springer, New York, NY.](https://www.statlearning.com/)\n", "- Chap. 2, 3 and 7 in [Hastie, T., Tibshirani, R., Friedman, J., 2009. *The Elements of Statistical Learning*, 2nd ed. Springer, New York.](https://doi.org/10.1007/978-0-387-84858-7)\n", "- Chap. 5 and 7 in [Wilks, D.S., 2019. *Statistical Methods in the Atmospheric Sciences*, 4th ed. 
Elsevier, Amsterdam.](https://doi.org/10.1016/C2017-0-03921-6)" ] }, { "cell_type": "markdown", "id": "5e186998", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "***\n", "## Credit\n", "\n", "[//]: # \"This notebook is part of [E4C Interdisciplinary Center - Education](https://gitlab.in2p3.fr/energy4climate/public/education).\"\n", "Contributors include Bruno Deremble and Alexis Tantet.\n", "Several slides and images are taken from the very good [Scikit-learn course](https://inria.github.io/scikit-learn-mooc/).\n", "\n", "
\n", "\n", "
\n", " \n", "\"Logo\n", "\n", "\"Logo\n", "\n", "\"Logo\n", "\n", "\"Logo\n", "\n", "\"Logo\n", "\n", "\"Logo\n", "\n", "\"Logo\n", " \n", "
\n", "\n", "
\n", "\n", "
\n", " \"Creative\n", "
This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).
" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "latex_envs": { "LaTeX_envs_menu_present": false, "autoclose": true, "autocomplete": false, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }