Appendix: Supplementary Material#
On Ordinary Least Squares (OLS)#
How to Minimize the Residual Sum of Squares (RSS)?#
The predictions with parameters \(\hat{\boldsymbol{\beta}}\) from the input data are given by

\[
\hat{\mathbf{y}} = \mathbf{X} \hat{\boldsymbol{\beta}} = \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y},
\]

where \(\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}\) is the minimizer of the RSS.
The residual vector is given by \(\hat{\mathbf{z}} = \mathbf{y} - \hat{\mathbf{y}}\).
Question (optional)
Show that \(\hat{\mathbf{y}}\) is the orthogonal projection of \(\mathbf{y}\) on the subspace of \(\mathbb{R}^N\) spanned by the columns of \(\mathbf{X}\) (i.e. the column space of \(\mathbf{X}\)) and that \(\hat{\mathbf{z}}\) is orthogonal to this space.
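As a sanity check, the projection property can also be verified numerically. Below is a minimal numpy sketch on synthetic data (the design matrix, dimensions, and seed are arbitrary illustrative choices, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # design matrix with intercept
y = rng.normal(size=N)

# OLS fit: beta_hat solves the normal equations X^T X beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat          # fitted values (projection of y on the column space)
z_hat = y - y_hat             # residual vector

# z_hat should be orthogonal to every column of X, up to floating-point error
print(np.abs(X.T @ z_hat).max())  # ~1e-13
```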
Graphical Interpretation and Gram-Schmidt Algorithm#
By regressing \(\mathbf{b}\) on \(\mathbf{a}\) we mean a simple (no-intercept) regression with input \(\mathbf{a}\) and target \(\mathbf{b}\), whose coefficient is \(\mathbf{a}^\top \mathbf{b} / (\mathbf{a}^\top \mathbf{a})\).
Question
Regress \(\mathbf{x}\) on \(\mathbf{1}\) and compute the resulting residual \(\hat{\mathbf{z}}_1\).
Regress \(\mathbf{y}\) on \(\hat{\mathbf{z}}_1\). The result should be familiar.
Interpret the above procedure graphically.
Generalize this procedure to the case of \(p\) inputs and express the \(j\)th estimate in terms of some \(\hat{\mathbf{z}}_j\) as \(\hat{\beta}_j = \hat{\mathbf{z}}_j^\top \mathbf{y} / (\hat{\mathbf{z}}_j^\top \hat{\mathbf{z}}_j)\) (optional). A numerical sketch of the first two steps follows below.
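Here is a minimal numpy sketch of the two-step procedure on synthetic data (the data-generating model and seed are illustrative assumptions): it centers \(\mathbf{x}\) by regressing it on \(\mathbf{1}\), regresses \(\mathbf{y}\) on the residual, and checks the result against the slope of a joint least-squares fit.

```python
import numpy as np

def regress(b, a):
    """Coefficient of the simple (no-intercept) regression of b on a."""
    return (a @ b) / (a @ a)

rng = np.random.default_rng(1)
N = 100
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)
one = np.ones(N)

# Step 1: regress x on 1; the residual is the centered x
z1 = x - regress(x, one) * one
# Step 2: regress y on the residual z1
slope = regress(y, z1)

# Compare with the slope from a joint fit of [1, x]
X = np.column_stack([one, x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(slope, beta_hat[1])  # the two slopes agree
```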
Gauss-Markov Theorem#
We now assume that \(Y = \boldsymbol{X}^\top \boldsymbol{\beta} + \epsilon\), where the errors \(\epsilon\) are uncorrelated across observations, with mean zero and constant variance \(\sigma^2\).
Question (optional)
Express the variances of the parameter estimates in terms of the orthogonal basis of the column space of \(\mathbf{X}\) constructed above.
How does the precision of \(\hat{\beta}_j\) depend on the input data?
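For intuition, here is a numerical check of one instance of this identity (a sketch on an arbitrary synthetic design matrix): for the last coefficient, \(\hat{\beta}_p = \hat{\mathbf{z}}_p^\top \mathbf{y} / (\hat{\mathbf{z}}_p^\top \hat{\mathbf{z}}_p)\) gives \(\mathrm{Var}(\hat{\beta}_p) = \sigma^2 / \|\hat{\mathbf{z}}_p\|^2\), which should match the last diagonal entry of \(\sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma2 = 200, 1.0
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])

# Residual of the last column regressed on all previous ones
# (an orthonormal basis of their span comes from a QR decomposition)
Q, _ = np.linalg.qr(X[:, :-1])
z_p = X[:, -1] - Q @ (Q.T @ X[:, -1])

# Var(beta_hat_p) from the orthogonal basis vs. from sigma^2 (X^T X)^{-1}
var_from_zp = sigma2 / (z_p @ z_p)
var_from_inv = sigma2 * np.linalg.inv(X.T @ X)[-1, -1]
print(var_from_zp, var_from_inv)  # the two agree
```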
Least-squares estimates of the parameters have the smallest variance among all linear unbiased estimates: the OLS estimator is BLUE (Best Linear Unbiased Estimator).
Let \(\tilde{\boldsymbol{\beta}}\) be any linear unbiased estimate of the parameters. We mean that for any linear combination defined by the vector \(\boldsymbol{a}\),

\[
\mathrm{Var}(\boldsymbol{a}^\top \hat{\boldsymbol{\beta}}) \leq \mathrm{Var}(\boldsymbol{a}^\top \tilde{\boldsymbol{\beta}}).
\]
Question (optional)
Prove this theorem.
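The theorem can also be checked on a concrete example. The sketch below (all data synthetic) compares the exact sampling variance \(\mathrm{Var}(\boldsymbol{a}^\top \mathbf{H} \mathbf{y}) = \sigma^2 \boldsymbol{a}^\top \mathbf{H} \mathbf{H}^\top \boldsymbol{a}\) of the OLS estimator against that of a competing linear unbiased estimator, here a weighted least-squares fit with arbitrary (mis-specified) weights:

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma2 = 30, 1.0
X = np.column_stack([np.ones(N), rng.normal(size=N)])
a = np.array([0.5, -1.0])            # arbitrary linear combination

# OLS as a linear map: beta_hat = H_ols @ y
H_ols = np.linalg.solve(X.T @ X, X.T)
# A competing linear unbiased estimator: weighted least squares with
# arbitrary positive weights (still unbiased, but not OLS)
W = np.diag(rng.uniform(0.5, 2.0, size=N))
H_wls = np.linalg.solve(X.T @ W @ X, X.T @ W)

# With Cov(y) = sigma^2 I, Var(a^T H y) = sigma^2 a^T H H^T a
var_ols = sigma2 * a @ (H_ols @ H_ols.T) @ a
var_wls = sigma2 * a @ (H_wls @ H_wls.T) @ a
print(var_ols, var_wls)              # Gauss-Markov: var_ols <= var_wls
```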
Confidence Intervals#
We now assume that the error \(\epsilon\) is a Gaussian random variable, i.e. \(\epsilon \sim N(0, \sigma^2)\), and we would like to test the null hypothesis that \(\beta_j = 0\).
Question (optional)
Show that \(\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, (\mathbf{X}^\top \mathbf{X})^{-1} \sigma^2)\).
Show that \((N - p - 1) \hat{\sigma}^2 \sim \sigma^2 \ \chi^2_{N - p - 1}\), a chi-squared distribution with \(N - p - 1\) degrees of freedom.
Show that \(\hat{\boldsymbol{\beta}}\) and \(\hat{\sigma}^2\) are statistically independent.
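A Monte Carlo check of the second claim, taking \(\hat{\sigma}^2 = \hat{\mathbf{z}}^\top \hat{\mathbf{z}} / (N - p - 1)\) as the usual unbiased variance estimate (a sketch with synthetic data and arbitrary dimensions): the rescaled statistic \((N - p - 1)\hat{\sigma}^2 / \sigma^2\) should have the mean and variance of a \(\chi^2_{N - p - 1}\) variable, namely \(N - p - 1\) and \(2(N - p - 1)\).

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, sigma2 = 40, 2, 1.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # p inputs + intercept
beta = rng.normal(size=p + 1)
H = np.linalg.solve(X.T @ X, X.T)

stats = []
for _ in range(10000):
    y = X @ beta + np.sqrt(sigma2) * rng.normal(size=N)
    z = y - X @ (H @ y)                          # residual vector
    sigma2_hat = (z @ z) / (N - p - 1)           # unbiased variance estimate
    stats.append((N - p - 1) * sigma2_hat / sigma2)

# chi^2_{N-p-1} has mean N-p-1 and variance 2(N-p-1)
print(np.mean(stats), N - p - 1)
print(np.var(stats), 2 * (N - p - 1))
```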
With \(v_j = [(\mathbf{X}^\top \mathbf{X})^{-1}]_{jj}\), we define the standardized coefficient or Z-score

\[
z_j = \frac{\hat{\beta}_j}{\hat{\sigma} \sqrt{v_j}}.
\]
Question (optional)
Show that \(z_j\) is distributed as \(t_{N - p - 1}\) (a Student’s-\(t\) distribution with \(N - p - 1\) degrees of freedom).
Show that the \(1 - 2 \alpha\) confidence interval for \(\beta_j\) is \((\hat{\beta}_j - z^{(1 - \alpha)}_{N - p - 1} \hat{\sigma} \sqrt{v_j}, \hat{\beta}_j + z^{(1 - \alpha)}_{N - p - 1} \hat{\sigma} \sqrt{v_j})\), where \(z^{(1 - \alpha)}_{N - p - 1}\) is the \((1 - \alpha)\) percentile of \(t_{N - p - 1}\).
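Putting the last two results together, Z-scores and \(1 - 2\alpha\) confidence intervals can be computed as follows (a sketch on synthetic data; the true coefficients, including the null \(\beta_1 = 0\), are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
N, p = 60, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta = np.array([1.0, 0.0, 2.0])                # beta_1 = 0: the null holds for j = 1
y = X @ beta + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (N - p - 1))
v = np.diag(np.linalg.inv(X.T @ X))

z = beta_hat / (sigma_hat * np.sqrt(v))         # Z-scores, t_{N-p-1} under the null
alpha = 0.025
t_crit = stats.t.ppf(1 - alpha, df=N - p - 1)   # (1 - alpha) percentile of t_{N-p-1}
lower = beta_hat - t_crit * sigma_hat * np.sqrt(v)
upper = beta_hat + t_crit * sigma_hat * np.sqrt(v)
print(np.column_stack([beta_hat, z, lower, upper]))
```

With \(\alpha = 0.025\) this gives 95% intervals; the interval for \(\beta_1\) should typically cover zero, consistent with the null hypothesis.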