# Appendix: Elements of Probability Theory

## Probability Measures

**Sample Space**

The set of all possible outcomes of an experiment is called the *sample space* and is denoted by \(\Omega\).

*Events* are defined as subsets of the sample space.

**\(\sigma\)-algebra**

A collection \(\mathcal{F}\) of sets in \(\Omega\) is called a *\(\sigma\)-algebra* on \(\Omega\) if

1. \(\emptyset \in \mathcal{F}\);
2. if \(A \in \mathcal{F}\), then \(A^c \in \mathcal{F}\);
3. if \(A_1, A_2, \ldots \in \mathcal{F}\), then \(\cup_{i = 1}^\infty A_i \in \mathcal{F}\).
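
For a finite sample space these axioms can be checked directly. Below is a minimal sketch (our own example, not from the course material) verifying them for the collection \(\{\emptyset, \{\text{even}\}, \{\text{odd}\}, \Omega\}\) on a die roll:

```
# Check the sigma-algebra axioms for a collection of subsets of a
# finite sample space (a fair-die example of our own).
omega = frozenset(range(1, 7))                 # sample space {1, ..., 6}
even, odd = frozenset({2, 4, 6}), frozenset({1, 3, 5})
F = {frozenset(), even, odd, omega}            # candidate sigma-algebra

# 1. the empty set belongs to F
assert frozenset() in F
# 2. closure under complement
assert all(omega - A in F for A in F)
# 3. closure under union (for a finite collection, countable unions
#    reduce to finite ones, so pairwise closure is enough)
assert all(A | B in F for A in F for B in F)
print('F is a sigma-algebra on omega')
```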

**Generated \(\sigma\)-algebra**

Let \(\mathcal{C}\) be a collection of subsets of \(\Omega\). The intersection of all the \(\sigma\)-algebras containing \(\mathcal{C}\), denoted \(\sigma(\mathcal{C})\), is itself a \(\sigma\)-algebra, which we call the *\(\sigma\)-algebra generated by \(\mathcal{C}\)*.

**Borel \(\sigma\)-algebra**

Let \(\Omega = \mathbb{R}^p\). The \(\sigma\)-algebra generated by the open subsets of \(\mathbb{R}^p\) is called the *Borel \(\sigma\)-algebra* of \(\mathbb{R}^p\) and is denoted by \(\mathcal{B}(\mathbb{R}^p)\).

The \(\sigma\)-algebra on a sample space contains all the events of the experiment whose probabilities we want to study.

Intuitively, the \(\sigma\)-algebra contains all the useful information that is available about the random experiment that we are performing.

**Probability Measure**

A probability measure \(\mathbb{P}\) on the measurable space \((\Omega, \mathcal{F})\) is a function \(\mathbb{P}: \mathcal{F} \to [0, 1]\) satisfying

1. \(\mathbb{P}(\emptyset) = 0\) and \(\mathbb{P}(\Omega) = 1\);
2. for \(A_1, A_2, \ldots \in \mathcal{F}\) with \(A_i \cap A_j = \emptyset\) for \(i \ne j\), \(\mathbb{P}\left(\cup_{i = 1}^\infty A_i\right) = \sum_{i = 1}^\infty \mathbb{P}(A_i)\) (countable additivity).

**Probability Space**

The triple \((\Omega, \mathcal{F}, \mathbb{P})\) comprising a set \(\Omega\), a \(\sigma\)-algebra \(\mathcal{F}\) of subsets of \(\Omega\) and a probability measure \(\mathbb{P}\) on \((\Omega, \mathcal{F})\) is called a *probability space*.

**Independent Sets**

The sets \(A\) and \(B\) are *independent* if

\(\mathbb{P}(A \cap B) = \mathbb{P}(A) \mathbb{P}(B).\)
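
For equally likely outcomes, independence can be verified by counting. A small sketch, again with a fair die as a hypothetical example:

```
# Check independence of two events under the uniform probability
# measure on a fair die (our own example).
from fractions import Fraction

omega = set(range(1, 7))
prob = lambda E: Fraction(len(E), len(omega))  # P(E) = |E| / |Omega|

A = {2, 4, 6}      # "the roll is even"
B = {1, 2, 3, 4}   # "the roll is at most 4"

# A and B are independent iff P(A & B) == P(A) * P(B)
print(prob(A & B), prob(A) * prob(B))  # both 1/3, hence independent
```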

## Random Variables

**Measurable Space**

A sample space \(\Omega\) equipped with a \(\sigma\)-algebra of subsets \(\mathcal{F}\) is called a *measurable space*.

*Figure: illustration of a random variable mapping the outcomes of an experiment to real values, showing how a random variable is used to define a probability mass function (credit: Niyumard, CC BY-SA 4.0).*

**Random Variable and Random Vector**

Let \((\Omega, \mathcal{F})\) and \((\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p))\) be two measurable spaces.
A function \(\boldsymbol{X}: \Omega \to \mathbb{R}^p\) is called a *measurable function*, namely a *random variable* (\(p = 1\)) or a *random vector* (\(p > 1\)), if the event

\(\{\omega \in \Omega: X_1(\omega) \le x_1, \ldots, X_p(\omega) \le x_p\}\) \(=: \{\omega \in \Omega: \boldsymbol{X}(\omega) \le \boldsymbol{x}\}\) \(=: \{\boldsymbol{X} \le \boldsymbol{x}\}\)

belongs to \(\mathcal{F}\) for any \(\boldsymbol{x} \in \mathbb{R}^p\).

In other words, the preimage of any Borel set under \(\boldsymbol{X}\) is an event.
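
As a finite illustration (our own example), the sum of two dice is a random variable on \(\Omega = \{1, \ldots, 6\}^2\), and pushing the uniform measure through it yields its probability mass function:

```
# A random variable X = sum of two dice and the probability mass
# function it induces (an illustrative example of our own).
from collections import Counter
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
X = lambda w: w[0] + w[1]                      # the random variable

# Push the uniform measure on omega forward through X
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}
print(pmf[7])   # Fraction(1, 6), the most likely sum
```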


**Distribution Function of a Random Variable**

Every random variable from a probability space \((\Omega, \mathcal{F}, \mu)\) to \((\mathbb{R}, \mathcal{B}(\mathbb{R}))\) induces a probability measure \(\mathbb{P}\) on \(\mathbb{R}\) that we identify with the *probability distribution function* \(F_X: \mathbb{R} \to [0, 1]\) defined as

\(F_X(x) = \mu(\{\omega \in \Omega: X(\omega) \le x\}) =: \mathbb{P}(X \le x), \quad x \in \mathbb{R}.\)

In this case, \((\mathbb{R}, \mathcal{B}(\mathbb{R}), F_X)\) becomes a probability space.

If \(X\) is not a measurable function, there exists an \(x \in \mathbb{R}\) such that \(\{\omega \in \Omega: X(\omega) \le x\}\) is not an event. Then \(\mathbb{P}(X \le x) = \mu(\{\omega \in \Omega: X(\omega) \le x\})\) is not defined and we cannot define the distribution of \(X\).

*Figure: joint density of two random variables; the marginal densities are shown as well (credit: IkamusumeFan, CC BY-SA 3.0).*

**Joint Distribution Function**

Let \(X\) and \(Y\) be two random variables.
We can then define their *joint distribution function* as

\(F_{X, Y}(x, y) := \mathbb{P}(X \le x, Y \le y), \quad x, y \in \mathbb{R}.\)

We can view them as a *random vector*, i.e. a random variable from \(\Omega\) to \(\mathbb{R}^2\).

**Independent Random Variables**

Two random variables \(X\) and \(Y\) on \(\mathbb{R}\) are independent if the events \(\{\omega \in \Omega: X(\omega) \le x\}\) and \(\{\omega \in \Omega: Y(\omega) \le y\}\) are independent for all \(x, y \in \mathbb{R}\).

If \(X\) and \(Y\) are independent then \(F_{X, Y}(x, y) = F_X(x)F_Y(y)\).
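
This factorization is easy to check empirically on simulated independent samples (a sketch with distributions of our own choosing):

```
# Empirical check that F_{X,Y}(x, y) = F_X(x) F_Y(y) for two
# independently simulated variables (our own example).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = rng.exponential(size=1_000_000)          # independent of x

a, b = 0.5, 1.0                              # evaluation point
joint = np.mean((x <= a) & (y <= b))         # F_{X,Y}(a, b)
product = np.mean(x <= a) * np.mean(y <= b)  # F_X(a) F_Y(b)
print(joint, product)                        # approximately equal
```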

**Distribution Function of a Random Vector**

Every random vector from a probability space \((\Omega, \mathcal{F}, \mu)\) to \((\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p))\) induces a probability measure \(\mathbb{P}\) on \(\mathbb{R}^p\) that we identify with the *distribution function* \(F_{\boldsymbol{X}}: \mathbb{R}^p \to [0, 1]\) defined as

\(F_{\boldsymbol{X}}(\boldsymbol{x}) = \mu(\{\omega \in \Omega: \boldsymbol{X}(\omega) \le \boldsymbol{x}\}) =: \mathbb{P}(\boldsymbol{X} \le \boldsymbol{x}), \quad \boldsymbol{x} \in \mathbb{R}^p.\)

**Expectation of Random Variables**

Let \(\boldsymbol{X}\) be a random vector from \((\Omega, \mathcal{F}, \mu)\) to \((\mathbb{R}^p, \mathcal{B}(\mathbb{R}^p))\).
We define the *expectation* of \(\boldsymbol{X}\) by

\(\mathbb{E}(\boldsymbol{X}) := \int_{\mathbb{R}^p} \boldsymbol{x} \, dF_{\boldsymbol{X}}(\boldsymbol{x}).\)

More generally, let \(f: \mathbb{R}^p \to \mathbb{R}\) be measurable. Then

\(\mathbb{E}(f(\boldsymbol{X})) := \int_{\mathbb{R}^p} f(\boldsymbol{x}) \, dF_{\boldsymbol{X}}(\boldsymbol{x}),\)

where \(dF_{\boldsymbol{X}}(\boldsymbol{x}) = \mathbb{P}(d\boldsymbol{x}) = \mathbb{P}(dx_1, \ldots, dx_p)\) and \(\int\) denotes the Lebesgue integral.
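
When this integral has no closed form, it can be approximated by Monte Carlo: the sample mean of \(f(X_i)\) over independent draws converges to \(\mathbb{E}(f(\boldsymbol{X}))\). A sketch for a standard normal (our own example):

```
# Monte Carlo approximation of E(f(X)) for X ~ N(0, 1): the sample
# mean of f(X_i) converges to the integral above (our own example).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
f = lambda u: u**2               # E(X^2) = 1 for a standard normal
print(f(x).mean())               # close to 1.0
```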

**\(L^p\) spaces**

By \(L^p(\Omega, \mathcal{F}, \mu)\) we mean the Banach space of measurable functions \(X\) on \(\Omega\) with norm

\(\|X\|_{L^p} := \left(\int_\Omega |X(\omega)|^p \, d\mu(\omega)\right)^{1/p} < \infty.\)

In particular, we say that \(X\) is integrable if \(\|X\|_{L^1} < \infty\) and that \(X\) has finite variance if \(\|X\|_{L^2} < \infty\).

## Variance, Covariance and Correlation of Two Random Variables

**Variance of a Random Variable**

Provided that it exists, we define the *variance* of a random variable \(X\) as

\(\mathrm{Var}(X) := \mathbb{E}\left[(X - \mathbb{E}(X))^2\right] = \mathbb{E}(X^2) - \mathbb{E}(X)^2.\)

*Figure: several sets of \((x, y)\) points, with the correlation coefficient of \(x\) and \(y\) for each set. The correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). N.B.: the figure in the center of the middle row has a slope of 0, but in that case the correlation coefficient is undefined because the variance of \(Y\) is zero (credit: DenisBoigelot, original uploader Imagecreator).*

**Covariance, Variance and Correlation of Two Random Variables**

Provided that it exists, we define the *covariance* of two random variables \(X\) and \(Y\) as

\(\mathrm{Cov}(X, Y) := \mathbb{E}\left[(X - \mathbb{E}(X)) (Y - \mathbb{E}(Y))\right]\) \(= \mathbb{E}(X Y) - \mathbb{E}(X) \mathbb{E}(Y).\)

The *correlation* of \(X\) and \(Y\) is

\(\mathrm{Corr}(X, Y) := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)} \sqrt{\mathrm{Var}(Y)}}.\)
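
These definitions map directly onto numpy routines; the following sketch (our own simulated example) checks the shortcut formula \(\mathbb{E}(XY) - \mathbb{E}(X)\mathbb{E}(Y)\) against `np.cov` and `np.corrcoef`:

```
# Covariance and correlation of two simulated random variables,
# checked against numpy's built-ins (our own example).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = 2 * x + rng.standard_normal(100_000)    # Y = 2X + noise

cov = (x * y).mean() - x.mean() * y.mean()  # E(XY) - E(X)E(Y), ~ 2
corr = cov / (x.std() * y.std())            # ~ 2 / sqrt(5) ~ 0.89
print(cov, corr)
print(np.cov(x, y)[0, 1], np.corrcoef(x, y)[0, 1])  # library equivalents
```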

## Conditional Expectation

**Conditional Probability**

Let \(A\) and \(B\) be two events and suppose that \(\mathbb{P}(A) > 0\).
The *conditional probability* of \(B\) given \(A\) is

\(\mathbb{P}(B | A) := \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)}.\)

For random variables, the *conditional probability* of \(Y\) given \(X\) is, for Borel sets \(A\) and \(B\) with \(\mathbb{P}(X \in A) > 0\),

\(\mathbb{P}(Y \in B | X \in A) := \frac{\mathbb{P}(X \in A, Y \in B)}{\mathbb{P}(X \in A)}.\)

**Conditional Expectation**

Assume that \(\boldsymbol{X} \in L^1(\Omega, \mathcal{F}, \mu)\) and let \(\mathcal{G}\) be a sub-\(\sigma\)-algebra of \(\mathcal{F}\).
The *conditional expectation* of \(\boldsymbol{X}\) with respect to \(\mathcal{G}\) is the function \(\mathbb{E}(\boldsymbol{X} | \mathcal{G}): (\Omega, \mathcal{G}) \to \mathbb{R}^p\), i.e. a \(\mathcal{G}\)-measurable random variable, satisfying

\(\int_G \mathbb{E}(\boldsymbol{X} | \mathcal{G}) \, d\mu = \int_G \boldsymbol{X} \, d\mu \quad \text{for all } G \in \mathcal{G}.\)

It follows, taking \(G = \Omega\) in the defining property, that (*law of total expectation*)

\(\mathbb{E}\left[\mathbb{E}(\boldsymbol{X} | \mathcal{G})\right] = \mathbb{E}(\boldsymbol{X}).\)
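
With a discrete conditioning variable, the law of total expectation can be checked numerically by weighting the within-group means by the group probabilities (our own sketch):

```
# Numerical check of the law of total expectation with a discrete
# conditioning variable (our own sketch).
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=1_000_000)   # conditioning variable in {0, 1, 2}
x = y + rng.standard_normal(y.size)      # X depends on Y

# E(X | Y) is constant on each event {Y = k}; average it over P(Y = k)
total = sum(x[y == k].mean() * (y == k).mean() for k in range(3))
print(total, x.mean())                   # the two agree up to sampling noise
```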

**Conditional Distribution Function**

Given \(\mathcal{G}\) a sub-\(\sigma\)-algebra of \(\mathcal{F}\), we define the *conditional distribution function*

\(F_{\boldsymbol{X} | \mathcal{G}}(\boldsymbol{x}) := \mathbb{P}(\boldsymbol{X} \le \boldsymbol{x} | \mathcal{G}) = \mathbb{E}\left(\mathbf{1}_{\{\boldsymbol{X} \le \boldsymbol{x}\}} \middle| \mathcal{G}\right).\)

Assume that \(f: \mathbb{R}^p \to \mathbb{R}\) is such that \(\mathbb{E}(|f(\boldsymbol{X})|) < \infty\). Then

\(\mathbb{E}(f(\boldsymbol{X}) | \mathcal{G}) = \int_{\mathbb{R}^p} f(\boldsymbol{x}) \, dF_{\boldsymbol{X} | \mathcal{G}}(\boldsymbol{x}).\)

**Conditional Expectation with respect to a Random Vector**

The *conditional expectation of \(\boldsymbol{X}\) given \(\boldsymbol{Y}\)* is defined by

\(\mathbb{E}(\boldsymbol{X} | \boldsymbol{Y}) := \mathbb{E}(\boldsymbol{X} | \sigma(\boldsymbol{Y})),\)

where \(\sigma(\boldsymbol{Y}) := \{\boldsymbol{Y}^{-1}(B): B \in \mathcal{B}(\mathbb{R}^p)\}\) is the *\(\sigma\)-algebra generated by \(\boldsymbol{Y}\)*.

## Conditional Variance and Least Squares

**Conditional Variance**

Suppose that \(Y\) is a random variable and that \(\mathcal{G}\) is a sub-\(\sigma\)-algebra of \(\mathcal{F}\).
Then, the random variable

\(\mathrm{Var}(Y | \mathcal{G}) := \mathbb{E}\left[\left(Y - \mathbb{E}(Y | \mathcal{G})\right)^2 \middle| \mathcal{G}\right]\)

is called the *conditional variance* of \(Y\) knowing \(\mathcal{G}\).

It tells us how much variance is left if we use \(\mathbb{E}(Y | \mathcal{G})\) to predict \(Y\).

**The Conditional Expectation Minimizes the Squared Deviations**

Let \(X\) and \(Y\) be random variables with finite variance, and let \(g\) be a real-valued function such that \(\mathbb{E}[g(X)^2] < \infty\).
Then (Theorem 10.1.4 in Gut 2005)

\(\mathbb{E}\left[(Y - g(X))^2\right] \ge \mathbb{E}\left[\left(Y - \mathbb{E}(Y | X)\right)^2\right],\)

where equality is obtained for \(g(X) = \mathbb{E}(Y | X)\).

Thus, the expected conditional variance \(\mathbb{E}\left[\mathrm{Var}(Y | X)\right] = \mathbb{E}\left[(Y - \mathbb{E}(Y | X))^2\right]\) shows up as the irreducible error when predicting \(Y\) given only the knowledge of \(X\).
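
This optimality is easy to observe by simulation. In the toy model below (our own, with \(Y = X^2 + \varepsilon\) so that \(\mathbb{E}(Y | X) = X^2\) by construction), the conditional expectation beats other predictors in mean squared error:

```
# The conditional expectation minimizes the mean squared prediction
# error, illustrated on the toy model Y = X^2 + noise (our example).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x**2 + rng.standard_normal(x.size)

candidates = {
    'E(Y|X) = X^2': x**2,                   # the conditional expectation
    'constant E(Y)': np.full_like(x, 1.0),  # best constant predictor
    '|X|': np.abs(x),                       # some other function of X
}
for name, g in candidates.items():
    print(name, ((y - g)**2).mean())
# E(Y|X) attains the minimum, about 1.0 (the noise variance)
```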

## Absolutely Continuous Distributions and Densities

**Absolutely Continuous Distribution**

A distribution function \(F\) is *absolutely continuous* with respect to the Lebesgue measure (denoted \(dx\)) if and only if there exists a non-negative, Lebesgue-integrable function \(f\) such that

\(F(x) = \int_{-\infty}^x f(u) \, du \quad \text{for all } x \in \mathbb{R}.\)

The function \(f\) is called the *density* of \(F\) and is denoted by \(\frac{dF}{dx}\).

Equivalently, \(F\) is absolutely continuous if and only if, for every measurable set \(A\), \(dx(A) = 0\) implies \(\mathbb{P}(X \in A) = 0\).

**Marginal Density**

From an absolutely continuous random vector \((X, Y)\) with density \(f_{X, Y}\), we can derive the density of \(X\), or *marginal density*, by integrating over \(Y\):

\(f_X(x) = \int_{\mathbb{R}} f_{X, Y}(x, y) \, dy.\)

If \(X\) is an absolutely continuous random variable with density \(f_X\), \(g\) is a measurable function, and \(\mathbb{E}(|g(X)|) < \infty\), then

\(\mathbb{E}(g(X)) = \int_{\mathbb{R}} g(x) f_X(x) \, dx.\)

If \(X\) and \(Y\) are absolutely continuous, then \(X\) and \(Y\) are independent if and only if the joint density is equal to the product of the marginal ones, that is

\(f_{X, Y}(x, y) = f_X(x) f_Y(y) \quad \text{for all } x, y \in \mathbb{R}.\)

**Conditional Density**

Let \(X\) and \(Y\) have a joint absolutely continuous distribution.
For \(f_X(x) > 0\), the *conditional density* of \(Y\) given that \(X = x\) equals

\(f_{Y | X}(y | x) := \frac{f_{X, Y}(x, y)}{f_X(x)}.\)

Then the conditional distribution of \(Y\) given that \(X = x\) is derived by

\(F_{Y | X}(y | x) = \int_{-\infty}^y f_{Y | X}(v | x) \, dv.\)

If \(X\) and \(Y\) are independent then the conditional and the unconditional distributions are the same.
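
Numerically, both operations amount to summing and normalizing a gridded joint density. A sketch for a bivariate normal density (our own example, with correlation \(\rho = 0.8\)):

```
# Marginal and conditional densities from a joint density evaluated
# on a grid, here a bivariate normal with correlation rho (our example).
import numpy as np

x = np.linspace(-4, 4, 401)
y = np.linspace(-4, 4, 401)
dx = dy = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing='ij')

rho = 0.8
f_xy = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2))) / (
    2 * np.pi * np.sqrt(1 - rho**2))

f_x = f_xy.sum(axis=1) * dy         # marginal density: integrate over y
i = np.searchsorted(x, 1.0)         # condition on X = 1
f_y_given_x = f_xy[i] / f_x[i]      # f_{Y|X}(y | 1)
print(f_x.sum() * dx, f_y_given_x.sum() * dy)  # both close to 1
```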

## Sample Estimates

Let \((x_i, y_i), i = 1, \ldots, N\) be a sample drawn from the joint distribution of the random variables \(X\) and \(Y\). Then we have the following unbiased estimates:

Sample mean: \(\bar{x} = \frac{1}{N} \sum_{i = 1}^N x_i\)

Sample variance: \(s_X^2 = \frac{1}{N - 1} \sum_{i = 1}^N \left(x_i - \bar{x}\right)^2\)

Sample covariance: \(q_{X,Y} = \frac{1}{N - 1} \sum_{i = 1}^N \left(x_i - \bar{x}\right) \left(y_i - \bar{y}\right)\)
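
These estimates correspond to numpy's functions with the \(N - 1\) normalization (`ddof=1`). A quick check on simulated data (our own example):

```
# The unbiased sample estimates above, computed with numpy on
# simulated data (our own example).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)
y = x + rng.standard_normal(x.size)

print(x.mean())                    # sample mean, close to 2
print(x.var(ddof=1))               # sample variance (N - 1), close to 9
print(np.cov(x, y, ddof=1)[0, 1])  # sample covariance, close to Var(X) = 9
```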

## Confidence Intervals

## Example: Regional surface temperature in France

```
# Import modules
from pathlib import Path
import numpy as np
import pandas as pd
import holoviews as hv
import hvplot.pandas
import panel as pn
pn.extension()
# Set data directory
data_dir = Path('data')
# Set keyword arguments for pd.read_csv
kwargs_read_csv = dict(header=0, index_col=0, parse_dates=True)
# Set first and last years
FIRST_YEAR = 2014
LAST_YEAR = 2021
# Define file path
filename = 'surface_temperature_merra2_{}-{}.csv'.format(
    FIRST_YEAR, LAST_YEAR)
filepath = Path(data_dir, filename)
# Read hourly temperature data averaged over each region
df_temp = pd.read_csv(filepath, **kwargs_read_csv).resample('D').mean()
temp_lim = [-5, 30]
label_temp = 'Temperature (°C)'
```

```
WIDTH = 260

# Plot the daily temperature of a region for a given year:
# time series with mean and +/- 1 STD band, empirical CDF and KDE
def plot_temp(region_name, year):
    df = df_temp[[region_name]].loc[str(year)]
    df.columns = [label_temp]
    nt = df.shape[0]
    std = df[label_temp].std()
    mean = pd.Series(df[label_temp].mean(), index=df.index)
    df_std = pd.DataFrame(
        {'low': mean - std, 'high': mean + std}, index=df.index)
    # Empirical cumulative distribution function of the temperature
    cdf = pd.DataFrame(index=df.sort_values(by=label_temp).values[:, 0],
                       data=(np.arange(nt)[:, None] + 1) / nt)
    cdf.index.name = label_temp
    cdf.columns = ['Probability']
    pts = df.hvplot(ylim=temp_lim, title='', width=WIDTH).opts(
        title='Time series, Mean, ± 1 STD') * hv.HLine(
        df[label_temp].mean()) * df_std.hvplot.area(
        y='low', y2='high', alpha=0.2)
    pcdf = cdf.hvplot(xlim=temp_lim, ylim=[0, 1], title='', width=WIDTH).opts(
        title='Cumulative Distrib. Func.') * hv.VLine(
        df[label_temp].mean())
    pkde = df.hvplot.kde(xlim=temp_lim, width=WIDTH) * hv.VLine(
        df[label_temp].mean()).opts(title='Probability Density Func.')
    return pn.Row(pts, pcdf, pkde)
```

```
# Show
pn.interact(plot_temp, region_name=df_temp.columns,
            year=range(FIRST_YEAR, LAST_YEAR + 1))
```


## References

Gut, A. (2005). *Probability: A Graduate Course*. Springer Texts in Statistics. Springer, New York.

## Credit

Contributors include Bruno Deremble and Alexis Tantet.