Probability Theory

From Theory of Measurements Wiki

In statistical inverse theory, probability theory plays a crucial role, so some knowledge of stochastic processes and probability laws is needed to follow the material. Traditionally, statistical inverse theory belongs to the so-called Bayesian school of probability theory.

In this chapter, we shall run through basic definitions of probability theory.

Basic Definitions

First of all, we shall need a couple of definitions.

Let \Omega\, be a set and \Sigma \subset P(\Omega)\, be a class of subsets of \Omega\,. Class \Sigma\, is called a \sigma\,-algebra, if it fulfills the following three conditions.

  1. Empty set \emptyset \in \Sigma\,
  2. If set A\in \Sigma\,, then also A^c = \Omega \setminus A \in \Sigma \,
  3. Let A_1, A_2,\dots \in \Sigma\,, then also \cup_{k=1}^\infty A_k\in \Sigma \,.
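For a finite set \Omega\, the three conditions can be checked mechanically. The following is a minimal sketch, not part of the original text: the function name is_sigma_algebra and the example sets are chosen here purely for illustration. On a finite class, closure under pairwise unions already implies closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, sigma):
    """Check the three sigma-algebra conditions for a class of subsets of a finite set."""
    omega = frozenset(omega)
    sigma = {frozenset(s) for s in sigma}
    # Every member must be a subset of omega.
    if any(not a <= omega for a in sigma):
        return False
    # Condition 1: the empty set belongs to the class.
    if frozenset() not in sigma:
        return False
    # Condition 2: the class is closed under complements.
    if any(omega - a not in sigma for a in sigma):
        return False
    # Condition 3: closed under unions (pairwise unions suffice for a finite class).
    return all(a | b in sigma for a, b in combinations(sigma, 2))

omega = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
bad = [set(), {1}, {1, 2, 3, 4}]  # the complement {2, 3, 4} is missing

print(is_sigma_algebra(omega, good))  # True
print(is_sigma_algebra(omega, bad))   # False
```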

Measurable Space

If \Omega\, is a set and a class \Sigma\, of subsets of \Omega\, is \sigma\,-algebra, the pair (\Omega,\Sigma)\, is called a measurable space.

Measures

If (\Omega,\Sigma)\, is a measurable space, non-negative mappings (set functions) \mu: \Sigma \rightarrow [0,\infty]\, are called measures if the measure of the empty set is zero, \mu(\emptyset)=0\,, and they are countably additive:

\mu\left(\cup_{i=0}^\infty A_i\right) = \sum_{i=0}^\infty\mu(A_i)\,

for any collection of pairwise disjoint sets A_i\in \Sigma\,, that is, A_i\cap A_j=\emptyset\, if i\neq j\,.

Probability Measure, Probability Space

If \mu(\Omega)=1\,, \mu\, is called a probability measure. Probability measures are denoted by \mathbf{P}\,. The triplet (\Omega,\Sigma,\mathbf{P})\, is then called a probability space.


Events and Simple Examples

If (\Omega,\Sigma,\mathbf{P})\, is a probability space, the measurable sets in \Sigma\, are often called events. A very simple example of a probability space is a finite set \Omega\,, with \Sigma\, the class of all subsets of \Omega\,. A simple probability measure is then defined by P(A) = N(A)/N(\Omega)\, for any A\subset \Omega\,, where N(A)\, denotes the number of points in the set A\,. Another simple example is found by defining \Omega = [0, 1]\, (the closed unit interval on the real axis). In this case, \Sigma\, is the class of all Lebesgue-measurable subsets of [0, 1]\, and the probability measure is simply the Lebesgue measure. A slightly more complicated example is the two-dimensional unit square [0, 1]\times [0, 1]\, equipped with the \sigma\,-algebra of Lebesgue-measurable subsets and the Lebesgue measure as the probability measure.
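The finite example can be written out directly. This is only an illustrative sketch: the six-point set (the faces of a die) and the function name prob are assumptions made here, not taken from the text. Exact fractions are used so that additivity can be checked without rounding.

```python
from fractions import Fraction

# Hypothetical finite sample space: the six faces of a die.
omega = frozenset(range(1, 7))

def prob(event):
    """Counting measure P(A) = N(A) / N(Omega) for a subset A of omega."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("event must be a subset of omega")
    return Fraction(len(event), len(omega))

even = {2, 4, 6}
low = {1}

print(prob(even))   # 1/2
print(prob(omega))  # 1
# Additivity holds for disjoint events.
print(prob(even | low) == prob(even) + prob(low))  # True
```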

Disjoint and Independent Events

Two events A_1 \in \Sigma\, and A_2\in \Sigma\, are called disjoint, if A_1 \cap A_2 = \emptyset\,. Then

P(A_1 \cup A_2) = P(A_1) +P(A_2).

Two events A_1\in \Sigma and A_2 \in \Sigma are called independent, if

P(A_1 \cap A_2) = P(A_1)P(A_2)\,.

A basic example of independent events is the following: Let \Omega = [0, 1] \times [0, 1] equipped with the Lebesgue measure. If A \subset [0, 1] and B\subset [0, 1] are any measurable sets, the events

A_1 = A \times [0, 1]\,
A_2 = [0, 1] \times B\,

are independent.
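The independence claim can be verified for a concrete pair of sets. The sketch below assumes, for simplicity, that A\, and B\, are intervals (the text allows any measurable sets), so that all events are rectangles whose Lebesgue measure is an area. The particular intervals and helper names are illustrative choices.

```python
from fractions import Fraction as F

UNIT = (F(0), F(1))

def length(interval):
    """Length of an interval (lo, hi); empty intervals have length 0."""
    lo, hi = interval
    return max(F(0), hi - lo)

def intersect(i, j):
    """Intersection of two intervals."""
    return (max(i[0], j[0]), min(i[1], j[1]))

def prob(rect):
    """Lebesgue measure (area) of a rectangle given as (x-interval, y-interval)."""
    return length(rect[0]) * length(rect[1])

A = (F(1, 4), F(3, 4))   # assumed example interval A
B = (F(0), F(1, 3))      # assumed example interval B
A1 = (A, UNIT)           # A x [0, 1]
A2 = (UNIT, B)           # [0, 1] x B
A1_and_A2 = (intersect(A1[0], A2[0]), intersect(A1[1], A2[1]))

p1, p2, p12 = prob(A1), prob(A2), prob(A1_and_A2)
print(p1, p2, p12)       # 1/2 1/3 1/6
print(p12 == p1 * p2)    # True
```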

Random variables

Random variables with values in M\,

Let (\Omega,\Sigma,\mathbf{P})\, be a probability space and (M, B)\, a measurable space. Measurable mappings \Omega \rightarrow M\, are called random variables with values in M\,.

We can also denote the measurable space (M,B)\, with other symbols than M\,. An often used symbol is X\,. Random variables with values in X\, are often used to denote the unknowns in an inverse problem, while random variables with values in M\, are often used to denote the measurements in an inverse problem.

Conditional probabilities

In the case of simple events, conditional probabilities are simply defined through

\mathbf{P}(A|B) = \frac{\mathbf{P}(A \cap B)}{\mathbf{P}(B)}

This works if the probability of the condition is greater than zero, \mathbf{P}(B)>0\,.

Conditional probabilities can also be defined for conditions whose probability is zero, such as the requirement that some random variable takes a specific value.

In this case, the definition is mathematically complex and we do not show the details here. In general, conditional probabilities are defined through specific \sigma\,-algebras, and they are defined only almost surely.

We will only need these kinds of conditional probabilities for random variables in Euclidean spaces with distributions defined by densities. The derivations that follow can be understood if one understands conditional probabilities as defined above.

Example of conditional probabilities

We suppose that \Omega\, is a finite set which consists of the following elements:

\omega_1\, - A space scientist finds a Gyromitra Esculenta, cooks it and eats it and does not die.
\omega_2\, - A space scientist finds a Gyromitra Esculenta, does not cook it, eats it and dies.
\omega_3\, - A space scientist does not find a Gyromitra Esculenta and survives.
[Figure: G. Esculenta]

Let us suppose that all the probabilities of the points \omega_i\, are equal: \mathbf{P}(\omega_i) = 1/3\,.

Let us define the following events:

Event A: The space scientist survives.
Event B: The space scientist finds a Gyromitra Esculenta.

Then A = \{\omega_1,\omega_3\}\, and B = \{\omega_1, \omega_2\}\, with \mathbf{P}(A)=\mathbf{P}(B)=2/3\,. Since A \cap B = \{\omega_1\}\,, the conditional probability of the space scientist surviving after finding a Gyromitra Esculenta is \mathbf{P}(A|B) = \mathbf{P}(A\cap B)/\mathbf{P}(B) = (1/3)/(2/3) = 1/2\,.
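The same calculation can be reproduced in a few lines. In this sketch the labels "w1", "w2", "w3" stand for \omega_1, \omega_2, \omega_3\,, and the helper names prob and cond_prob are chosen here for illustration.

```python
from fractions import Fraction

# Equal probabilities for the three outcomes, as in the example.
P = {"w1": Fraction(1, 3), "w2": Fraction(1, 3), "w3": Fraction(1, 3)}

def prob(event):
    """Probability of an event, given as a set of outcome labels."""
    return sum(P[w] for w in event)

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); requires P(B) > 0."""
    pb = prob(b)
    if pb == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / pb

A = {"w1", "w3"}  # the space scientist survives
B = {"w1", "w2"}  # the space scientist finds the mushroom

print(prob(A), prob(B))  # 2/3 2/3
print(cond_prob(A, B))   # 1/2
```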

Intuitive Approach

[Figure: 2-Dimensional Gaussian, negative correlation.]
[Figure: 2-Dimensional Gaussian, positive correlation: This is an example of a finite variable with approximate distribution of a Gaussian with positive correlation.]
[Figure: 2-Dimensional Gaussian, more positive correlation: When the correlation gets closer to 1, the distribution gets narrow.]
[Figure: 2-Dimensional Gaussian, no correlation: When the correlation becomes zero, the distribution becomes symmetric.]

Here is an example of a random variable with a finite set \Omega\,. There are 10000 points in \Omega\,, and the random variable x:\Omega\rightarrow \mathbf{R}^2\, approximates a Gaussian distribution with mean at (4,3).
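A construction of this kind might look as follows. This is a sketch under stated assumptions, not the procedure used for the figures: the correlation value 0.8, unit variances, the fixed seed and all names are illustrative. The finite set \Omega\, is taken to be the indices 0, ..., 9999, each with probability 1/10000, and the random variable is a lookup table from \Omega\, to \mathbf{R}^2\,.

```python
import math
import random

random.seed(0)      # fixed seed so that the run is repeatable
N = 10000           # number of points in the finite set Omega
rho = 0.8           # assumed correlation, chosen for illustration
mean = (4.0, 3.0)

# Build the table omega -> x(omega) from independent standard normal draws.
table = []
for _ in range(N):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    table.append((mean[0] + z1,
                  mean[1] + rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))

def x(omega):
    """The random variable: maps a point of Omega to a point of R^2."""
    return table[omega]

def average(f):
    """Average of f(x(omega)) over Omega, each point weighted by 1/N."""
    return sum(f(*x(w)) for w in range(N)) / N

m1 = average(lambda a, b: a)
m2 = average(lambda a, b: b)
v1 = average(lambda a, b: (a - m1) ** 2)
v2 = average(lambda a, b: (b - m2) ** 2)
c12 = average(lambda a, b: (a - m1) * (b - m2))
corr = c12 / math.sqrt(v1 * v2)

print(round(m1, 2), round(m2, 2), round(corr, 2))
```

The printed averages should lie close to 4, 3 and the chosen correlation, with deviations of the order of 1/\sqrt{N}\,.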



Densities of Random Variables

An important class of random variables consists of those taking values in Euclidean spaces. Such variables are most easily described by their densities. Let us suppose x: \Omega \rightarrow  \mathbf{R}^n\, is a random variable. Moreover, suppose that \mathbf{P}(x^{-1}(A)) = \int_A D(x) d^nx\, for every measurable set A \subset \mathbf{R}^n\,. We then call D(x)\, the density of the random variable x\,.

EXPLANATION: The role of the probability space \Omega\, is rather obscure when all derivations are based on densities of random variables.

It can be understood as a sort of lottery machine: each realization of the random variable x = x(\omega)\, corresponds to a certain \omega\in\Omega\,, and these \omega\,s are drawn from the set \Omega\, according to the probability measure \mathbf{P}\,.

Marginal densities

Let us study a joint density of two random variables D(x, y)\,. This is just a density in the space X \times Y\, . The marginal density of x\, is defined by

D(x) = \int_Y D(x, y)d^ny.\,

The marginal density is just the density of x\,, with y\, completely ignored.
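As a small numerical illustration, the marginal density can be approximated by a midpoint-rule sum over y\,. The joint density D(x, y) = x + y\, on the unit square is a hypothetical example chosen here because its marginal, D(x) = x + 1/2\,, is known in closed form. The function names are likewise illustrative.

```python
def joint_density(x, y):
    """Hypothetical joint density on the unit square: D(x, y) = x + y."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0

def marginal_x(x, n=1000):
    """Midpoint-rule approximation of the integral of D(x, y) over y in [0, 1]."""
    h = 1.0 / n
    return sum(joint_density(x, (k + 0.5) * h) for k in range(n)) * h

for x in (0.0, 0.3, 1.0):
    # Compare the numerical marginal with the closed form x + 1/2.
    print(x, marginal_x(x), x + 0.5)
```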

Conditional densities

Let us again study two random variables with joint density D(x, y)\,. The conditional density is defined by

D(x|y) = \frac{D(x,y)}{D(y)} = \frac{D(x, y)}{\int_X D(x,y)d^nx}\,

This formula follows from the general mathematical definition of conditional distributions for random variables. It is also easy to understand intuitively if one thinks of x\, and y\, as real-valued variables and divides the x\,- and y\,-axes into small intervals. The discrete conditional probabilities, divided by the interval length, are then approximately equal to the conditional density (provided it is continuous).
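The discretization argument can be tried out numerically. The sketch below again assumes the hypothetical joint density D(x, y) = x + y\, on the unit square, for which D(x|y) = (x + y)/(y + 1/2)\,. Both axes are divided into n\, intervals of length h\,, and the discrete conditional probability of a cell, divided by h\,, is compared with the conditional density at the cell midpoint. For this linear density the midpoint values are exact, so the two agree up to rounding; in general they agree only approximately.

```python
def joint_density(x, y):
    """Hypothetical joint density on the unit square: D(x, y) = x + y."""
    return x + y

n = 200
h = 1.0 / n
mid = [(k + 0.5) * h for k in range(n)]

# Probability of each small cell I_i x J_j (midpoint value times cell area).
cell = [[joint_density(x, y) * h * h for y in mid] for x in mid]

def discrete_conditional(i, j):
    """P(x in I_i | y in J_j), computed from the cell probabilities."""
    p_y = sum(cell[k][j] for k in range(n))  # P(y in J_j)
    return cell[i][j] / p_y

i, j = 60, 100
x, y = mid[i], mid[j]
approx = discrete_conditional(i, j) / h  # conditional probability per unit length
exact = (x + y) / (y + 0.5)              # closed-form conditional density
print(approx, exact)
```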

Bayes theorem

If we write the previous definition of conditional densities in two different ways and rearrange, we get Bayes' theorem:

D(y|x) = \frac{D(x,y)}{D(x)}\,
D(x|y) = \frac{D(x,y)}{D(y)}\,
D(x|y)D(y) = D(x,y) = D(y|x)D(x)\,

Bayes' theorem then reads:

D(x|y) = \frac{D(y|x)D(x)}{D(y)}\,,

where, e.g., y\, can denote a measurement and x\, a model or model parameter.

If we ignore the normalization constant D(y)\,, we can use a simpler form:

D(x|y) \propto D(x,y) = D(y|x)D(x)
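The proportional form suggests a simple numerical recipe: evaluate D(y|x)D(x)\, on a grid of x\, values and normalize afterwards. The following sketch assumes a hypothetical setting that is not in the text: a uniform prior for x\, on [0, 1]\,, a measurement y = x + \text{noise}\, with Gaussian noise of standard deviation 0.1, and an observed value y = 0.6\,.

```python
import math

def prior(x):
    """Assumed prior density D(x): uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def likelihood(y, x, noise_sd=0.1):
    """Assumed likelihood D(y|x): Gaussian measurement noise around x."""
    z = (y - x) / noise_sd
    return math.exp(-0.5 * z * z) / (noise_sd * math.sqrt(2.0 * math.pi))

def posterior_on_grid(y, n=1000):
    """Return grid points, posterior density D(x|y) on them, and the grid step."""
    h = 1.0 / n
    xs = [(k + 0.5) * h for k in range(n)]
    unnorm = [likelihood(y, x) * prior(x) for x in xs]  # D(y|x) D(x)
    evidence = sum(unnorm) * h                          # approximates D(y)
    return xs, [u / evidence for u in unnorm], h

xs, post, h = posterior_on_grid(0.6)
total = sum(post) * h
mode = xs[post.index(max(post))]
print(total, mode)
```

The normalization step is where D(y)\, enters. Everything before it uses only the unnormalized product.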


In the picture below we have illustrated a 2-dimensional distribution together with its marginal density for y\, and an example of a conditional density for x\,. It can also serve as an illustration of Bayes' theorem.

How to find a suitable \Omega\, if only a density is given

Suppose we have a random variable x:\Omega \rightarrow \mathbf{R}^n\,, but we only know its density D(x)\,. This happens very often in applications. No probability space is explicitly specified, but in order to use the mathematical formalism we need the mapping x : \Omega \rightarrow \mathbf{R}^n\,. Solution: let us simply define \Omega = \mathbf{R}^n\,, take x:\Omega \rightarrow \mathbf{R}^n\, to be the identity mapping, and let the probability measure on \Omega\, be defined by

\mathbf{P}(A) = \int_A D(x) d^nx\,

for every A\, in the \sigma\,-algebra \Sigma\, of all Lebesgue-measurable sets in \Omega = \mathbf{R}^n\,.

The expectation value of a random variable

The expectation value of a random variable in a linear space is defined to be its average value when all possible realizations are averaged using the probability measure:

\langle x\rangle = \int_\Omega x(\omega)\mathbf{P}(d\omega)

In terms of a density D(x)\,, this is given by

\langle x\rangle = \int_{\mathbf{R}^n}xD(x)d^nx

Note that

\int_{\mathbf{R^n}}D(x)d^nx=1

because it is completely certain that x gets some value in \mathbf{R}^n.

The moments of a random variable

The moments of a random variable in \mathbf{R}^n\, are defined as the expectation values of the new random variables obtained as products of its components. In particular, the second moments are defined by

\langle x_ix_j\rangle = \int_\Omega x_i(\omega)x_j(\omega)\mathbf{P}(d\omega)

and in terms of a density D(x), this is given by

\langle x_ix_j\rangle = \int_{\mathbf{R}^n}x_ix_jD(x)d^nx
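The density formulas for the expectation value and the second moments can be evaluated numerically. This sketch assumes a hypothetical density D(x) = x_1 + x_2\, on the unit square (zero elsewhere), for which the exact values are \langle x_1\rangle = 7/12\,, \langle x_1x_1\rangle = 5/12\, and \langle x_1x_2\rangle = 1/3\,. The helper name expect is illustrative.

```python
def density(x1, x2):
    """Hypothetical density on the unit square: D(x) = x1 + x2."""
    return x1 + x2

def expect(f, n=200):
    """Midpoint-rule approximation of the integral of f(x) D(x) over [0, 1]^2."""
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    return sum(f(a, b) * density(a, b) for a in mids for b in mids) * h * h

total = expect(lambda a, b: 1.0)     # normalization, should be close to 1
mean_x1 = expect(lambda a, b: a)     # first moment <x1>
m11 = expect(lambda a, b: a * a)     # second moment <x1 x1>
m12 = expect(lambda a, b: a * b)     # second moment <x1 x2>

print(total, mean_x1, m11, m12)
```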

Gaussian random variables

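Gaussian random variables are variables with Gaussian densities. Here \Sigma\, denotes a symmetric positive definite n \times n\, matrix (not the \sigma\,-algebra of the probability space):

D(x) = \frac{1}{\sqrt{(2\pi)^{n} \det \Sigma}} \exp\left(-\frac{1}{2}(x-x_0)^T\Sigma^{-1}(x-x_0)\right)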

The expectation value of the random variable is given by the center point:

\langle x\rangle = \int_{\mathbf{R}^n}xD(x)d^nx=x_0

and the central second moments are given by

\langle (x_i-x_{0,i})(x_j-x_{0,j})\rangle = \int_{\mathbf{R}^n}(x_i-x_{0,i})(x_j-x_{0,j})D(x)d^nx= \Sigma_{ij}
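These properties can be checked numerically for n = 2\,. The sketch below uses the standard normalization 1/\sqrt{(2\pi)^n \det\Sigma}\, and integrates on a grid wide enough that the tails are negligible. The center point (4, 3) and the matrix entries are assumptions made for illustration.

```python
import math

x0 = (4.0, 3.0)                      # assumed center point
cov = ((1.0, 0.5), (0.5, 2.0))       # assumed symmetric positive definite matrix

det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
inv = ((cov[1][1] / det, -cov[0][1] / det),
       (-cov[1][0] / det, cov[0][0] / det))
norm = 1.0 / math.sqrt((2.0 * math.pi) ** 2 * det)  # n = 2

def density(x1, x2):
    """Two-dimensional Gaussian density with center x0 and matrix cov."""
    d1, d2 = x1 - x0[0], x2 - x0[1]
    q = d1 * (inv[0][0] * d1 + inv[0][1] * d2) + d2 * (inv[1][0] * d1 + inv[1][1] * d2)
    return norm * math.exp(-0.5 * q)

n, half = 300, 12.0                  # grid size and half-width of the square
h = 2.0 * half / n
g1 = [x0[0] - half + (k + 0.5) * h for k in range(n)]
g2 = [x0[1] - half + (k + 0.5) * h for k in range(n)]

total = mean1 = mean2 = c11 = c12 = c22 = 0.0
for a in g1:
    for b in g2:
        w = density(a, b) * h * h    # probability mass of one grid cell
        total += w
        mean1 += a * w
        mean2 += b * w
        c11 += (a - x0[0]) ** 2 * w
        c12 += (a - x0[0]) * (b - x0[1]) * w
        c22 += (b - x0[1]) ** 2 * w

print(round(total, 6), round(mean1, 6), round(mean2, 6))
print(round(c11, 6), round(c12, 6), round(c22, 6))
```

The first line of output should reproduce the normalization and the center point, and the second the entries of the matrix.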
