Statistical Inverse Theory

From Theory of Measurements Wiki

Basic Elements

The basic elements of statistical inverse theory are:

  • x\,: the unknown
  • m\,: the measurement
  • D_{\textrm{pr}}(x)\,: the a priori density of the unknown
  • D(m|x)\,: the conditional density of the measurement, given the unknown. This function completely describes the measurement process

Inverse solution: The inverse solution is the conditional density of the unknown, when the measurement is known:

inverse solution= D_p(x) = D(x|m)\,

This is simply given by

D(x|m) \propto D_{\textrm{pr}}(x)D(m|x) = D(x,m), \,

where m\, is fixed.
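As a minimal sketch of this rule, the following Python snippet evaluates the product D_{\textrm{pr}}(x)D(m|x) on a grid for a scalar unknown and normalizes it numerically. The Gaussian prior, the direct measurement model m = x + noise, the helper name grid_posterior and all numerical values are illustrative assumptions, not part of the theory above.

```python
import numpy as np

def grid_posterior(x_grid, prior, likelihood, m):
    """Evaluate D(x|m) on a uniform grid by normalizing D_pr(x) * D(m|x)."""
    unnormalized = prior(x_grid) * likelihood(m, x_grid)
    dx = x_grid[1] - x_grid[0]
    return unnormalized / (unnormalized.sum() * dx)

# Hypothetical example: prior N(0, 2^2), measurement m = x + noise, noise N(0, 1).
# Normalization constants are omitted because they cancel in grid_posterior.
prior = lambda x: np.exp(-0.5 * (x / 2.0) ** 2)
likelihood = lambda m, x: np.exp(-0.5 * (m - x) ** 2)

x_grid = np.linspace(-10.0, 10.0, 2001)
posterior = grid_posterior(x_grid, prior, likelihood, m=1.5)

# Location of the posterior peak; analytically it is m * 4 / (4 + 1) = 1.2.
x_peak = x_grid[np.argmax(posterior)]
print(x_peak)
```

The measurement m is held fixed throughout, so the posterior is a function of x only, as stated above.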

Non-linear example with Gaussian errors

Let us suppose m = f(x) + \varepsilon\,, where \varepsilon\, is a Gaussian error term independent of x\,:

D(x|m) \propto D_{\textrm{pr}}(x) \exp \left(-\frac{1}{2}(m-f(x))^T\Sigma^{-1}(m-f(x))\right) \qquad (1)

Suppose now that D_{\textrm{pr}}(x) \approx\, constant and that the covariance \Sigma\, of the measurement errors is diagonal:

\Sigma_{ij} = \sigma_i^2\delta_{ij}

The a posteriori density is then simply given by

D(x|m) \propto \exp\left(-\frac{1}{2}\sum_i\sigma_i^{-2}(f_i(x)-m_i)^2\right)

with peak at the ordinary weighted least-squares solution.
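The peak can be located numerically. The sketch below uses a plain Gauss-Newton iteration, which is one common choice and not a method prescribed by the text. The exponential model f, its Jacobian, the starting point and the error levels are hypothetical, and noise-free data are used so that the result is easy to check.

```python
import numpy as np

def weighted_least_squares(f, jac, m, sigma, x0, n_iter=50):
    """Gauss-Newton iteration for min_x sum_i ((f_i(x) - m_i) / sigma_i)^2."""
    x = np.asarray(x0, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float)   # weights 1/sigma_i follow from Sigma
    for _ in range(n_iter):
        r = w * (m - f(x))                     # residuals scaled by 1/sigma_i
        J = w[:, None] * jac(x)                # Jacobian rows scaled the same way
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        x = x + step
        if np.linalg.norm(step) < 1e-12:
            break
    return x

# Hypothetical model: f_i(x) = x[0] * exp(-x[1] * t_i)
t = np.linspace(0.0, 2.0, 8)
f = lambda x: x[0] * np.exp(-x[1] * t)
jac = lambda x: np.column_stack([np.exp(-x[1] * t),
                                 -x[0] * t * np.exp(-x[1] * t)])

x_true = np.array([2.0, 1.5])
sigma = np.linspace(0.01, 0.05, 8)   # assumed error standard deviations
m = f(x_true)                        # noise-free data for illustration
x_hat = weighted_least_squares(f, jac, m, sigma, x0=[1.0, 1.0])
print(x_hat)
```

Undamped Gauss-Newton is not guaranteed to converge from a poor starting point; a damped variant would be a safer choice in practice.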

Important Notes

  • Because the prior is taken to be constant, the point of maximum a posteriori density coincides with the maximum of the likelihood D(m|x)\,. The least-squares solution is therefore often called the maximum likelihood estimate
  • Note that we get the weights automatically: they simply appear when we calculate the formula for the a posteriori distribution
  • It is not shown here, but it can be shown in many ways that the maximum likelihood estimator (with correct weights) is optimal: for example, in the linear case no other unbiased linear estimator has a smaller variance.
  • What is said above is true regardless of what kind of use is made of the analysis results afterwards. In particular, the optimal weights do not depend on whether one is interested in component 1 or component 2 of the unknown.
  • If the measurement errors are correlated (\Sigma\, is non-diagonal), the matrix formula in (1) cannot be written as a weighted sum of squares with any choice of weights. The conclusion is: in the case of a non-diagonal error covariance, the optimal least-squares solution is obtained by minimizing the quadratic form (m-f(x))^T\Sigma^{-1}(m-f(x)) in the exponent of (1) instead of any weighted sum of squared differences
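The last note can be illustrated numerically. The sketch below assumes a linear model f(x) = Ax for simplicity and compares minimizing the full quadratic form with one particular diagonal weighting that ignores the correlations. The matrices A and \Sigma\, and the helper names are hypothetical.

```python
import numpy as np

def gls_estimate(A, m, Sigma):
    """Minimize (m - A x)^T Sigma^{-1} (m - A x) by whitening with a Cholesky factor."""
    L = np.linalg.cholesky(Sigma)      # Sigma = L L^T
    A_w = np.linalg.solve(L, A)        # L^{-1} A
    m_w = np.linalg.solve(L, m)        # L^{-1} m
    x_hat, *_ = np.linalg.lstsq(A_w, m_w, rcond=None)
    return x_hat

def estimator_covariance(A, W, Sigma):
    """Covariance of x_hat = (A^T W A)^{-1} A^T W m when the errors have covariance Sigma."""
    G = np.linalg.solve(A.T @ W @ A, A.T @ W)
    return G @ Sigma @ G.T

# Hypothetical design matrix and correlated error covariance.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
Sigma = np.array([[1.00, 0.80, 0.64],
                  [0.80, 1.00, 0.80],
                  [0.64, 0.80, 1.00]])

x_true = np.array([1.0, -2.0])
x_hat = gls_estimate(A, A @ x_true, Sigma)   # noise-free data for illustration

# Full matrix weighting versus weights built from the diagonal of Sigma only.
cov_gls = estimator_covariance(A, np.linalg.inv(Sigma), Sigma)
cov_diag = estimator_covariance(A, np.diag(1.0 / np.diag(Sigma)), Sigma)
print(np.diag(cov_gls), np.diag(cov_diag))
```

The variances obtained with the full matrix weighting are never larger than those obtained with the diagonal weights.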

Linear example

Let us suppose that instead of m = f(x)+\varepsilon\,, we have m = Ax + \varepsilon\,, where A\, is a matrix. In addition, let us suppose that the a priori density of the unknown is given by

D_{\textrm{pr}}(x) = \exp \left(-\frac{1}{2}x^T\Sigma_0^{-1}x\right)\,

The a posteriori density is then given by

D(x|m) \propto \exp \left(-\frac{1}{2}\left[x^T\Sigma_0^{-1}x+(m-Ax)^T\Sigma^{-1}(m-Ax)\right]\right) \propto \exp\left(-\frac{1}{2}(x-\hat x)^T\Sigma^{-1}_{\textrm{p}}(x-\hat x)\right)

where we have an analytical solution for the a posteriori covariance \Sigma_{\textrm{p}}\, and center point \hat x\,:

\Sigma_{\textrm{p}} = (\Sigma_0^{-1}+A^T\Sigma^{-1}A)^{-1}
\hat x = \Sigma_{\textrm{p}}A^T\Sigma^{-1}m
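These two formulas translate directly into a few lines of NumPy. The following is a sketch only: the function name and the small test problem are assumptions, and the matrices are inverted explicitly for readability rather than efficiency.

```python
import numpy as np

def linear_gaussian_posterior(A, m, Sigma, Sigma0):
    """Posterior covariance and center point for m = A x + eps with a zero-mean Gaussian prior."""
    Sigma_inv_A = np.linalg.solve(Sigma, A)                 # Sigma^{-1} A
    precision = np.linalg.inv(Sigma0) + A.T @ Sigma_inv_A   # Sigma0^{-1} + A^T Sigma^{-1} A
    Sigma_p = np.linalg.inv(precision)
    x_hat = Sigma_p @ (Sigma_inv_A.T @ m)                   # Sigma_p A^T Sigma^{-1} m (Sigma symmetric)
    return Sigma_p, x_hat

# Hypothetical problem: three measurements of two unknowns.
A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
Sigma = 0.1 * np.eye(3)     # assumed measurement error covariance
Sigma0 = 4.0 * np.eye(2)    # assumed prior covariance
m = np.array([1.0, 3.1, 1.9])

Sigma_p, x_hat = linear_gaussian_posterior(A, m, Sigma, Sigma0)
print(x_hat)
print(np.sqrt(np.diag(Sigma_p)))   # posterior widths of the two unknowns
```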

If we suppose the prior density is constant, this is simply

\Sigma_{\textrm{p}} = (A^T\Sigma^{-1}A)^{-1}\,
\hat x = \Sigma_{\textrm{p}}A^T\Sigma^{-1}m\,

and on the other hand, if we suppose we have many independent (vector-valued) measurements

m_i = A_ix + \varepsilon_i, \quad \langle \varepsilon_i\varepsilon_i^T\rangle = \Sigma_i, \quad i=1,2,\dots, N \,

we have the general solution

\Sigma_{\textrm{p}} = \left(\Sigma_0^{-1}+\sum_{i=1}^NA_i^T\Sigma_i^{-1}A_i\right)^{-1}
\hat x = \Sigma_{\textrm{p}}\sum_{i=1}^NA_i^T\Sigma^{-1}_im_i
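The sums in the general solution can be accumulated one measurement at a time. The sketch below does this and, as a check, uses a constant prior (\Sigma_0^{-1} = 0) and noise-free data, in which case the unknown is recovered exactly. The function name and the randomly generated test problem are assumptions.

```python
import numpy as np

def combine_measurements(A_list, m_list, Sigma_list, Sigma0_inv):
    """Accumulate sum_i A_i^T Sigma_i^{-1} A_i and sum_i A_i^T Sigma_i^{-1} m_i."""
    precision = np.array(Sigma0_inv, dtype=float)   # copy, so the input is not modified
    rhs = np.zeros(precision.shape[0])
    for A_i, m_i, Sigma_i in zip(A_list, m_list, Sigma_list):
        W_A = np.linalg.solve(Sigma_i, A_i)         # Sigma_i^{-1} A_i
        precision += A_i.T @ W_A
        rhs += W_A.T @ m_i
    Sigma_p = np.linalg.inv(precision)
    return Sigma_p, Sigma_p @ rhs

# Hypothetical test problem: N = 4 vector-valued measurements of 2 unknowns.
rng = np.random.default_rng(0)
x_true = np.array([0.5, -1.0])
A_list = [rng.normal(size=(3, 2)) for _ in range(4)]
Sigma_list = [np.diag(rng.uniform(0.5, 2.0, size=3)) for _ in range(4)]
m_list = [A_i @ x_true for A_i in A_list]           # noise-free data

Sigma_p, x_hat = combine_measurements(A_list, m_list, Sigma_list,
                                      Sigma0_inv=np.zeros((2, 2)))
print(x_hat)
```

Each measurement only adds a term to the accumulated precision matrix and right-hand side, so new measurements can be included without revisiting the earlier ones.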

Note: Using these formulas directly is not the most efficient way to solve linear inverse problems. Purpose-built direct numerical methods such as FLIPS are much more efficient.

Further study of finite-dimensional a posteriori distributions

It is easy to apply the matrix formulae to finite-dimensional situations with only a few parameters. To do that, one evaluates the matrix formulae numerically and then finds the widths of the a posteriori distributions in different situations. Let us suppose that we have an a posteriori covariance matrix \Sigma\, for the vector of the unknown quantities. Suppose then that we obtain additional information on the first component of the unknown with a certain accuracy (standard deviation), say \sigma\,. By the formulas for the finite-dimensional linear and Gaussian case, the new a posteriori covariance matrix is simply

\hat \Sigma = \left(\Sigma^{-1} +\begin{bmatrix} \sigma^{-2} & 0 & \dots \\ 0 & 0 & \dots \\ \vdots  & \vdots  & \ddots \end{bmatrix}\right)^{-1}

It is not necessary to invert the matrix numerically, because we can derive the expressions:

\hat\Sigma_{11} = \sigma^2/(1+\sigma^2/\Sigma_{11}) and
\hat\Sigma_{ii} = \Sigma_{ii}\frac{\varepsilon_i^2+\sigma^2/\Sigma_{11}}{1+\sigma^2/\Sigma_{11}}, i\neq 1

where \varepsilon_i^2 = 1 - \Sigma^2_{1i}/(\Sigma_{11}\Sigma_{ii}) depends on the correlation coefficients of the original a posteriori covariance matrix.
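The closed-form expressions can be checked against a direct numerical inversion. In the sketch below the covariance matrix and the value of \sigma\, are hypothetical. For i = 1 the quantity \varepsilon_i^2 is zero, so the second expression reduces to the first and one function covers both.

```python
import numpy as np

def updated_variances(Sigma, sigma):
    """Closed-form diagonal of (Sigma^{-1} + sigma^{-2} e_1 e_1^T)^{-1}."""
    d = np.diag(Sigma)
    ratio = sigma**2 / Sigma[0, 0]                     # sigma^2 / Sigma_11
    eps2 = 1.0 - Sigma[0, :]**2 / (Sigma[0, 0] * d)    # eps_i^2, equal to 0 for i = 1
    return d * (eps2 + ratio) / (1.0 + ratio)

# Hypothetical a posteriori covariance of three unknowns.
Sigma = np.array([[2.0, 0.9, 0.3],
                  [0.9, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])
sigma = 0.4   # assumed accuracy of the additional information on component 1

# Direct evaluation of the matrix formula for comparison.
extra = np.zeros_like(Sigma)
extra[0, 0] = sigma**-2
direct = np.linalg.inv(np.linalg.inv(Sigma) + extra)

closed_form = updated_variances(Sigma, sigma)
print(closed_form)
print(np.diag(direct))
```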

Here, the first formula is fairly obvious. The effects on the other parameters are less obvious. Firstly, if the additional a priori information is infinitely accurate (\sigma \to 0\,), the widths of the final (marginal) a posteriori distributions for the rest of the parameters are reduced from \Sigma_{ii}^{1/2}\, to \Sigma_{ii}^{1/2}\varepsilon_i\,, where \varepsilon_i^2 = 1-\Sigma_{1i}^2/(\Sigma_{11}\Sigma_{ii}) depends on the correlation coefficient between the parameter with additional a priori information and the parameter under study. Secondly, if the additional a priori information is broad (\sigma^2 \gg \Sigma_{11}), it does not change the width of the a posteriori distribution of any parameter.

The most interesting results come from the intermediate cases: if \varepsilon_i^2 \ll \sigma^2/\Sigma_{11}\ll 1, we get \hat \Sigma_{ii} \approx (\sigma^2/\Sigma_{11})\Sigma_{ii}. That is, the width of every parameter improves by the ratio of the width of the additional a priori information on the first parameter to the width of its original (marginal) a posteriori distribution, with the exceptions that the a posteriori distribution:

  1. never becomes narrower than that resulting from infinitely good additional a priori and
  2. never becomes broader than it originally was.
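The three regimes can be seen by sweeping the ratio \sigma^2/\Sigma_{11} in the closed-form expression for \hat\Sigma_{ii}. The numbers below are hypothetical, chosen so that parameter i is strongly correlated with the first parameter.

```python
def updated_variance(Sigma_ii, eps2, ratio):
    """Sigma_hat_ii for i != 1, with ratio = sigma^2 / Sigma_11 and eps2 = eps_i^2."""
    return Sigma_ii * (eps2 + ratio) / (1.0 + ratio)

Sigma_ii = 1.0   # assumed original variance of parameter i
eps2 = 1e-4      # assumed eps_i^2, i.e. strong correlation with parameter 1
ratios = [1e-8, 1e-6, 1e-2, 1e0, 1e2, 1e6]

values = [updated_variance(Sigma_ii, eps2, r) for r in ratios]
for r, v in zip(ratios, values):
    print(f"sigma^2/Sigma_11 = {r:8.0e}   Sigma_hat_ii = {v:.6f}")
```

For very small ratios the variance levels off at \varepsilon_i^2\Sigma_{ii}, for very large ratios it stays at \Sigma_{ii}, and in between it follows (\sigma^2/\Sigma_{11})\Sigma_{ii}.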

Other studies like this appear in (Vallinkoski and Lehtinen, 1990 a and b).

