Tomography Problems

From Theory of Measurements Wiki

Jump to: navigation, search

Contents

Definition of the Tomographic Inversion Problem

We consider a two-dimensional target described by a function f(\xi,\psi)\, and a grid as shown in Fig. 1. The grid can be rectangular as in Fig. 1, but different shapes of grid elements, e.g. triangular, can also be chosen. The measurements are integrals of the target function along a set of lines s_k\,, k = 1, 2, 3, \dots\,, i.e.

f(\xi,\psi) = \int_{s_k}f(\xi,\psi)ds

and the unknowns are the values of the target function at the grid points (\xi_i,\psi_j)\,. The task is to determine the best values of the unknowns x_{ij} = f(\xi_i,\psi_j)\,, when the measurements and their error estimates are known.

The direct theory of the inversion problem is formulated by assuming bilinear interpolation of the target function inside each grid element. In the case of a rectangular grid, for instance,

f(\xi,\psi) \approx \left(x_{ij}\frac{\xi-\xi_{i+1}}{\xi_i-\xi_{i+1}}-x_{i+1,j}\frac{\xi-\xi_i}{\xi_i-\xi_{i+1}}\right)\frac{\psi-\psi_{j+1}}{\psi_j-\psi_{j+1}} +\left(x_{i+1,j+1}\right)

When the target function is approximated using this interpolation, each measurement is a linear combination of the unknowns. By collecting all measurements into a single column vector \mathbf{m} and all unknowns into a column vector \mathbf{x}\,, we can write

\mathbf{m} = \mathbf{A_m  x} + \varepsilon_{\mathbf{m}},

where \mathbf{A_m} is a matrix of the linear coefficients obtained from the integrals and \varepsilon_{\mathbf{m}} is a column vector containing the measurement errors. Each line of integration crosses only a part of the grid elements and only the unknowns at the corners of those elements contibute the corresponding measurement. Therefore \mathbf{A} is necessarily a sparse matrix. The task of the tomographic inversion is to determine the unknown vector \mathbf{x} from this equation.


Stochastic inversion

Stochastic inversion is a method which treats the measurements, the unknowns and the measurement errors as random variables. The joint probablility distribution function of \mathbf{x} and \mathbf{m} is

D_{\mathbf{xm}}(\mathbf{x},\mathbf{m}) = \frac{1}{D_{\mathbf{m}}(\mathbf{m})}D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m}) = D_{\mathbf{x}}(\mathbf{x})D_{\mathbf{m}|\mathbf{x}}(\mathbf{m}|\mathbf{x}),

where D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m}) and D_{\mathbf{m}|\mathbf{x}}(\mathbf{m}|\mathbf{x}) are the conditional densities of \mathbf{x} and \mathbf{m}, respectively, and D_{\mathbf{x}}(\mathbf{x}) and D_{\mathbf{m}}(\mathbf{m}) are their a priori densities. We are interested in the distribution of the unknowns when the measurements are fixed, i.e. on the conditional density

D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m}) = \frac{1}{D_{\mathbf{m}}(\mathbf{x})}D_{\mathbf{x}}(\mathbf{x})D_{\mathbf{m}|\mathbf{x}}(\mathbf{m}|\mathbf{x}),

which is also called the a posteriori density. After the results of the measurement are known, \mathbf{m} is fixed in this equation and therefore D_{\mathbf{m}}(\mathbf{m}) is only a normalisation constant. Since \mathbf{x} is fixed (although unknown) in each target to be measured, the conditional density of the error vector is

D_{\varepsilon |\mathbf{x}}(\mathbf{m}|\mathbf{x}) = D_{\varepsilon |\mathbf{x}}(\mathbf{m}- \mathbf{A_m  x}|\mathbf{x}) = D_{\mathbf{m}|\mathbf{x}}(\mathbf{m}|\mathbf{x}).

Hence the a posteriori density of \mathbf{x} is

D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m}) = \frac{1}{D_{\mathbf{m}}(\mathbf{m})} D_{\mathbf{x}}(\mathbf{x})  D_{\varepsilon | \mathbf{x}} (\mathbf{m} - \mathbf{A_m} \mathbf{x} |\mathbf{x}).

Next we assume that the measurement errors are Gaussian with zero mean, which gives

D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m}) \propto D_{\mathbf{x}}(\mathbf{x}) \exp  \left( -\frac{1}{2}(\mathbf{A_mx}-\mathbf{m})^T \Sigma^{-1}(\mathbf{A_mx}-m)\right).

Here \mathbf{m} = \langle \varepsilon_{\mathbf{m}}^T\varepsilon_{\mathbf{m}}\rangle is the error covariance matrix and T indicates the transpose.

The a posteriori density contains all available information on the unknowns after the measurements have been carried out. The maximum point \hat\mathbf{x}\, of the a posteriori distribution is the most likely solution of the inversion problem. If no a priori information is available, D_{\mathbf{x}}(\mathbf{x})\, is constant in eq. (8), and a formal solution can be written for \mathbf{x}\,. This solution, however, is unstable because it contains inversions of large matrices. Therefore some sort of regularisation is needed in order to obtain a stable solution.

The instability in the solution appears as too strong point-to-point variations in the inversion results. In order to obtain smoother stable solutions, one should find a way of limiting these variations.

Let us consider two neighboring grid points g_i\, and g_j\, either in \xi\, or \psi\, direction. We consider the difference of the unknowns at these points, i.e.

\varepsilon^{(ij)}_r = x_i - x_j .

By collecting all such differences at neighbouring grid points into a column vector, we obtain an equation

\varepsilon_{\mathbf{r}} = \mathbf{A_r  x}.

Here \mathbf{A_r}\, is a matrix with elements +1, -1 and 0 in appropriate places so that each component of eq. (10) is of the form (9). Obviously \varepsilon_{\mathbf{r}}\, is a random vector with some distribution function

D_{\varepsilon_\mathbf{r}}(\varepsilon_{\mathbf{r}}) = D_{\varepsilon_\mathbf{r}}(\mathbf{A_rx})

If this contains a priori information on the probabilities of the changes of the unknowns from one grid point to another,

D_{\mathbf{x}}(\mathbf{x}) = D_{\varepsilon_{\mathbf{r}}} (\mathbf{A_r  x}),

and

D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m}) \propto D_{\varepsilon_\mathbf{r}} (\mathbf{A_r  x}) \exp \left( - \frac{1}{2} (\mathbf{A_mx}-\mathbf{m})^T\Sigma_{\mathbf{m}}^{-1}(\mathbf{A_mx}-\mathbf{m})\right)

In this formalism, stable solutions can be obtained by means of an a priori density, which controls statistically the variation of the unknowns from point to point.

If we assume that \varepsilon_{\mathbf{r}}\, is a multivariate Gaussian random variable with zero mean and a covariance matrix \Sigma_{\mathbf{r}} = \langle \varepsilon_{\mathbf{r}}\varepsilon_{\mathbf{r}}\rangle, the a posteriori density is

D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m})\propto \exp\left(-\frac{1}{2}(\mathbf{A_rx})^T\Sigma^{-1}(\mathbf{A_rx})-\frac{1}{2}(\mathbf{A_mx}-\mathbf{m})^T\Sigma^{-1}(\mathbf{A_mx}-\mathbf{m})\right)

Since (\mathbf{A_r  x})^T \Sigma^{-1}_{\mathbf{r}} (\mathbf{A_r  x}) is a quadratic form of the components of \mathbf{x}, the chosen D_{\mathbf{x}}(\mathbf{x}) is called L^2 priori. The maximum point of the a posteriori density function is given by

\hat{\mathbf{x}} = (\mathbf{A_m}^T \Sigma_{\mathbf{m}}^{-1}\mathbf{A_m}+\mathbf{A_r}^T\Sigma^{-1}_{\mathbf{r}}\mathbf{A_r})^{-1}\mathbf{A_m}^T\Sigma^{-1}\mathbf{m}.

This is the solution of the inversion problem.

Eq. (15) gives stable solutions and the smoothness of the result can be varied by changing the covariances in \Sigma_{\mathbf{r}}. However, eq. (15) is not well suited for investigating target functions with sharp gradients, since it favours smoothly behaving solutions. Another drawback is that it does not contain any non-negativity constraint, and therefore the result may be negative within some reegions, although the target function is normally positive.

Steeper gradients of the target function can be reconstructed if, instead of an L^2 priori, an L^1 priori

D_\mathbf{x}(\mathbf{x}) \propto \exp \left( - C \sum_{i,j} |x_i-x_j|\right)

is chosen. Here C is a constant and i and j get only such values that xi and xj are unknowns in neighbouring grid points either in the ξ or the ψ direction. With this a priori density, the a posteriori density is

D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m})\propto \exp\left( - C \sum_{i,j}|x_i-x_j| - \frac{1}{2}(\mathbf{A_mx}-\mathbf{m})^T\Sigma^{-1}_{\mathbf{m}}(\mathbf{A_mx}-\mathbf{m})\right)

A drawback in this density is that its maximum point cannot be obtained in a straghtforward way as was the case for the L^2 priori. Therefore different methods must be used.


Gibbs sampling

Gibbs sampling is a method of reconstructing an estimate of a density function by means of a random number generator. In the case of multivariate distributions, this is done sequentially for each component at a time. The idea is best understood in terms of a two-dimensional distribution.

We assume a joint distribution D_{xy}(x, y)\, of two random variables x\, and y\,, as shown in Fig. 2, and choose an arbitrary start point (x_0, y_0)\,. Then we build a random number generator for x\, with a distribution D_{xy}(x, y_0)\, and use it to obtain a random value x_1\,. Next a second random number generator is made for y\,, obeying a distribution D_{xy}(x_1, y)\,. This generator outputs some random value y_1\,. As a result of these two steps, a new random point (x_1, y_1)\, is obtained. The process can be continued infinitely using the distributions D_{xy}(x, y_1)\, and D_{xy}(x_2, y)\, etc. As a result, a set of random points is obtained and, since the points obey the joint distribution, most of them lie close to the maximum of the distribution function. This is dispalyed schematically in Fig. 3.

In principle, it would be possible to map the joint distribution function by collecting a very large number of random points by means of Gibbs sampling. However, it is sufficient to get a farily small number of points if we, instead of searching for the maximum point, are satisfied in finding the mean value. If the distribution is nearly symmetric these two quantities are also nearly the same.

In our case of stochastic inversion, Gibbs sampling is applied to the a posteriori density. An arbitrary initial vector \mathbf{x}^{(0)}\, is first selected. Then its first component \mathbf{x}^{(0)}_1\, is replaced by a new value \mathbf{x}^{(1)}_1\, , obtained by means of a one-dimensional random number generator obeying a distribution function proportional to D_{\mathbf{x}|\mathbf{m}}((x_1, x^{(0)}_2 , x^{(0)}_3 , \dots)^T |\mathbf{m})\,. The next step is to get a new value x^{(1)}_2\, for the second component of \mathbf{x}\, using a random number generator with a distribution function proportional to D_{\mathbf{x}|\mathbf{m}}((x^{(1)}_1 , x_2, x^{(0)}_3 ,\dots)^T |\mathbf{m})\,. Following this procedure successively for all components of \mathbf{x}\, will finally lead to a new vector \mathbf{x}^{(1)}\,. The whole process is then repeated with x^{(1)}\, as a starting point and, as a result, a second new vector x^{(2)}\, is obtained. In this manner a sequence of random vectors x(k), k = 0, 1, 2,\dots\, with a distribution D_{\mathbf{x}|\mathbf{m}}(\mathbf{x}|\mathbf{m})\, can be generated. After the sequence is long enough, the mean value of the random points is calculated.

Gibbs sampling offers a convenient way of implementing the non-negativity constraint. The method is simply to reject the result when a random number generator gives a negative value and to try again until a positive value is obtained. This is equivalent to multiplying the a posteriori density by a step function which is unity in the region where all components of $x\,$ are positive, but zero elsewhere.

Gibbs sampling is useful if the maximum of the a posteriori density is sharp enough in all directions in the result space. If the density has an elongated ridge in some direction, a very large number of random points may be needed and the time for computing the task may be excessive.


Other Lessons and Main Page


Personal tools