Next: Statistical model of appearance Up: Active appearance model Previous: Introduction   Index



Statistical shape model

The first step in modelling an object is to model its shape. This is the task of the statistical shape model, which learns the shape of a class of objects from a training set of hand-labelled images. Hand labelling is an efficient way of including human knowledge in a learning mechanism. Figure 3.1 shows an example of a hand-labelled image. The shape is described by a vector ${\bf x}$ that contains the coordinates of each point of the shape.

Figure 3.1: Example of a hand-labelled face. The points are manually placed on features such as edges.
\begin{figure}\begin{center}
\epsfbox{anot.eps}
\end{center}
\end{figure}

The first step in building the statistical shape model is to align the shapes found in the training set. This is done with an approach called Procrustes analysis [21]. The algorithm is iterative and reduces the sum of the distances from each shape to the mean shape. After alignment, all the shapes in the training set have the same center of gravity, scale and orientation.
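The alignment step can be illustrated with a minimal sketch. It is not the implementation used here: it assumes 2-D shapes whose landmarks are stored as complex numbers (x + iy), so that a rotation combined with a scaling is a single complex multiplication, and it uses a fixed number of iterations instead of a convergence test. The function names are hypothetical.

```python
# Sketch of an iterative Procrustes alignment for 2-D shapes.
# Assumption: each landmark (x, y) is stored as the complex number x + 1j*y.

def centre(shape):
    """Translate a shape so that its center of gravity is at the origin."""
    c = sum(shape) / len(shape)
    return [p - c for p in shape]

def normalise(shape):
    """Centre a shape and scale it to unit norm."""
    shape = centre(shape)
    norm = sum(abs(p) ** 2 for p in shape) ** 0.5
    return [p / norm for p in shape]

def align_to(shape, target):
    """Rotate and scale a centred shape onto a centred target (least squares)."""
    a = (sum(p.conjugate() * t for p, t in zip(shape, target))
         / sum(abs(p) ** 2 for p in shape))
    return [a * p for p in shape]

def procrustes(shapes, iterations=10):
    """Align all shapes to their evolving mean; returns (aligned shapes, mean)."""
    shapes = [normalise(s) for s in shapes]
    reference = shapes[0]          # fixes the orientation of the mean
    mean = reference
    for _ in range(iterations):
        shapes = [align_to(s, mean) for s in shapes]
        n = len(shapes)
        new_mean = [sum(column) / n for column in zip(*shapes)]
        mean = normalise(align_to(new_mean, reference))
    return shapes, mean
```

Calling `procrustes` on a list of shapes with the same number of landmarks returns the aligned shapes and the estimated mean shape. Re-aligning the mean to the first shape at each iteration is one common way to keep the result from drifting in rotation.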

The variability among the shapes is then estimated by applying a principal component analysis (PCA) to the vectors representing the aligned shapes. The mean of these $s$ vectors is computed:

\begin{displaymath}\overline{\bf x} = \frac{1}{s} \sum_{i=1}^s {\bf x}_i\end{displaymath}

as well as the covariance matrix of the data:

\begin{displaymath}{\bf S}=\frac{1}{s-1} \sum_{i=1}^s ({\bf x}_i-\overline{\bf x})({\bf x}_i-\overline{\bf x})^T\end{displaymath}

and finally the eigenvectors $\phi_i$ of ${\bf S}$ are computed with their associated eigenvalues $\lambda_i$.
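As an illustration, the mean $\overline{\bf x}$ and the covariance matrix ${\bf S}$ can be computed as in the sketch below. It assumes that each aligned shape is a plain list of coordinates of equal length; the function names are hypothetical. The eigenvectors and eigenvalues of ${\bf S}$ would then typically be obtained from a symmetric eigensolver such as `numpy.linalg.eigh`, which is not shown here.

```python
# Sketch: mean shape and covariance matrix of s aligned shape vectors.
# Assumption: every shape is a list of floats with the same length.

def mean_shape(shapes):
    """Element-wise mean of the shape vectors."""
    s = len(shapes)
    return [sum(column) / s for column in zip(*shapes)]

def covariance(shapes):
    """Covariance matrix S = 1/(s-1) * sum (x_i - mean)(x_i - mean)^T."""
    s = len(shapes)
    mean = mean_shape(shapes)
    dim = len(mean)
    S = [[0.0] * dim for _ in range(dim)]
    for x in shapes:
        d = [xk - mk for xk, mk in zip(x, mean)]
        for j in range(dim):
            for k in range(dim):
                S[j][k] += d[j] * d[k] / (s - 1)
    return S
```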

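In order to decrease the dimensionality of the data, only the eigenvectors associated with the $t$ largest eigenvalues are kept, with $t$ chosen so that they explain most of the variation in the dataset. The eigenvalues are assumed to be sorted in decreasing order, $\lambda_1 \geq \lambda_2 \geq \ldots$. A threshold $f_v$ is chosen beforehand (usually $95\%$ or $98\%$), and $t$ is then the smallest integer for which the inequality: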

\begin{displaymath}\sum_{i=1}^t\lambda_i\geq f_v \sum_{i=1}^s\lambda_i\end{displaymath}

holds.
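The selection of $t$ can be sketched as follows, assuming the eigenvalues are available as a list of non-negative numbers; the function name `choose_t` is hypothetical.

```python
# Sketch: smallest t such that the t largest eigenvalues explain
# at least a fraction f_v of the total variance.

def choose_t(eigenvalues, f_v=0.95):
    lam = sorted(eigenvalues, reverse=True)   # largest eigenvalue first
    total = sum(lam)
    running = 0.0
    for t, value in enumerate(lam, start=1):
        running += value
        if running >= f_v * total:
            return t
    return len(lam)   # only reached through rounding; keep every mode
```

For example, with eigenvalues 6, 3, 0.8 and 0.2 and $f_v = 95\%$, the first two modes explain only 90% of the variance, so three modes are kept.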

If we define $\Phi=\left(\phi_1,...,\phi_t\right)$, each vector ${\bf x}$ in the training set can be approximated by:

\begin{displaymath}{\bf x} \approx \overline{\bf x} + \Phi {\bf b}\end{displaymath}

where ${\bf b}$ is defined by :

\begin{displaymath}{\bf b}=\Phi^T \left({\bf x}-\overline{\bf x}\right)\end{displaymath}
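The two relations above can be sketched in a few lines. The sketch assumes that $\Phi$ is stored as a list of $t$ orthonormal eigenvectors, each a list of floats of the same length as ${\bf x}$; the names `project` and `reconstruct` are hypothetical.

```python
# Sketch: shape parameters b = Phi^T (x - mean) and the
# approximate reconstruction x ~ mean + Phi b.
# Assumption: `modes` is the list [phi_1, ..., phi_t] of orthonormal eigenvectors.

def project(x, mean, modes):
    """Return the parameter vector b describing the shape x."""
    d = [xk - mk for xk, mk in zip(x, mean)]
    return [sum(pk * dk for pk, dk in zip(phi, d)) for phi in modes]

def reconstruct(b, mean, modes):
    """Return the approximation of the shape encoded by b."""
    x = list(mean)
    for b_i, phi in zip(b, modes):
        x = [xk + b_i * pk for xk, pk in zip(x, phi)]
    return x
```

Because only $t$ modes are kept, `reconstruct(project(x, mean, modes), mean, modes)` returns the part of ${\bf x}$ that lies in the retained subspace; the component along the discarded eigenvectors is lost.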

${\bf b}$ describes the shape ${\bf x}$: an approximation of ${\bf x}$ can be reconstructed from ${\bf b}$ alone, given the model (that is, $\overline{\bf x}$ and $\Phi$). Over the training set, the variance of the $i$-th element $b_i$ of ${\bf b}$ is $\lambda_i$, so varying $b_i$ by an amount of the order of $\sqrt{\lambda_i}$ produces shape variations comparable to those observed in the training set. Constraining the model to small variations ensures that it generates only shapes that are similar to the training shapes. This can be done either by restricting each element $b_i$ of ${\bf b}$ to lie within the bounds $\pm 3 \sqrt{\lambda_i}$ or by constraining ${\bf b}$ to lie in a hyper-ellipsoid:

\begin{displaymath}\sum_{i=1}^t\frac{b_i^2}{\lambda_i}\leq M_t\end{displaymath}

where $M_t$ is a threshold chosen using the $\chi^2$ distribution.
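Both constraints can be sketched as below. The threshold $M_t$ is passed in as the argument `m_t` and is not computed here. Rescaling ${\bf b}$ towards the mean shape when it falls outside the hyper-ellipsoid is one simple choice made for this sketch, and the function names are hypothetical.

```python
import math

# Sketch: two ways of keeping the shape parameters b plausible.

def clamp_box(b, eigenvalues, k=3.0):
    """Restrict each b_i to the interval [-k*sqrt(lambda_i), +k*sqrt(lambda_i)]."""
    out = []
    for b_i, lam in zip(b, eigenvalues):
        limit = k * math.sqrt(lam)
        out.append(max(-limit, min(limit, b_i)))
    return out

def clamp_ellipsoid(b, eigenvalues, m_t):
    """Keep b if sum(b_i^2 / lambda_i) <= m_t, otherwise rescale it onto the boundary."""
    d2 = sum(b_i * b_i / lam for b_i, lam in zip(b, eigenvalues))
    if d2 <= m_t:
        return list(b)
    scale = math.sqrt(m_t / d2)
    return [b_i * scale for b_i in b]
```

The box constraint treats each mode independently, while the hyper-ellipsoid bounds the combined deviation over all modes at once.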

Figure 3.2a shows the first mode of variation of the model built on images of annotated faces.

Figure 3.2: Example of the first mode of variation of the shape of a face. The parameter of ${\bf b}$ corresponding to the largest eigenvalue varies from $-3\sqrt{\lambda_1}$ to $3\sqrt{\lambda_1}$.
\begin{figure}\begin{center}
\epsfbox{mode1.eps}
\end{center}
\end{figure}



franck 2006-10-16