Different approaches have been used in the literature to track deformable objects through a video sequence using the active appearance model or derived models. Such approaches include the work of:
The active appearance model search is a form of gradient descent, so it converges to the nearest local minimum. When tracking a video sequence it is usually sufficient to initialise the model with the result of the search in the previous frame. For relatively slow motions this initialisation is close enough to the solution for the search to converge, leading to accurate tracking. However, occasionally the movement between frames is sufficiently large that the active appearance model fails to follow the face and falls into a non-optimal local minimum.
To avoid this problem, we initialise the model at multiple starting points. These starting points correspond to the nodes of a grid placed on the image. The nodes are separated by 8 pixels, a distance that lies within the usual range of convergence [21]. The active appearance model search algorithm is run from each initialisation, and the result giving the smallest residual magnitude is chosen. This has been found to lead to reliable tracking.
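
The multi-start strategy can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the toy local search and its convergence radius of 6 pixels are all assumptions introduced for illustration, and a real system would call the active appearance model search in place of `toy_local_search`. Only the 8-pixel grid spacing and the selection by smallest residual magnitude come from the text.

```python
import numpy as np


def grid_starts(width, height, spacing=8):
    """Return (x, y) grid nodes placed every `spacing` pixels over the image."""
    xs = np.arange(0, width, spacing)
    ys = np.arange(0, height, spacing)
    return [(float(x), float(y)) for y in ys for x in xs]


def multi_start_search(local_search, starts):
    """Run `local_search` from every start and keep the smallest residual.

    `local_search(start)` must return (params, residual_vector).
    Returns (best_params, best_residual_magnitude).
    """
    best_params, best_err = None, np.inf
    for start in starts:
        params, residual = local_search(start)
        err = float(np.linalg.norm(residual))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def toy_local_search(start, target=(41.0, 27.0), radius=6.0):
    """Hypothetical stand-in for the AAM search (assumption, not the real one).

    It reaches `target` only when started within `radius` pixels of it;
    otherwise it stays where it started and reports a non-zero residual.
    """
    start = np.asarray(start, dtype=float)
    target = np.asarray(target, dtype=float)
    if np.linalg.norm(start - target) <= radius:
        return target, np.zeros(2)
    return start, start - target


starts = grid_starts(64, 48, spacing=8)
best_params, best_err = multi_start_search(toy_local_search, starts)
```

In this toy setting a single search started far from the target stays stuck, whereas at least one grid node falls within the assumed convergence radius, so the multi-start search recovers the target. The cost grows linearly with the number of grid nodes, since one full search is run per node.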