



Related past work

Bregler, in [2], uses a hierarchical framework to recognise human dynamics. The framework decomposes into four levels: the raw sequence, a model of movement using a mixture of Gaussians, a model of linear dynamics, and a model of complex movements using a hidden Markov model. He highlights the need for high-level information to model behaviour correctly.

Brand et al., in [1], describe a model of interaction called the coupled hidden Markov model. The behaviours of two interlocutors are encoded using separate state chains, and each state depends on the previous states of both interlocutors. They develop an efficient learning algorithm and show that this model outperforms classical models such as the hidden Markov model.
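The coupling described above can be sketched as follows. This is a minimal illustration of the dependency structure only, not the formulation or learning algorithm of [1]; the state count, transition tables, and sequence length are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3  # states per interlocutor (illustrative)

# Each chain's next state depends on the previous states of BOTH chains:
# trans_a[i, j, k] = P(a_t = k | a_{t-1} = i, b_{t-1} = j); likewise trans_b.
def random_transitions(n):
    t = rng.random((n, n, n))
    return t / t.sum(axis=2, keepdims=True)  # normalise over the next state

trans_a = random_transitions(N)
trans_b = random_transitions(N)

def sample_coupled(T, a0=0, b0=0):
    """Sample T steps of two coupled state sequences."""
    a, b = [a0], [b0]
    for _ in range(T - 1):
        pa, pb = a[-1], b[-1]  # previous states of both interlocutors
        a.append(int(rng.choice(N, p=trans_a[pa, pb])))
        b.append(int(rng.choice(N, p=trans_b[pa, pb])))
    return a, b

states_a, states_b = sample_coupled(10)
```

An ordinary hidden Markov model would condition each chain only on its own previous state; the cross-chain indices in `trans_a` and `trans_b` are what make the model coupled.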

In [8], shapes are approximated by splines. The parameters controlling those splines, as well as their speeds, are first clustered into prototype vectors using a competitive learning neural network. A compressed sequence derived from the prototype vector sequence is then learnt using a Markov chain. Cubic Hermite interpolation is used along with the learnt Markov chain to recover the temporal structure of the sequence before compression and to extrapolate a behaviour. Furthermore, for generation purposes, a single-hypothesis propagation and a maximum likelihood framework are described. During generation, states of the Markov chain are chosen according to the state of the shape of a tracked person, which allows the shape of a virtual partner to be generated, driven by a tracked real person. In [6], Devin and Hogg add sound and appearance to the framework in order to demonstrate that producing a talking head is possible. In [7], a variable length Markov model is used with the prototype vectors to learn the structure of the sequence.
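A cubic Hermite segment between two prototype vectors can be sketched as below. The prototype values and tangents here are invented for illustration; in [8] they come from the clustered spline parameters and their speeds:

```python
import numpy as np

def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation between points p0 and p1 with
    tangents m0 and m1; t in [0, 1]. The four Hermite basis
    polynomials weight the endpoints and tangents."""
    t = np.asarray(t)[..., None]            # broadcast over vector components
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

# Two hypothetical 2-D prototype vectors and their tangents (speeds):
p0, p1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
m0, m1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

curve = hermite(p0, p1, m0, m1, np.linspace(0.0, 1.0, 11))
# The endpoints are reproduced exactly: curve[0] == p0 and curve[-1] == p1.
```

Because the interpolant passes exactly through the prototypes while matching the stored speeds, it can restore smooth temporal structure to a sequence of discrete prototype states.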

In [10], Walter et al. model gestures by groups of trajectory segments. The trajectory segments are extracted by detecting discontinuities in the gesture trajectory. After the trajectory segments are normalised, their dimensionality is reduced using principal component analysis. Clusters are then extracted from the component space using an iterative algorithm based on minimum description length; the clusters form atomic gesture components. There is a parallel between these groups of trajectory segments and the actions, or visual units, we want to extract from the video sequence. However, our segmentation and grouping algorithms both differ.
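The dimensionality-reduction step above can be sketched with a plain eigendecomposition of the sample covariance. The segment data below is a random stand-in, and the minimum description length clustering of [10] is omitted; only the projection onto principal components is shown:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in data: 50 normalised trajectory segments,
# each resampled to 20 two-dimensional points and flattened to 40-D.
segments = rng.normal(size=(50, 40))

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                  # centre the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k]            # top-k components, largest first
    return Xc @ top

reduced = pca_reduce(segments, k=5)   # 50 segments, now 5-D each
```

Clustering would then operate on `reduced`, where each segment is a short vector rather than a full trajectory.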

Finally, the experiments of Martin et al. [9] suggest that it is possible to recognise facial expressions from their trajectories in the appearance parameter space we use in our model. Thus, our model should be able to generate different expressions.



franck 2006-10-01