To generate video sequences of faces, we first need an underlying model that can synthesise a face for each frame. Because it offers exactly such a synthesis facility, the active appearance model of Cootes et al. [4] is well suited to this task.
To encode each frame of the training sequence, we use a full appearance model that combines shape and texture information. After computing the mean shape from the training set, the number of model parameters is reduced by applying principal component analysis successively to the shape part and the texture part of the model. The details of the model are described in [5]. The shape and a shape-free texture are modelled by the set of linear equations:

$x = \bar{x} + Q_s c$

$g = \bar{g} + Q_g c$

Given a vector of appearance parameters $c$, the shape $x$ can be computed. A shape-free texture $g$ can be warped to the shape $x$ to reconstruct the full appearance of a face.
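Concretely, the synthesis step is just two affine maps driven by a single parameter vector. The sketch below is illustrative only: the names (`x_bar`, `Qs`, `g_bar`, `Qg`) and the plain-list linear algebra are assumptions, not the paper's implementation.

```python
# Hypothetical AAM synthesis sketch: shape x = x_bar + Qs*c and
# shape-free texture g = g_bar + Qg*c; all names are illustrative.
def synthesise(c, x_bar, Qs, g_bar, Qg):
    """Map an appearance parameter vector c to a shape and a texture."""
    def affine(mean, Q):
        # mean + Q @ c, written with plain lists to stay self-contained
        return [m + sum(q * ci for q, ci in zip(row, c))
                for m, row in zip(mean, Q)]
    return affine(x_bar, Qs), affine(g_bar, Qg)
```

The resulting shape-free texture is then warped to the computed shape to render the frame.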
Each vector in the appearance parameter space represents a face, while each facial image can be approximated by a vector in that space. A video sequence of a face can therefore be represented as a trajectory in the appearance parameter space, and visual units are sub-trajectories within this trajectory.
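One simple way to obtain sub-trajectories is to slide a fixed-length window along the trajectory. This is a minimal sketch, assuming a windowing scheme; the paper's actual grouping criterion is not restated here.

```python
def split_subtrajectories(trajectory, length, step=1):
    """Cut a trajectory (a list of appearance parameter vectors) into
    overlapping fixed-length sub-trajectories via a sliding window."""
    return [trajectory[i:i + length]
            for i in range(0, len(trajectory) - length + 1, step)]
```

Similar sub-trajectories can then be clustered into the groups consumed by the Markov model.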
[Figure 1: Overview of the model of facial behaviour.]
Figure 1 shows an overview of the model of facial behaviour. First, the face is tracked in the video sequence (1). The active appearance model parameters are then deduced from the tracked face (1→2). The trajectory formed by those appearance parameter vectors is broken into sub-trajectory groups (2→3), so that the sequence becomes a sequence of sub-trajectory groups (3). This sequence of sub-trajectory groups is learnt (3→4) by a variable length Markov model (4).

To generate new trajectories, a sequence of sub-trajectory groups is sampled from the variable length Markov model (4→3). A new sub-trajectory is then sampled from each group in that sequence (3→2) to give a sequence of sub-trajectories, that is, a trajectory (2). Each point of this new trajectory in the appearance parameter space is then synthesised (2→5) to give a video sequence of faces (5).
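A variable length Markov model stores next-symbol statistics for contexts of varying depth and predicts from the longest context it has observed. A minimal sketch, assuming the sub-trajectory group labels are plain symbols (the pruning of rare contexts used by full VLMMs is omitted):

```python
from collections import Counter, defaultdict

def train_vlmm(sequence, max_depth):
    """Count, for every context up to max_depth, which symbol follows it."""
    counts = defaultdict(Counter)
    for i, sym in enumerate(sequence):
        for depth in range(max_depth + 1):
            if i - depth < 0:
                break
            counts[tuple(sequence[i - depth:i])][sym] += 1
    return counts

def predict(counts, history, max_depth):
    """Back off to the longest suffix of the history seen in training."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        ctx = tuple(history[len(history) - depth:])
        if ctx in counts:
            return counts[ctx].most_common(1)[0][0]
```

On a periodic training sequence such as `a b a b a b`, the model learns that `a` is followed by `b` and vice versa.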
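Sampling from the learnt model can be sketched as a stochastic replay of its context statistics. The `counts` argument is assumed to map a context tuple to a dict of next-symbol frequencies, as a trained VLMM would provide; the group labels used here are hypothetical.

```python
import random

def sample_groups(counts, max_depth, length, seed=0):
    """Generate a sequence of sub-trajectory group labels by repeatedly
    sampling the next label from the longest known context (back-off)."""
    rng = random.Random(seed)
    seq = []
    for _ in range(length):
        for depth in range(min(max_depth, len(seq)), -1, -1):
            ctx = tuple(seq[len(seq) - depth:])
            if ctx in counts:
                symbols, weights = zip(*counts[ctx].items())
                seq.append(rng.choices(symbols, weights=weights)[0])
                break
    return seq
```

Each sampled group is then replaced by one of its sub-trajectories, and each point of the concatenated trajectory is passed through the appearance model to render a frame.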