We seek to develop a system that can model both the appearance and the behaviour of a person's face. We would like to present the system with a sufficiently long training sequence of an individual speaking, moving their head and changing expression, and have it learn a model capable of simulating their behaviour. Such a system would be useful in applications ranging from computer games to the generation of believable avatars for human-computer interaction. This paper describes a prototype of such a system and demonstrates its performance at learning relatively simple facial behaviours.
In this paper we concentrate on relatively low-level behaviours (how a person tends to shake their head, or the particular way they smile) rather than higher-level behaviours (such as when they smile, or the order in which they tend to perform actions). We assume these low-level behaviours are characterised by relatively short time scales and are repeated sufficiently often in a training sequence that we can recognise them and model their variability. Implicit in the work is the assumption that people do not repeat any action exactly (no one smiles the same way twice), but that it is possible to learn a distribution representing the variations on a particular action that an individual tends to make.
We model the appearance of the individual using a statistical appearance model that combines shape and texture [5], and assume that the input sequence can be tracked sufficiently accurately (in practice using an active appearance model [4]). The sequence can then be represented as a series of points forming a trajectory through the parameter space of the appearance model. The challenging step is to analyse this trajectory, automatically breaking it down into sub-units that correspond to distinct actions, and to model these actions. We present a novel approach in which we locate nodes at points of high density in the parameter space and use them to split the trajectory into segments; the segments are then grouped and each group is modelled. A variable-length Markov model is then trained to learn the relationships between the groups. This allows us to synthesise novel paths through the groups, and thus novel sequences that capture the behaviour observed in the training set.
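To make the segmentation idea concrete, the following is a minimal sketch of the node-and-segment step, written under assumptions of our own rather than as the method used in this paper. It assumes the trajectory is an array of appearance-model parameter vectors, estimates density by counting neighbours within a fixed radius, picks nodes greedily, and groups segments simply by the pair of nodes they connect. The function names (`find_nodes`, `split_at_nodes`, `group_segments`) and the parameters `radius` and `n_nodes` are hypothetical and chosen only for illustration.

```python
import numpy as np
from collections import defaultdict


def find_nodes(traj, radius, n_nodes):
    """Pick up to n_nodes high-density points of the trajectory.

    Assumption: density is the number of frames within `radius`;
    nodes are chosen greedily with neighbour suppression.
    """
    # Pairwise Euclidean distances between all frames.
    d = np.linalg.norm(traj[:, None, :] - traj[None, :, :], axis=2)
    density = (d < radius).sum(axis=1).astype(float)
    nodes = []
    for _ in range(n_nodes):
        i = int(np.argmax(density))
        if density[i] <= 0:
            break  # every frame is already covered by a node
        nodes.append(traj[i])
        density[d[i] < radius] = 0  # suppress frames near this node
    return np.array(nodes)


def split_at_nodes(traj, nodes, radius):
    """Cut the trajectory into segments that run from one node to the next.

    Returns tuples (from_node, to_node, start_frame, end_frame).
    """
    d = np.linalg.norm(traj[:, None, :] - nodes[None, :, :], axis=2)
    # Label each frame with its nearest node, or -1 if it is near no node.
    label = np.where(d.min(axis=1) < radius, d.argmin(axis=1), -1)
    segments = []
    last_node, last_idx = None, None
    for t, l in enumerate(label):
        if l == -1:
            continue
        # A segment needs at least one frame outside any node.
        if last_node is not None and t - last_idx > 1:
            segments.append((last_node, int(l), last_idx, t))
        last_node, last_idx = int(l), t
    return segments


def group_segments(segments):
    """Group segments by the (from_node, to_node) pair they connect."""
    groups = defaultdict(list)
    for seg in segments:
        groups[(seg[0], seg[1])].append(seg)
    return dict(groups)
```

Each group collects repeated instances of what is assumed to be the same action, so its members could then be time-normalised and used to fit a distribution over that action. The sequence of group labels is the kind of symbol stream on which a variable-length Markov model could be trained.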
In the following sections we review related work, describe the system in more detail and present the results of experiments.