Humans are social animals, and communication underpins our society. The most natural way of interacting with anyone is face-to-face. It has thus long been a goal of research into human-computer interaction to mimic this face-to-face communication.
Raudsepp [78] has pointed out that in a dialogue, 7% of the message is conveyed verbally, 38% vocally, and 55% through body language. The verbal part of a conversation is its content. The vocal part is the paralanguage: tone of voice, intonation, pauses and sighs. Body language consists of posture, distance kept from the speaker, eye contact, gestures and facial expression. The importance of eye contact in conversation suggests that a natural human-computer interface should display a face on the screen.
Gestures have been studied extensively by the machine vision community to enable computers to understand humans through their natural means of expression. The results of these studies can provide an elegant alternative to conventional input devices, especially for interaction in 3D environments.
More recently, facial expressions have also been studied. By classifying or assessing facial expressions in a video sequence, a computer can infer how the user feels and respond appropriately to his or her emotional state. Indeed, although it is often difficult for one person to detect an expression on another's face, the face reflects most of our emotions. Many studies of facial expression classification concentrate on extracting the six basic emotions from images or image sequences: happiness, sadness, surprise, fear, anger and disgust. A trained human can distinguish these emotions from facial cues with an average error rate of 13% [75].
Building a human-computer interface based on visual clues requires several stages:
Unfortunately, such an ideal human-computer interface is still in its infancy. Achieving this complex task requires a good model of facial behaviour, and this thesis describes one approach to creating such a model.