


Motivation

Humans are social animals, and communication underpins our society. The most natural way of interacting with anyone is face-to-face. It has thus long been a goal of research into human computer interaction to be able to mimic this face-to-face communication.

Raudsepp [78] has pointed out that in a dialogue, 7% of the message conveyed is verbal, 38% is vocal and 55% lies in body language. The verbal part of a conversation is its content. The vocal part is the paralanguage: tone of voice, intonation, pauses and sighs. Body language consists of posture, the distance maintained from the speaker, eye contact, gestures and facial expression. The importance of eye contact in a conversation suggests that a natural human-computer interface should display a picture of a face on the screen.

Gestures have been extensively studied by the machine vision community in order to enable computers to understand humans using their natural way of expressing themselves. Results of these studies can provide an elegant alternative to input devices used with computers, especially for interaction in 3D environments.

More recently, facial expressions have been studied. Classification or assessment of facial expressions from a video sequence can be used by computers to understand how the user feels and to react appropriately to his or her emotional state. Even though it is often difficult for a human to detect an expression on another person's face, the face reflects most of our emotions. Many studies in facial expression classification concentrate on extracting the six basic emotions from images or sequences of images: happiness, sadness, surprise, fear, anger and disgust. A trained human is able to distinguish these emotions from facial cues with an average error rate of 13% [75].

Building a human-computer interface based on visual cues requires several stages:

1)
a tracking system that locates the face of the user whenever it is required. Tracking is challenging because of the variability of the expressions a face can show. To provide useful information to the later stages, the tracker has to be both accurate and robust; facial hair, glasses, occlusions and sensor noise all make this task difficult.
2)
the analysis of the user's face to extract information such as facial expression or gaze direction, which can then be combined to deduce the state of the user.
3)
a decision on how to react to the user.
4)
the synthesis of a virtual face that reacts in a natural manner.
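The four stages above can be viewed as one processing loop applied to each video frame. The sketch below is purely illustrative: every function name, return value and threshold is a hypothetical stub, not the system developed in this thesis.

```python
# Illustrative sketch of the four-stage pipeline. All stage functions
# are hypothetical placeholders standing in for real components.

def track_face(frame):
    """Stage 1: locate the user's face in a frame (stub bounding box).
    A real tracker must cope with expression changes, facial hair,
    glasses, occlusions and sensor noise."""
    return {"x": 0, "y": 0, "w": 100, "h": 100}

def analyse_face(face_region):
    """Stage 2: extract facial expression and gaze direction
    to deduce the user's state (stub values)."""
    return {"expression": "happiness", "gaze": "towards screen"}

def decide_reaction(user_state):
    """Stage 3: decide how the interface should respond."""
    return "smile" if user_state["expression"] == "happiness" else "neutral"

def synthesise_face(reaction):
    """Stage 4: render a virtual face performing the chosen reaction
    (stub: returns a textual description)."""
    return "virtual face: " + reaction

def interface_step(frame):
    """Run one frame through all four stages in order."""
    face = track_face(frame)
    state = analyse_face(face)
    reaction = decide_reaction(state)
    return synthesise_face(reaction)
```

In a working system each stub would be replaced by a substantial component, and the model of facial behaviour described in this thesis would underpin the analysis and synthesis stages.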

Unfortunately, such an ideal human-computer interface is still in its infancy. Achieving such a complex task requires a good model of facial behaviour. This thesis describes one approach to creating such a model.



franck 2006-10-01