Data collection is a key step in modelling with learning methods. An algorithm cannot learn properly from noisy or incomplete data, so an effort has to be made on the technical side to collect good-quality data.
Data quality in this case means not only the quality of the pictures in the video but also the quality of the reactions. If a person is afraid of the camera, for instance, the recording will not reflect a typical situation, and it will not be possible to model correctly the behaviour of two people speaking. During data collection, we also have to take care of the other aspects cited in [48], such as eye contact and the distance between the two speakers.
In order to avoid disturbing the filmed persons and to be as unobtrusive as possible, we will install two cameras in a room, each pointing towards one person. The participants will most likely behave unnaturally at the beginning of the interview because a camera is filming them. Nevertheless, people usually become comfortable in front of a camera after a while, so we expect the speakers to forget about the camera as they become more interested in the conversation. That is why we will film people only after a certain amount of time (probably 30 minutes).
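
If the cameras were instead left running from the start of the session, the same warm-up rule could be applied afterwards by discarding the early frames. The following is a minimal sketch of that variant, not part of the planned protocol: the function name `first_usable_frame` and the frame rate of 25 frames per second are assumptions made for illustration only.

```python
import math


def first_usable_frame(warmup_minutes: float, fps: float) -> int:
    """Return the index of the first frame recorded after the warm-up period.

    Frames are assumed to be numbered from 0 at the start of the session.
    """
    if warmup_minutes < 0:
        raise ValueError("warmup_minutes must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Round up so that no frame from the warm-up period is kept.
    return math.ceil(warmup_minutes * 60 * fps)


# Hypothetical values: a 30-minute warm-up and a camera running at 25 fps.
start = first_usable_frame(warmup_minutes=30, fps=25)
```

Applying the same start index to both cameras would keep the two recordings aligned, provided they were started at the same moment and run at the same frame rate.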