Results of the psychophysical experiment

Of the 112 people who volunteered, only 43 completed all the questions. Technical problems and the length of the experiment deterred the rest. In the following we present an analysis based on the 43 complete surveys.

Figure 8.9 shows a graph of the time used to answer the questions. We can see three different types of response on this graph:

**Figure 8.9:** Time taken to answer the questions in the experiment. One dotted line corresponds to the time taken by the video. The other one corresponds to twice that time.
$\includegraphics[width=145mm,keepaspectratio]{timegraph.eps}$

Since most answers belong to the two last categories, it appears that the experiment has been done carefully by the persons who took the time to complete it.

When comparing two videos clips both extracted from the original sequence, people tend to choose the video on the right rather than the video on the left. Indeed, 72 videos have been selected on the left while 99 videos have been selected on the right. This gives a significant bias towards the right (the probability being 0.9535).

The bias on the selected side is also present if we use all the questions. This bias is the main reason why we have only kept answers from the people completing the whole experiment. The side on which the generated videos appears should be balanced to compensate for the bias introduced in the case of random choice. If the experiment is completed, each pair of models is displayed an equal number of times on each side.

The results of the experiments can be seen on table 8.4. The table shows the number of answers given for each pair of models, for all the videos and all the volunteers. It shows that for each model, people were able to distinguish between the original video sequence and the generated ones. There is still room for improving any of the models. This result is in agreement with the results of Hack who did a similar experiment [41] and found that no model so far tested in his experiment can confound the volunteers.

The results of our experiment also show that our model performs better than the autoregressive process if the linear model of residuals is used to smooth the output. We cannot conclude anything when comparing the other possible pairs of models since the results are not significant (cases annotated by a $\star$ in table 8.4).

Table 8.4: Psychophysical experiment answers' summary. WR (respectively WOR ) is our model with (respectively without) a linear residual model. ARP is the autoregressive process. The results are reported for the whole set of questions. Each cell represents the number of answers selecting the model in the column as being more realistic than the model in the row. A star ( $\star$ ) has been added to non-significant results.

	Original	WR	WOR	ARP
Original		121 ( $35\%$ )	125 ( $36\%$ )	86 ( $25\%$ )
WR	223 ( $65\%$ )		168 ( $49\%$ ) $\star$	124 ( $36\%$ )
WOR	219 ( $64\%$ )	176 ( $51\%$ ) $\star$		158 ( $46\%$ ) $\star$
ARP	258 ( $75\%$ )	220 ( $64\%$ )	186 ( $54\%$ ) $\star$

Table 8.5 shows the results for video V2 (expressions). It shows similar results. We can also conclude that, for video V2, our model is also significantly better than the autoregressive process even when we do not use the linear model of residuals.

Table 8.5: Psychophysical experiment answers' summary for video V2. WR (respectively WOR ) is our model with (respectively without) a linear residual model. ARP is the autoregressive process. Each cell represents the number of answers selecting the model in the column as being more realistic than the model in the row. A star ( $\star$ ) has been added to non-significant results.

	Original	WR	WOR	ARP
Original		58 ( $34\%$ )	66 ( $38\%$ )	25 ( $15\%$ )
WR	114 ( $66\%$ )		91 ( $53\%$ ) $\star$	44 ( $26\%$ )
WOR	106 ( $62\%$ )	81 ( $47\%$ ) $\star$		69 ( $40\%$ )
ARP	147 ( $85\%$ )	128 ( $75\%$ )	103 ( $60\%$ )

Table 8.6 shows the results for video V3 (dialog). This time, only the original videos have been successfully spotted by the volunteers. There are no significant differences between any other pairs of models. However, the results suggest that our model does not perform worse than the autoregressive process when we use the linear model of residuals.

Table 8.6: Psychophysical experiment answers' summary for video V3. WR (respectively WOR ) is our model with (respectively without) a linear residual model. ARP is the autoregressive process. Each cell represents the number of answers selecting the model in the column as being more realistic than the model in the row. A star ( $\star$ ) has been added to non-significant results.

	Original	WR	WOR	ARP
Original		63 ( $37\%$ )	59 ( $34\%$ )	61 ( $35\%$ )
WR	109 ( $64\%$ )		77 ( $45\%$ ) $\star$	80 ( $47\%$ ) $\star$
WOR	113 ( $66\%$ )	95 ( $55\%$ ) $\star$		89 ( $52\%$ ) $\star$
ARP	111 ( $65\%$ )	92 ( $53\%$ ) $\star$	83 ( $48\%$ ) $\star$