Performance given a large training set

In this section, the training text is 70000 characters long. The learnt VLMM was tested on three texts:

E1 - the training text (70000 characters).
E2 - a text similar in style and content (5218 characters).
E3 - a completely different text (2144 characters).

The prediction capabilities (percentage of letters correctly predicted) are reported in Table 6.1. Performance is similar across the three texts, but the Matusita measure gives markedly better predictions than the KL divergence: roughly 33 to 35% against 25 to 26%. For comparison, a flat prior would achieve only about $1/\vert\Sigma\vert \approx 3\%$.
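The score used here, the percentage of letters correctly predicted, can be sketched as follows. This is a minimal illustration and not the code used for these experiments: the names `prediction_accuracy`, `flat_baseline` and `most_frequent_predictor` are hypothetical, and the toy predictor merely stands in for a trained VLMM.

```python
from collections import Counter


def prediction_accuracy(predict, text):
    """Fraction of letters in `text` that `predict(history)` guesses correctly.

    `predict` is any callable mapping the preceding characters to one guessed
    character; it stands in for a trained VLMM (assumption).
    """
    if not text:
        return 0.0
    hits = sum(1 for i, ch in enumerate(text) if predict(text[:i]) == ch)
    return hits / len(text)


def flat_baseline(alphabet):
    """Expected accuracy of a uniform random guess over the alphabet: 1 / |Sigma|."""
    return 1.0 / len(alphabet)


def most_frequent_predictor(training_text):
    """Toy predictor (not a VLMM): always guess the most frequent training letter."""
    best = Counter(training_text).most_common(1)[0][0]
    return lambda history: best


if __name__ == "__main__":
    train = "abracadabra"
    predict = most_frequent_predictor(train)
    print(prediction_accuracy(predict, train))  # accuracy on the training text
    print(flat_baseline(set(train)))            # uniform-guess reference level
```

Evaluating the same `predict` callable on the training text and on held-out texts gives one figure per text, which is the kind of comparison the table reports.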

Table 6.1: Comparison of the prediction of a VLMM learnt from a long text. The VLMM has been trained using the text


Measure	E1	E2	E3
KL	$25.7\%$	$26.4\%$	$24.9\%$
Matusita	$32.9\%$	$34.8\%$	$32.7\%$
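How the two measures enter the VLMM learning procedure is not restated in this section. The sketch below only gives the standard textbook definitions of the KL divergence and the Matusita distance for discrete distributions, which are assumed to match the ones compared here; the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL divergence sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute 0; the result is infinite if some
    q_i == 0 while p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def matusita_distance(p, q):
    """Matusita distance sqrt(sum_i (sqrt(p_i) - sqrt(q_i))**2)."""
    return math.sqrt(
        sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    )


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    print(kl_divergence(p, q), kl_divergence(q, p))          # asymmetric
    print(matusita_distance(p, q), matusita_distance(q, p))  # symmetric
```

One practical difference is visible in the definitions: the Matusita distance is symmetric and bounded by $\sqrt{2}$, whereas the KL divergence is asymmetric and becomes infinite when the second distribution assigns zero probability to a symbol the first one supports.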

The small differences between the percentages of correct guesses on the three texts suggest that the VLMM did not "over-learn" (overfit) the training text: its performance on the quite dissimilar test text E3 was close to that on the training text E1.
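The size of these differences can be checked directly from the figures in Table 6.1. The snippet below is only a sketch of that arithmetic; it assumes the three values in each row are listed in the order E1, E2, E3, and the helper names are hypothetical.

```python
# Accuracies in percent, copied from Table 6.1.
# Assumption: the columns are in the order E1, E2, E3.
table = {
    "KL": [25.7, 26.4, 24.9],
    "Matusita": [32.9, 34.8, 32.7],
}


def spread(values):
    """Largest minus smallest accuracy across the three texts."""
    return round(max(values) - min(values), 1)


def train_vs_dissimilar_gap(values):
    """Accuracy on the training text (E1) minus accuracy on the dissimilar text (E3)."""
    return round(values[0] - values[2], 1)


if __name__ == "__main__":
    for measure, values in table.items():
        print(measure, spread(values), train_vs_dissimilar_gap(values))
```

The spread across texts is 1.5 percentage points for KL and 2.1 for Matusita, small compared with the gap of roughly 7 to 8 points between the two measures.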

franck 2006-10-01