In this section, the learning text used was a long text of 70000 letters. The VLMM has been tested on three texts, including the learning text itself, in order to see how well it can reproduce what it has learnt. A second text on the same subject has also been tested: it comes from the same source as the learning text, which was split into a learning part and a test part. Finally, a third text, on a different subject, has been used. One of these texts was used to train the tree shown in section 6.3 and another was used to train the tree shown in section 6.2. The number of letters of each text is shown in table 6.1.
Table 6.2 shows that the Matusita measure is more efficient than the KL divergence: with the Matusita measure the VLMM correctly guesses about one letter in three, while with the KL divergence it only correctly guesses about one letter in four. We can also notice that learning is more efficient on the test text drawn from the same source as the learning text than on the third text. Indeed, the third text deals with a completely different subject and therefore uses different classes of words, whereas the test text belongs to the same lexical class as the learning text.
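For reference, the two dissimilarity measures compared in table 6.2 can be sketched as follows. This is only a minimal illustration using the standard definitions over two discrete distributions `p` and `q`; the function names and the smoothing constant `eps` are illustrative and not taken from the implementation evaluated here.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps avoids division by zero
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def matusita_distance(p, q):
    # Matusita distance: sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 )
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))
```

Note that the Matusita distance is symmetric in its two arguments and vanishes only when the distributions coincide, whereas the KL divergence is asymmetric, which is one practical difference between the two choices.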
Finally, the small differences between the percentages of correct guesses on the three texts suggest that the VLMM did not over-learn the learning text: it performs similarly on texts it was not trained on.
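The per-letter evaluation used above can be sketched with a fixed-order Markov predictor standing in for the VLMM. This is only an illustrative simplification: the VLMM uses variable-length contexts pruned by the chosen dissimilarity measure, while the sketch below backs off from a fixed maximum context length to shorter ones; all names are hypothetical.

```python
from collections import Counter, defaultdict

def train_counts(text, order=3):
    # Count next-letter frequencies for every context of length 0..order.
    counts = defaultdict(Counter)
    for i in range(len(text)):
        for k in range(order + 1):
            if i - k < 0:
                break
            counts[text[i - k:i]][text[i]] += 1
    return counts

def accuracy(counts, text, order=3):
    # Predict each letter from the longest training context available,
    # backing off to shorter contexts (down to the empty context) when
    # the longer one was never seen; report the fraction guessed right.
    correct = 0
    for i in range(len(text)):
        for k in range(min(order, i), -1, -1):
            ctx = text[i - k:i]
            if ctx in counts:
                guess = counts[ctx].most_common(1)[0][0]
                break
        correct += guess == text[i]
    return correct / len(text)
```

Comparing this accuracy on the learning text and on held-out texts, as done above, is exactly the kind of check that reveals whether the model has over-learnt its training data.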