Assessment of the prediction with a large number of data

In this section, the learning text is a long text of 70000 letters. The VLMM has been tested on three texts. The first is the learning text itself, in order to see how well the VLMM performs when it is asked to reproduce what it has learned. The second is another text on the same subject: it comes from the same source text, which has been split into a learning text and a test text. The third is a text on a different subject.

is the text used to train the tree shown in section 6.3 and is the text used to train the tree shown in section 6.2. The number of letters of each text is shown in table 6.1.

Table 6.1: Length of the test texts used.

Text	Number of letters
Learning text	70000
Same-subject test text	5218
Different-subject test text	2144

Table 6.2 shows that the Matusita measure is more effective than the KL divergence. With the Matusita measure, the VLMM guesses about one letter in three correctly, while with the KL divergence it guesses only about one letter in four. We can also notice that prediction is better on the test text on the same subject than on the text on a different subject. Indeed, the latter deals with a completely different subject and thus uses different classes of words, whereas the former comes from the same lexical class as the learning text.
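For reference, the two measures compared above can be sketched with their usual textbook definitions for discrete distributions, for instance two next-letter distributions. This is only an illustrative sketch: the function names are hypothetical, and the way the VLMM training applies these measures is not reproduced here.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q), in nats.

    p and q are sequences of probabilities over the same symbols.
    The result is infinite when q assigns zero to a symbol that p allows.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def matusita_distance(p, q):
    """Matusita distance: Euclidean distance between square-rooted probabilities."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))


# Toy distributions over three symbols (made-up values, not from the thesis).
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

print(kl_divergence(p, q), kl_divergence(q, p))          # asymmetric, may be infinite
print(matusita_distance(p, q), matusita_distance(q, p))  # symmetric, bounded by sqrt(2)
```

One practical difference visible in this sketch is that the Matusita distance is symmetric and always finite, whereas the KL divergence is asymmetric and becomes infinite as soon as the second distribution gives zero probability to a symbol the first one allows.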

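Table 6.2: Comparison of the prediction accuracy (percentage of correct guesses) of a VLMM learned from a long text. The VLMM has been trained on the learning text.

Measure	Learning text	Same-subject text	Different-subject text
KL	$25.7\%$	$26.4\%$	$24.9\%$
Matusita	$32.9\%$	$34.8\%$	$32.7\%$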

Finally, the small differences between the percentages of correct guesses on the three texts suggest that the VLMM did not overfit the learning text: it performs similarly on text it was not trained on.
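The comparison behind this argument, the percentage of correct guesses on the learning text against the percentage on unseen text, can be sketched as follows. The sketch assumes a plain fixed-order Markov predictor as a stand-in for the VLMM, which it does not implement; the function names and the toy strings are hypothetical.

```python
from collections import Counter, defaultdict


def train(text, order=2):
    """Count which letter follows each context of `order` letters."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    return counts


def accuracy(counts, text, order=2):
    """Fraction of letters guessed correctly.

    The guess for each position is the letter seen most often after the
    preceding `order` letters during training; an unseen context counts
    as a wrong guess.
    """
    positions = range(order, len(text))
    if len(positions) == 0:
        return 0.0
    correct = 0
    for i in positions:
        followers = counts.get(text[i - order:i])
        if followers and followers.most_common(1)[0][0] == text[i]:
            correct += 1
    return correct / len(positions)


# Toy data standing in for the learning text and an unseen test text.
learning = "abcabcabcabc"
held_out = "abcabd"

model = train(learning)
print(accuracy(model, learning))  # score on the text the model was trained on
print(accuracy(model, held_out))  # score on text the model has not seen
```

A large gap between the two scores would point to over-learning, while close scores, as in Table 6.2, suggest that the model generalises beyond its learning text.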

franck 2006-10-16