Next: Conclusion Up: Quantitative assessment of the Previous: Performance given a large   Index



Performance given a small training set

The test was repeated, this time training on the shortest text $E_3$. Results are reported in Tables 6.2 and 6.3 for the Matusita distance and the KL divergence, respectively.


Table 6.2: Prediction performance of a VLMM learnt from a short text using the Matusita distance. The VLMM has been trained on the text $E_3$ and tested on the three texts $E_1$, $E_2$ and $E_3$. Results are reported for Lidstone's law of succession with different values of $\lambda$. The case $\lambda=0$ corresponds to the maximum likelihood estimate. The case $\lambda=1$, which corresponds to Laplace's law of succession, failed completely: it only predicted spaces.
$\lambda$   $E_1$      $E_2$      $E_3$
0           $20.2\%$   $22.1\%$   $30.8\%$
0.05        $23.7\%$   $25.5\%$   $35.0\%$
0.1         $24.0\%$   $24.8\%$   $32.7\%$
1           $16.6\%$   $16.5\%$   $18.2\%$



Table 6.3: Prediction performance of a VLMM learnt from a short text using the Kullback-Leibler divergence. The VLMM has been trained on the text $E_3$ and tested on the three texts $E_1$, $E_2$ and $E_3$. Results are reported for Lidstone's law of succession with different values of $\lambda$. The case $\lambda=0$ corresponds to the maximum likelihood estimate; the case $\lambda=1$ corresponds to Laplace's law of succession.
$\lambda$   $E_1$      $E_2$      $E_3$
0           $20.9\%$   $23.0\%$   $31.6\%$
0.5         $28.3\%$   $30.1\%$   $52.4\%$
1           $28.3\%$   $30.1\%$   $52.4\%$


The method using the KL divergence performs better this time. Combined with Laplace's law of succession, it correctly predicts about $30\%$ of the characters of a text dealing with another subject ($E_1$ and $E_2$).
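Both comparison measures have standard definitions for discrete distributions. The sketch below is illustrative only; the function names and the toy next-symbol distributions are assumptions, not taken from the implementation evaluated here:

```python
import math

def matusita(p, q):
    """Matusita distance between two discrete distributions:
    sqrt( sum_i (sqrt(p_i) - sqrt(q_i))^2 )."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); requires q_i > 0
    wherever p_i > 0, which smoothing (Lidstone's law) guarantees."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-symbol distributions over a three-letter alphabet.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
```

Note that the Matusita distance is symmetric and bounded, while the KL divergence is asymmetric and blows up when $q$ assigns near-zero probability to a symbol that $p$ supports, which is one reason the two measures can rank candidate models differently.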

In the case of the Matusita distance, the best performance is achieved with $\lambda = 0.05$. This time the maximum likelihood estimate could not explain the data on its own: a small amount of uniform prior was essential. The reason is the lack of training data. The frequencies counted in the short training sequence cannot be trusted as they could after training on a large dataset, so a fraction of uniform estimation must be blended in, which is done by increasing the value of $\lambda$.
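Lidstone's law mixes the empirical counts with a uniform prior, with $\lambda$ controlling the strength of the mix. A minimal sketch, where the function name and the toy training string are illustrative assumptions:

```python
from collections import Counter

def lidstone_probs(counts, alphabet, lam):
    """Lidstone-smoothed estimate: P(c) = (n_c + lam) / (N + lam * |alphabet|).
    lam = 0 gives the maximum-likelihood estimate;
    lam = 1 gives Laplace's law of succession."""
    total = sum(counts.values())
    denom = total + lam * len(alphabet)
    return {c: (counts.get(c, 0) + lam) / denom for c in alphabet}

alphabet = "abcdefghijklmnopqrstuvwxyz"
counts = Counter("abracadabra")                     # toy training sequence

ml = lidstone_probs(counts, alphabet, 0.0)          # trusts the counts fully
smoothed = lidstone_probs(counts, alphabet, 0.05)   # small uniform prior
```

Under the maximum-likelihood estimate a symbol unseen in training gets probability zero, whereas any $\lambda > 0$ leaves it a small positive probability; on a short training text this difference is exactly what makes the smoothed estimates more reliable.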



franck 2006-10-01