The experiment has been run a second time, this time on the smallest text. Results are reported in tables 6.3 and 6.4 for the Matusita distance and the KL divergence, respectively.
We can see from these tables that the KL divergence performs better this time. Combined with Laplace's law of succession, it is able to predict a text dealing with another subject.
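To make the comparison concrete, the following Python sketch shows how one might estimate a unigram model with Laplace's law of succession and score a test text against it with both the KL divergence and the Matusita distance. The toy sentences, the function names, and the use of unigram counts are assumptions for illustration only, not the exact experimental setup of this chapter.

```python
import math
from collections import Counter

def laplace_model(tokens, vocab):
    """Unigram model smoothed with Laplace's law of succession (add-one)."""
    counts = Counter(tokens)
    n, v = len(tokens), len(vocab)
    return {w: (counts[w] + 1) / (n + v) for w in vocab}

def empirical_model(tokens, vocab):
    """Plain maximum likelihood estimate of the unigram distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: counts[w] / n for w in vocab}

def kl_divergence(p, q, vocab):
    """D(p || q); q must give non-zero mass to every word (here it does)."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in vocab if p[w] > 0)

def matusita_distance(p, q, vocab):
    """Matusita distance between two distributions over the same vocabulary."""
    return math.sqrt(sum((math.sqrt(p[w]) - math.sqrt(q[w])) ** 2 for w in vocab))

# Hypothetical usage: score a test text against a model trained on a small text.
train = "the cat sat on the mat".split()
test = "the dog ran in the park".split()
vocab = set(train) | set(test)

model = laplace_model(train, vocab)       # smoothed training model
observed = empirical_model(test, vocab)   # empirical distribution of the test text

print(kl_divergence(observed, model, vocab))
print(matusita_distance(observed, model, vocab))
```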
In the case of the Matusita distance, the best performance is achieved with a small non-zero value of the smoothing parameter. It therefore seems that, this time, the maximum likelihood estimate could not explain the data on its own; a small amount of uniform prior was needed. This is because there are not many training examples: we cannot trust the frequencies counted in the training sequence as we could when training on a large dataset, so we need to include a part of uniform estimation, which is done by increasing the value of the smoothing parameter.
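As a rough illustration of this trade-off, the sketch below assumes Lidstone-style add-epsilon smoothing, where eps stands in for the smoothing parameter discussed above; the token sequence and vocabulary are invented. It shows how increasing eps mixes more of the uniform distribution into the estimate, which is what compensates for the unreliable frequencies of a small training set.

```python
from collections import Counter

def lidstone_model(tokens, vocab, eps):
    """Smoothed unigram estimate p(w) = (c(w) + eps) / (N + eps * |V|).

    eps = 0 gives the maximum likelihood estimate; as eps grows, every
    probability is pulled towards the uniform value 1 / |V|.
    """
    counts = Counter(tokens)
    n, v = len(tokens), len(vocab)
    return {w: (counts[w] + eps) / (n + eps * v) for w in vocab}

tokens = "a a a b".split()
vocab = {"a", "b", "c"}
for eps in (0.0, 0.5, 1.0, 10.0):
    model = lidstone_model(tokens, vocab, eps)
    print(eps, {w: round(p, 3) for w, p in sorted(model.items())})
# With eps = 0 the unseen word "c" gets zero probability (pure ML);
# larger eps moves all three probabilities towards the uniform value 1/3.
```

The limiting cases make the point: with eps = 0 the model trusts the observed frequencies completely, which is only reasonable when the training text is large, while larger values of eps hedge against the sparse counts of a small training text by blending in the uniform prior.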