
Comparison of the probability estimation

To make a meaningful qualitative comparison of the Matusita measure and the KL divergence, we trained a VLMM tree on a much larger text. The training text is now 70,000 characters long and is a compilation of journalistic-style news articles.
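The VLMM's next-symbol probabilities are estimated by maximum likelihood, i.e. as relative frequencies in the training text. The following is a minimal sketch of that estimation step, assuming contexts are stored as strings of the characters immediately preceding each position and that every context up to a hypothetical maximum depth `max_depth` is counted. The names `next_symbol_counts` and `ml_distribution` are illustrative and not taken from the original implementation.

```python
from collections import Counter, defaultdict


def next_symbol_counts(text, max_depth):
    """Count how often each symbol follows each context of length 0..max_depth.

    A context is the string of characters immediately preceding a position;
    the empty string "" is the root of the tree.
    """
    counts = defaultdict(Counter)
    for i, symbol in enumerate(text):
        # Only contexts that fit before position i are counted.
        for depth in range(min(i, max_depth) + 1):
            context = text[i - depth:i]
            counts[context][symbol] += 1
    return counts


def ml_distribution(counter):
    """Maximum likelihood estimate: relative frequency of each next symbol."""
    total = sum(counter.values())
    return {symbol: n / total for symbol, n in counter.items()}


counts = next_symbol_counts("the cat and the hat", max_depth=2)
print(ml_distribution(counts["h"]))  # e: 2/3, a: 1/3
```

On this toy string the context "h" is followed twice by "e" and once by "a", so its estimated distribution is 2/3 and 1/3, while the longer context "th" is always followed by "e".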

**Figure 6.8:** VLMM tree learned using the maximum likelihood estimation of probability and the KL divergence. $\epsilon$ is set to 0.003 and only the probabilities that are greater than 0.003 are shown on the graph.

**Figure 6.9:** VLMM tree learned using the maximum likelihood estimation of probability and the Matusita distance. $\epsilon$ is set to 0.003 and only the probabilities that are greater than 0.003 are shown on the graph.

Figures 6.8 and 6.9 show the trees learned for a VLMM using the KL divergence and the Matusita distance, respectively.
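Assuming, as in common VLMM learning schemes, that a context is added to the tree when its next-symbol distribution differs enough from that of its parent (the same context without its oldest symbol), the two measures being compared can be sketched as follows for distributions stored as Python dicts. The growth test `keep_context`, which weights the distance by the context's probability and compares it with ε, is a hypothetical illustration of how ε might be applied; this section does not spell out the exact criterion.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; p and q map symbols to probabilities.

    Assumes q[a] > 0 wherever p[a] > 0. This holds for maximum likelihood
    estimates when q is the parent (shorter) context of p.
    """
    return sum(pa * math.log(pa / q[a]) for a, pa in p.items() if pa > 0)


def matusita_distance(p, q):
    """Matusita distance: sqrt(sum_a (sqrt(p_a) - sqrt(q_a))^2)."""
    symbols = set(p) | set(q)
    return math.sqrt(sum(
        (math.sqrt(p.get(a, 0.0)) - math.sqrt(q.get(a, 0.0))) ** 2
        for a in symbols))


def keep_context(child, parent, context_prob, epsilon, measure):
    """Hypothetical growth test: keep the longer context when its
    probability-weighted distance from the parent exceeds epsilon."""
    return context_prob * measure(child, parent) > epsilon


parent = {"e": 0.5, "a": 0.5}
child = {"e": 1.0}
print(kl_divergence(child, parent))      # ln 2, about 0.693
print(matusita_distance(child, parent))  # sqrt(2 - sqrt(2)), about 0.765
```

One design difference is worth keeping in mind: the Matusita distance is symmetric and bounded above by √2, whereas the KL divergence is asymmetric and unbounded, so the same value of ε does not impose the same strictness under both measures.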

The two trees look broadly similar. However, the tree learned with the Matusita distance has a depth of 6, whereas the KL divergence gives a tree of depth 3. The Matusita distance is therefore able to encode more history while still using relatively few nodes. It encodes fragments of words, but also short whole words, including their surrounding spaces, such as " said ", " and ", " the ", " of " and " to ". The KL divergence does not find whole words in the text; it only encodes fragments of words.

The fact that humans group letters into words suggests that a word is a coherent unit of information, and that the letters at the boundaries of a word are more strongly linked to the word itself than to the adjacent words. Since the Matusita distance recovers such words while the KL divergence does not, this suggests that the Matusita measure is better at capturing the natural segmentation of sequences, and therefore better suited to learning variable length Markov models.

franck 2006-10-16