


Comparison of trees built using the Matusita distance

For these experiments, the VLMM was trained on a short English text of 2144 letters that tells the story of penguins and other animals.

Figures 6.3, 6.4, 6.5 and 6.6 show the resulting trees when training with the Matusita distance. The Lidstone estimation of probability was used with $\lambda$ varying from 0 to 1. The case $\lambda=0$ corresponds to the maximum likelihood estimate and the case $\lambda=1$ corresponds to Laplace's law of succession.
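As a reminder of how the family of estimators interpolates between these two cases, here is a minimal sketch of the Lidstone estimator (the dictionary-based representation of counts and the function name are choices of this sketch, not part of the thesis):

```python
def lidstone(counts, alphabet, lam):
    """Lidstone estimate P(a) = (c(a) + lam) / (N + lam * |alphabet|).

    lam = 0 gives the maximum likelihood estimate;
    lam = 1 gives Laplace's law of succession.
    """
    n = sum(counts.values())
    denom = n + lam * len(alphabet)
    return {a: (counts.get(a, 0) + lam) / denom for a in alphabet}
```

With $\lambda=0$ an unseen symbol receives probability zero, while any $\lambda>0$ reserves some probability mass for it; this is exactly the difference visible between figures 6.3 and 6.6.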

Figure 6.3: VLMM tree learnt using the maximum likelihood estimation of probability and the Matusita distance. $\epsilon$ is set to 0.003 and only the nodes with probabilities that are greater than 0.003 are shown on the graph.
\includegraphics[width=145mm,keepaspectratio]{ranktrees/t_velb_0_0.003.eps}

Figure 6.4: VLMM tree learnt using the Lidstone estimation of probability with $\lambda =0.05$ and the Matusita distance. $\epsilon$ is set to 0.003 and only the nodes with probability greater than 0.003 are shown on the graph.
\includegraphics[width=145mm,keepaspectratio]{ranktrees/t_velb_0.05_0.003.eps}

Figure 6.5: VLMM tree learnt using the Lidstone estimation of probability with $\lambda =0.1$ and the Matusita distance. $\epsilon$ is set to 0.003 and only the nodes with probability greater than 0.003 are shown on the graph.
\includegraphics[width=125mm,keepaspectratio]{ranktrees/t_velb_0.1_0.003.eps}

Figure 6.6: VLMM tree learnt using the Laplace estimation of probability and the Matusita distance. $\epsilon$ is set to 0.003 and only the nodes with probability greater than 0.003 are shown on the graph.
\includegraphics[width=65mm,keepaspectratio]{ranktrees/t_velb_1_0.003.eps}

We can see that the topology of the tree changes substantially between the case $\lambda=0$ and the case $\lambda=1$.
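The growing criterion behind these trees compares two next-letter distributions using the Matusita distance; a minimal sketch of that distance, assuming distributions are represented as Python dicts mapping letters to probabilities (a representation chosen for this sketch):

```python
import math

def matusita(p, q):
    """Matusita distance between two discrete distributions:
    sqrt( sum_a (sqrt(p(a)) - sqrt(q(a)))^2 ).

    Letters missing from a dict are treated as having probability 0.
    """
    letters = set(p) | set(q)
    return math.sqrt(sum((math.sqrt(p.get(a, 0.0)) - math.sqrt(q.get(a, 0.0))) ** 2
                         for a in letters))
```

The distance is 0 for identical distributions and reaches its maximum of $\sqrt{2}$ for distributions with disjoint supports, so the threshold used when growing the tree operates on a bounded scale.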

For the case $\lambda=1$ (figure 6.6), the depth of the tree is one. This means that the model does not take histories into account: it only models the marginal probability of each letter of the alphabet. We can notice that some letters like ``j'' or ``q'' are absent from the tree because of their low probability in English texts, and in our example text in particular. As we have seen before, this is the problem with Laplace's law of succession: it is built on the assumption of a uniform prior.
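To see why such letters fall below the display threshold, consider a letter unseen in the training text. Assuming an alphabet of roughly 27 symbols (26 letters plus space; the exact alphabet size is not stated in this excerpt), Laplace's law gives it

```latex
P(a) = \frac{c(a) + 1}{N + |\Sigma|} = \frac{0 + 1}{2144 + 27} \approx 4.6 \times 10^{-4},
```

which is well below $\epsilon = 0.003$, so the corresponding node is pruned from the graph even though its probability is nonzero.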

In the case $\lambda=0$ (figure 6.3), the tree seems to have learnt the text in a more appropriate manner. We can find fragments of words, such as ``peng'' from ``penguin'' (the main subject of the text), as well as short words such as ``the'', ``of'', ``in'', ``on'' or ``as''.

The two intermediate cases ($\lambda=0.05$ and $\lambda=0.1$, figures 6.4 and 6.5) form a transition between these two extremes.

This qualitative evaluation favors the maximum likelihood estimate. A quantitative evaluation of these trees is given later.


franck 2006-10-01