To obtain a fair qualitative comparison of the Matusita measure and the KL divergence, we trained a VLMM tree on a much larger text: a compilation of journalistic-style news articles, 70,000 characters long.
[Figure 6.10: VLMM tree learnt with the KL divergence.]

[Figure 6.11: VLMM tree learnt with the Matusita distance.]
Figures 6.10 and 6.11 show the trees learnt by the VLMM using the KL divergence and the Matusita distance, respectively.
The two trees share a similar overall structure, but the tree learnt with the Matusita distance reaches a depth of six, whereas the KL divergence yields a tree of depth three. The method using the Matusita distance therefore encodes more history in the tree without requiring many more nodes. It captures not only fragments of words but also short words such as ``said'', ``and'', ``the'', ``of'' or ``to''. The method using the KL divergence does not find complete words in the text; it only encodes parts of words.
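The difference between the two criteria can be made concrete with a small sketch. The following Python snippet is not taken from the thesis: the next-character distributions, the alphabet, and the growth threshold are illustrative assumptions. It computes both measures between the conditional distribution of a longer context and that of its shorter suffix, which is the kind of test a VLMM growing procedure can use to decide whether the longer context is worth keeping as a deeper node.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def matusita_distance(p, q):
    """Matusita distance between two discrete distributions."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

# Hypothetical next-character distributions over a small alphabet (a, d, e, n).
# Compare P(. | "sai") against P(. | "ai"): if the longer context changes the
# prediction enough, the deeper node is kept (here "sai" strongly predicts 'd',
# as in "said").
p_long  = [0.05, 0.80, 0.05, 0.10]   # P(next char | "sai")
p_short = [0.25, 0.30, 0.25, 0.20]   # P(next char | "ai")

threshold = 0.1  # illustrative growth threshold, not the value used in the thesis
for name, dist in [("KL", kl_divergence), ("Matusita", matusita_distance)]:
    d = dist(p_long, p_short)
    decision = "extend context" if d > threshold else "prune"
    print(f"{name:9s} distance = {d:.3f} -> {decision}")
```

Because the Matusita distance is bounded and symmetric while the KL divergence is not, the two criteria can rank the same candidate contexts differently, which is consistent with the two learnt trees having different depths.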
The fact that humans group letters into words suggests that a word is a coherent unit of information: the letters at the boundaries of a word are more strongly linked to the word itself than to the neighbouring words. This suggests that the method using the Matusita measure is better at representing such natural groupings of sequences.