The construction of the tree is iterative and begins with the root node. The empty string is assigned to the root node. We then add candidate nodes up to a maximum depth $L$. If the probability of a string is small, then the corresponding node should not be in the tree. We decide whether a probability is small by comparing it with a predefined threshold that represents the accuracy of the model. This threshold is defined by the user; we denote it by $\epsilon$.
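The depth limit and the probability threshold together define a simple admission test for candidate strings. The following is a minimal sketch of that test, assuming string probabilities are estimated empirically from a single observation sequence; the names `string_prob`, `is_candidate`, `max_depth` and `epsilon` are illustrative and not taken from the text.

```python
def string_prob(s, observations):
    """Empirical probability of observing the string s in the observation sequence
    (assumption: a sliding-window frequency estimate is used)."""
    if len(s) == 0:
        return 1.0  # the empty string labels the root node and is always kept
    n_windows = len(observations) - len(s) + 1
    if n_windows <= 0:
        return 0.0
    matches = sum(
        1 for i in range(n_windows)
        if tuple(observations[i:i + len(s)]) == tuple(s)
    )
    return matches / n_windows


def is_candidate(s, observations, max_depth, epsilon):
    """Keep a candidate node only if its string is no longer than the maximum
    depth L (max_depth) and occurs with probability at least epsilon."""
    return len(s) <= max_depth and string_prob(s, observations) >= epsilon
```

For example, with `max_depth = 5` and `epsilon = 0.01`, a string of length six, or one that appears in fewer than 1% of the windows, would not give rise to a node.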
There is another reason not to add a node to the tree: if its addition yields a new tree that is statistically equivalent to the previous one. We therefore need a way of checking whether two probability distributions are equivalent. In section 6.3.4 we describe the different measures of distance between probabilities that we have investigated and how they behave. For now, we denote by $d(P_1, P_2)$ a measure of the distance between two probability measures $P_1$ and $P_2$ over the observation space.
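As a concrete placeholder for $d(P_1, P_2)$, the sketch below computes the Kullback-Leibler divergence between two distributions over the observation space. This is only one of the candidate measures (section 6.3.4 compares several), and representing each distribution as a dictionary from observation symbol to probability is an assumption made here for illustration.

```python
import math


def kl_divergence(p1, p2, eps=1e-12):
    """One candidate for d(P1, P2): the Kullback-Leibler divergence between two
    distributions over the observation space, each given as a dict mapping an
    observation symbol to its probability. The small constant eps guards
    against symbols that are unseen under p2."""
    symbols = set(p1) | set(p2)
    return sum(
        p1[o] * math.log(p1[o] / (p2.get(o, 0.0) + eps))
        for o in symbols
        if p1.get(o, 0.0) > 0.0
    )
```

The KL divergence is asymmetric; symmetric alternatives such as the total variation distance expose the same interface and could be swapped in without changing the rest of the construction.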
The distance can measure the amount of additional information we gain by adding the suffix $\sigma s$ to the tree instead of pruning the node $\sigma s$ and keeping the suffix $s$ to describe the distribution. Given this measure of distance between the two probabilities $P(\cdot \mid \sigma s)$ and $P(\cdot \mid s)$, we can construct a statistical error measure by weighting it with the prior probability of observing the suffix $\sigma s$. Indeed, the distance $d\left(P(\cdot \mid \sigma s), P(\cdot \mid s)\right)$ can be large even if the probability of observing the sequence $\sigma s$ is very small. Weighting the distance by $P(\sigma s)$ allows us to neglect those cases. The statistical difference is then measured by