Variable length Markov models provide a compact model of the sequence with the same characteristics as traditional Markov models. In particular, we can generate stochastic synthetic sequences from the model. To do so, we only have to sample a new element after the sequence from the distribution given by the set of probabilities $P(x_{n+1} \mid x_1 \ldots x_n)$.
In order to compare similar measurements, we need to use the same observed history for each probability computed for a prediction: it does not make sense to compare probabilities that encode different lengths of history. We therefore fix the length of the observed history to a predefined value, long enough to take all the useful information into consideration. A good choice is $D$, the maximum depth of the VLMM tree generated by the learning algorithm.
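As a minimal sketch of this sampling step (the dictionary layout, alphabet, symbol names, and probability values below are illustrative assumptions, not the model of figure 5.1(b)), a new element can be drawn from the distribution attached to the last $D$ observed symbols:

```python
import random

D = 3                                   # maximum depth of the VLMM tree
ALPHABET = ["a", "b", "c"]

# Hypothetical model: each context (the last D symbols) maps to a
# probability vector over the next symbol.
model = {
    ("a", "b", "c"): {"a": 0.5, "b": 0.3, "c": 0.2},
    # ... one entry per context encoded in the tree
}

def sample_next(history):
    """Sample one new element given the last D symbols of the history."""
    context = tuple(history[-D:])       # fixed-length observed history
    dist = model[context]               # probability vector of that node
    symbols = list(dist)
    weights = list(dist.values())
    return random.choices(symbols, weights=weights)[0]
```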
If a sequence is encoded in the VLMM tree, the corresponding probability is given by the vector associated with its parent node, that is, the last node of the sequence without its final element. For instance, the probability of the example sequence of figure 5.1(b) is given by the second value of the vector associated with its parent node, that is 0.072. But what about a sequence that is not represented in the tree? We cannot find its probability directly in one of the vectors of the tree. In that case we cannot really compute the value of the probability, because we do not have enough information to do so. However, we can assume that the probabilities of the missing elements are uniform: the elements are missing because the corresponding sequences were never observed during learning, so there is no evidence to favour one continuation over another.
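Assuming the tree is stored as a simple trie, one child per symbol with a probability vector at each node (an illustrative layout, not necessarily the one used in this chapter), the lookup and its uniform fallback could be sketched as follows:

```python
class Node:
    """Toy VLMM tree node; structure and names are illustrative."""
    def __init__(self, probs=None):
        self.children = {}              # symbol -> child Node
        self.probs = probs or {}        # next symbol -> probability

def sequence_probability(root, seq, alphabet):
    """Return the probability of seq[-1] following seq[:-1].

    Walk down to the node encoding seq[:-1]; its probability vector
    gives the probability of each possible final element. If the path
    is not encoded in the tree, assume a uniform distribution.
    """
    node = root
    for symbol in seq[:-1]:
        node = node.children.get(symbol)
        if node is None:                # sequence missing from the tree
            return 1.0 / len(alphabet)  # uniform assumption
    return node.probs.get(seq[-1], 1.0 / len(alphabet))
```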
Note that another possible generation scheme consists of directly sampling branches of the VLMM tree. Although this is a cheap way of generating sequences, it is not the right one, because it does not take the history into account. Histories must obviously be used when the VLMM serves to track objects, but even when we only want to generate behaviour, we need to look at the history for every element generated. Otherwise the generated sequence looks like small pieces of behaviour concatenated one after the other, without any link between them.
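To make the contrast concrete, here is a sketched generation loop in the same hypothetical setting as above: the context is rebuilt from the last $D$ generated elements before every draw, so consecutive elements remain linked through their shared history rather than being concatenated from independently sampled branches.

```python
import random

def generate(model, alphabet, seed, length, depth):
    """Generate a synthetic sequence, consulting the history at every
    step instead of copying whole branches of the tree."""
    out = list(seed)
    for _ in range(length):
        context = tuple(out[-depth:])   # only the last D elements matter
        dist = model.get(context)
        if dist is None:                # unseen context: uniform fallback
            dist = {s: 1.0 / len(alphabet) for s in alphabet}
        symbols = list(dist)
        weights = list(dist.values())
        out.append(random.choices(symbols, weights=weights)[0])
    return out
```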