Variable-length Markov models provide compact models of sequences that have the same characteristics as traditional Markov models. In particular, we can generate stochastic synthetic sequences from a model. To do so, we only have to sample the element following the current sequence from the distribution given by the set of probabilities $P(s_{t+1} = s \mid s_{t-k+1}, \ldots, s_t)$ over all possible states $s$, where $k$ is the length of the observed history.
In order to be able to compare similar measurements, we need to use the same observed history for each probability computed for a prediction. It is not meaningful to compare probabilities of sequences that encode different lengths of history. We therefore set the size of the observed history to a predefined value $D$. The sequences should be long enough to take all the useful information into consideration; a good value for $D$ is the maximum depth of the VLMM tree generated by the learning algorithm.
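To make the sampling step concrete, the sketch below (in Python) assumes a simple representation of the VLMM tree in which each node stores its children keyed by state and a probability vector over the next state. The names (Node, next_state_distribution, sample_next_state), the forward traversal of the history from the root, and the uniform fallback for histories that are not represented in the tree (discussed below) are illustrative assumptions, not the actual implementation.

import random

class Node:
    # Hypothetical VLMM tree node: children keyed by state, plus a
    # probability vector over the next state (indexed by state id).
    def __init__(self, probs=None):
        self.children = {}        # state -> Node
        self.probs = probs or []

def next_state_distribution(root, history, num_states):
    # Follow the history (the last D states) from the root. If the full
    # history is encoded in the tree, return the vector stored at the node
    # it reaches; otherwise fall back to a uniform distribution over the
    # next state, as discussed below in the text.
    node = root
    for state in history:
        child = node.children.get(state)
        if child is None:
            return [1.0 / num_states] * num_states
        node = child
    return node.probs or [1.0 / num_states] * num_states

def sample_next_state(root, history, num_states):
    # Sample the next pathlet state from the conditional distribution.
    weights = next_state_distribution(root, history, num_states)
    return random.choices(range(num_states), weights=weights)[0]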
If the sequence is encoded in the VLMM tree, the corresponding probability is given by the vector associated with the parent node, that is, the node reached by the sequence without its final element. For instance, the probability of the example sequence in figure 6.2(b) is given by the second value of the vector associated with its parent node, that is 0.072. But what about a sequence that does not appear in the tree? Since such a sequence is not represented, we cannot find its probability directly in one of the vectors of the tree; we cannot really compute the value of the probability because we do not have enough information to do so. However, we can assume that the probabilities of the missing elements are uniform. Elements are missing because either:
If we require a longer generated trajectory, we can continue adding pathlet states to the sequence in the same way, giving a sequence of pathlet states $S$ of length $N$.
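A minimal sketch of this generation loop, reusing the hypothetical sample_next_state function from the sketch above, might look as follows; the history passed to the sampler is always the last $D$ elements of the growing sequence.

def extend_sequence(root, seed, n, history_size, num_states):
    # Add n pathlet states one at a time; each new state is sampled
    # conditioned on the last history_size (i.e. D) states of the
    # growing sequence.
    sequence = list(seed)
    for _ in range(n):
        history = sequence[-history_size:]
        sequence.append(sample_next_state(root, history, num_states))
    return sequence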
This algorithm needs to be initialised with a sequence of states. One way of doing this is to assume a null history at the beginning. In practice, we simply generate a random sequence of states of size $D$. We then add $N$ states to that sequence and remove the $D$ first states, giving us the sequence $S$. We then only use the last $D$ elements of $S$ to add new states to the sequence. If we choose $D$ to be the maximum number of levels in the VLMM tree, all the other elements of $S$ will be associated with a uniform probability since they will not appear in the VLMM tree. This is simpler to implement in practice since the history has a constant size. The states added to $S$ at the beginning are not very likely, but they become more and more likely as the last $D$ elements of $S$ come to form a likely sequence of states.
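Under the same assumptions as the earlier sketches, this initialisation might be written as follows: a random seed of size $D$ is generated, $N$ states are appended using the constant-size history, and the $D$ seed states are then discarded.

def generate_sequence(root, n, history_size, num_states):
    # Seed with history_size (D) random states, generate n (N) states
    # using the constant-size history, then discard the random seed so
    # that S contains only model-generated states.
    seed = [random.randrange(num_states) for _ in range(history_size)]
    sequence = extend_sequence(root, seed, n, history_size, num_states)
    return sequence[history_size:]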
Note that another possible generation method involves directly sampling branches of the VLMM tree. Even though this is a cheap way of generating sequences, it is not correct because it does not take the history into account. Histories should be used whenever the VLMM is used for prediction. Since we want to generate behaviour, we need to look at the history for every element generated; otherwise the generated sequence will look like small pieces of behaviour concatenated one after the other, without any natural link between them.