Prediction using VLMM

Variable length Markov models provide compact models of sequences that have the same characteristics as traditional Markov models. In particular, we can generate stochastic synthetic sequences from a model. To do so, we only have to sample the new element following the sequence $s$ from the distribution given by the set of probabilities:

\begin{displaymath}\left\{\tilde{P}(\sigma\vert s)\vert\sigma \in \Sigma\right\}\end{displaymath}

where each element is proportional to the corresponding probability in the set:

\begin{displaymath}\left\{\tilde{P}(s \sigma)\vert\sigma \in \Sigma\right\}\end{displaymath}

A maximum likelihood generation is also possible by taking the element that maximises the probability $\tilde{P}(\sigma\vert s)$.
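To make the two generation modes concrete, the following is a minimal Python sketch of sampling and maximum-likelihood selection over $\left\{\tilde{P}(\sigma\vert s)\right\}$. The helper joint_prob (returning $\tilde{P}$ of a tuple-encoded sequence) and all function names are hypothetical, introduced only for illustration.

\begin{verbatim}
import random

def next_symbol_distribution(s, alphabet, joint_prob):
    # Weights P(s sigma) for every sigma in the alphabet; normalising
    # them yields the conditional distribution P(sigma | s).
    weights = {sigma: joint_prob(s + (sigma,)) for sigma in alphabet}
    total = sum(weights.values())
    return {sigma: w / total for sigma, w in weights.items()}

def sample_next(s, alphabet, joint_prob):
    # Stochastic generation: sample the next element from P(. | s).
    dist = next_symbol_distribution(s, alphabet, joint_prob)
    symbols = list(dist)
    return random.choices(symbols, weights=[dist[x] for x in symbols])[0]

def ml_next(s, alphabet, joint_prob):
    # Maximum-likelihood generation: take the element maximising P(sigma | s).
    dist = next_symbol_distribution(s, alphabet, joint_prob)
    return max(dist, key=dist.get)
\end{verbatim}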

To be able to compare measurements meaningfully, we need to use the same observed history $s$ for each probability computed for a prediction: it is not meaningful to compare probabilities of sequences that encode different lengths of history. We therefore set the size of $s$ to a predefined value. The history should be long enough to take all the useful information into account; a good value is $L$, the maximum depth of the VLMM tree generated by the learning algorithm.

If the sequence $s \sigma$ is encoded in the VLMM tree, the corresponding probability is given by the vector associated with the parent node (the last node of the sequence $\mathit{suffix}(s) \sigma$). For instance, the probability of having the sequence $\{B,B,A\}$ in figure 6.2(b) is given by the second value of the vector associated with $B_2$, that is 0.072. But what about the sequence $\{B,C,A\}$? This sequence is not represented in the tree, so we cannot find its probability directly in one of the vectors of the tree. In that case, we cannot compute the exact value of the probability because we do not have enough information to do so. However, we can assume that the probabilities of the missing elements are uniform. Elements are missing because either:

-
they have not been observed in this context in the training sequence, in which case we cannot know their real probabilities and have to choose an empirical one, or
-
they have been pruned during the training algorithm because they did not bring any valuable information. We can choose a uniform probability in order to compare them to elements that might have brought information.
So, in order to compute the probability of the sequence $\{B,C,A\}$, we multiply the probability $\frac{1}{\vert\Sigma\vert}$ of having the element $B$ followed by the sequence $\{C,A\}$ by the probability of having the sequence $\{C,A\}$ (0.18 in our case). A sketch of this computation is given below.
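The following minimal Python sketch illustrates this fallback. It assumes the tree probabilities are exposed as a hypothetical dictionary probs mapping tuple-encoded sequences to $\tilde{P}$, and a three-symbol alphabet $\{A,B,C\}$ for the worked example; both are assumptions for illustration.

\begin{verbatim}
def sequence_prob(seq, probs, alphabet_size):
    # P(seq): read the value stored in the tree when the sequence is
    # encoded; otherwise assume the oldest element is uniform.
    seq = tuple(seq)
    if not seq:
        return 1.0
    if seq in probs:
        return probs[seq]
    # Oldest element missing from the tree: probability 1/|Sigma|,
    # times the probability of the remaining, shorter sequence.
    return (1.0 / alphabet_size) * sequence_prob(seq[1:], probs, alphabet_size)

# Values taken from the example above, with an assumed alphabet {A, B, C}:
probs = {("B", "B", "A"): 0.072, ("C", "A"): 0.18}
print(sequence_prob(("B", "C", "A"), probs, 3))   # (1/3) * 0.18 = 0.06
\end{verbatim}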

If we require a longer generated trajectory, we can continue adding pathlet states to the sequence $\{B,C,A\}$ in the same way to get a sequence of pathlet states $\cal{S}$ of length $n$.

This algorithm needs to be initialised with a sequence of states. One way of doing this is to assume a null history at the beginning. In practice, we simply generate a random sequence of states of size $n$, then add $n$ states to that sequence and remove the first $n$ states, giving us the sequence $\cal{S}$. We only ever use the last $n$ elements of $\cal{S}$ when adding states to the sequence. If we choose $n$ to be the maximum number of levels in the VLMM tree, all the other elements of $\cal{S}$ will be associated with a uniform probability since they will not appear in the VLMM tree[*]. This is simpler to implement in practice since the history has a constant size. The states added to $\cal{S}$ at the beginning are not very likely, but become more and more likely as the last $n$ elements of $\cal{S}$ come to form a likely sequence of states.
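One possible reading of this initialisation is sketched below, reusing the hypothetical sample_next helper from the first sketch; the exact loop structure is an assumption based on the description above, not the original implementation.

\begin{verbatim}
import random

def generate(n, alphabet, joint_prob):
    # Random initial history of size n (standing in for a null history).
    seq = [random.choice(alphabet) for _ in range(n)]
    # Add n states, each conditioned on the last n elements only,
    # so the history always has a constant size.
    for _ in range(n):
        history = tuple(seq[-n:])
        seq.append(sample_next(history, alphabet, joint_prob))
    # Remove the first n (random) states, giving the sequence S.
    return seq[n:]
\end{verbatim}

In this sketch, joint_prob could be the sequence_prob fallback from the previous sketch, so that histories containing unlikely seed states still receive a (uniform) probability.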

Note that another possible generation method involves directly sampling branches of the VLMM tree. Even though this is a cheap way of generating sequences, it is not correct because it does not take the history into account. The history should be used whenever the VLMM is used for prediction: we want to generate behaviour, so we need to look at the history for every element generated. Otherwise the generated sequence will look like small pieces of behaviour concatenated one after the other, without any natural link.

