


Prediction using VLMM

Variable length Markov models provide a compact model of the sequence that has the same characteristics as traditional Markov models. In particular, we can generate stochastic synthetic sequences from the model. To do so, we only have to sample the element that follows the sequence $s$ from the distribution given by the set of probabilities:

\begin{displaymath}\left\{\tilde{P}(\sigma\vert s)\vert\sigma \in \Sigma\right\}\end{displaymath}

where each element is proportional to the corresponding probability in the set:

\begin{displaymath}\left\{\tilde{P}(s \sigma)\vert\sigma \in \Sigma\right\}\end{displaymath}

A maximum-likelihood generation is also possible, by taking the element that maximizes the probability $\tilde{P}(\sigma\vert s)$.
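As an illustration, here is a minimal sketch (in Python) of the two generation modes. The function names are placeholders, not part of the method described here: in particular, joint_prob is a caller-supplied function that reads the estimate $\tilde{P}(s \sigma)$ from the VLMM tree.

\begin{verbatim}
import random

def next_element(joint_prob, s, alphabet, maximum_likelihood=False):
    # Un-normalised weights: P~(sigma | s) is proportional to P~(s sigma).
    weights = {sigma: joint_prob(s, sigma) for sigma in alphabet}
    if maximum_likelihood:
        # Deterministic choice: the element that maximises P~(sigma | s).
        return max(weights, key=weights.get)
    # Stochastic choice: sample sigma with probability proportional to P~(s sigma).
    return random.choices(list(weights), weights=list(weights.values()))[0]
\end{verbatim}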

In order to compare similar measurements, we need to use the same observed history $s$ for each probability computed for a prediction. Indeed, it does not make sense to compare probabilities that encode different lengths of history. We therefore need to fix the size of $s$ to a predefined value, long enough to take all the useful information into consideration. A good choice is $L$, the maximum depth of the VLMM tree generated by the learning algorithm.
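Reusing the next_element sketch above, a generation loop that always conditions on the same amount of history could look like the following; the names are again hypothetical, and $L$ is the maximum depth of the tree.

\begin{verbatim}
def generate(joint_prob, seed, alphabet, length, L):
    seq = list(seed)
    for _ in range(length):
        # Condition every prediction on the same observed history:
        # the last L elements, L being the maximum depth of the VLMM tree.
        seq.append(next_element(joint_prob, seq[-L:], alphabet))
    return seq
\end{verbatim}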

If the sequence $s \sigma$ is encoded in the VLMM tree, the corresponding probability is given by the vector associated with the parent node (the last node of the sequence $Suffix(s) \sigma$). For instance, the probability of having the sequence $\{B,B,A\}$ in figure 5.1(b) is given by the second value of the vector associated with $B_2$, that is 0.072. But what about the sequence $\{B,C,A\}$? This sequence is not represented in the tree, so we cannot read its probability directly from one of the vectors of the tree. In that case, we cannot really compute the value of the probability because we do not have enough information to do so. However, we can assume that the probabilities of the missing elements are uniform. Indeed, the elements are missing because

-
either they have not been observed in this context in the training sequence, in which case we cannot know their real probabilities and have to choose an empirical one;
-
or they have been pruned during the training algorithm because they did not bring any valuable information, which allows us to choose a uniform probability in order to compare them with elements that might have brought information.
So, to compute the probability of the sequence $\{B,C,A\}$, we multiply the probability $\frac{1}{\vert\Sigma\vert}$ of having the element $B$ followed by the sequence $\{C,A\}$ by the probability of the sequence $\{C,A\}$ itself (0.18 in our case).
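The same fallback can be written as a short recursive sketch. Here encoded_prob is a hypothetical, caller-supplied lookup that returns $\tilde{P}$ of a sequence when it is encoded in the tree and None otherwise; whenever an element is missing, a uniform factor $\frac{1}{\vert\Sigma\vert}$ is used and the shorter remaining sequence is evaluated instead.

\begin{verbatim}
def sequence_prob(encoded_prob, seq, alphabet):
    if not seq:
        return 1.0
    p = encoded_prob(seq)  # hypothetical lookup; None if seq is not in the tree
    if p is not None:
        return p
    # Missing element: assume a uniform probability 1/|Sigma| for it and
    # fall back to the shorter sequence, e.g. P({B,C,A}) = (1/|Sigma|) * P({C,A}).
    return (1.0 / len(alphabet)) * sequence_prob(encoded_prob, seq[1:], alphabet)
\end{verbatim}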

Note that another possible generation consists of sampling branches of the VLMM tree directly. Even though this is a cheap way of generating sequences, it is not the right way to generate because it does not take the history into account. Histories obviously have to be used when the VLMM is used for tracking objects. But even if we only want to generate behaviour, we need to look at the history for every element generated. Otherwise the generated sequence will look like small pieces of behaviour concatenated one after the other, without any link between them.


