The natural law of succession

The natural law of succession presented in [51] is a more recent law of succession. It is based on the observation that simple sequences are more probable than complex ones. Probabilities are therefore estimated over a more appropriate subset of the alphabet: alphabets are generally large, so natural sequences do not contain all the elements of the alphabet. This is why a new constant is introduced:

$\begin{displaymath}q=\left\vert\left\{s\vert s \in \Sigma^*, n_s>0\right\}\right\vert\end{displaymath}$

It represents the number of distinct subsequences that actually occur in the observed sequence (those with $n_s>0$). The law of succession is then defined as follows:

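$\begin{displaymath}P(s)=\left\{\begin{array}{ll} \frac{n_s+1}{N_s+\vert\Sigma\vert} & {\rm if}\ q=\vert\Sigma\vert \\ \frac{(n_s+1)(N_s+1-q)}{N_s^2+N_s+2q} & {\rm if}\ q<\vert\Sigma\vert\ {\rm and}\ n_s>0 \\ \frac{q(q+1)}{(\vert\Sigma\vert-q)(N_s^2+N_s+2q)} & {\rm otherwise} \end{array}\right. \end{displaymath}$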
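As an illustration, the three cases can be transcribed directly into a short Python function. This is a sketch, not the implementation used in this thesis: the name `natural_law` is hypothetical, $q$ is passed in by the caller rather than computed, and $n_s$ and $N_s$ are assumed to be the number of occurrences of $s$ and the total number of observations, as the first (Laplace-like) case suggests. Exact fractions are used so that no rounding hides normalization errors.

```python
from fractions import Fraction


def natural_law(n_s: int, n_total: int, alphabet_size: int, q: int) -> Fraction:
    """Hypothetical transcription of the natural law of succession.

    n_s           -- assumed: number of occurrences of s
    n_total       -- assumed: total number of observations (N_s in the formula)
    alphabet_size -- |Sigma|
    q             -- the constant defined above, supplied by the caller
    """
    # The formula divides by (|Sigma| - q), so it only makes sense for q <= |Sigma|.
    if not 0 <= q <= alphabet_size:
        raise ValueError("q must lie between 0 and |Sigma|")
    if q == alphabet_size:
        # Every element has been observed: Laplace's rule.
        return Fraction(n_s + 1, n_total + alphabet_size)
    denom = n_total * n_total + n_total + 2 * q
    if n_s > 0:
        # Observed element.
        return Fraction((n_s + 1) * (n_total + 1 - q), denom)
    # Unobserved element: the mass q(q+1)/denom is shared equally
    # by the |Sigma| - q unobserved elements.
    return Fraction(q * (q + 1), (alphabet_size - q) * denom)
```

For example, for the observed sequence `abab` over $\Sigma=\{a,b,c,d\}$, taking $q=2$ as the number of distinct symbols seen gives $P(a)=P(b)=3/8$ and $P(c)=P(d)=1/8$. When $q$ counts the distinct symbols seen, the three cases always sum to one, because $(N_s+q)(N_s+1-q)+q(q+1)=N_s^2+N_s+2q$.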

It has been shown, both theoretically and empirically, that this law of succession outperforms the previous ones.

Unfortunately, from a practical point of view, the natural law of succession is too computationally expensive. Evaluating the formula requires computing $q$, which is done by counting the number of distinct subsequences of any length in the observed sequence. Furthermore, the variable length Markov model learning algorithm makes extensive use of the estimation of observed probabilities, so we cannot afford a loss of performance in this estimation. This is why this law of succession will not be used or assessed in the rest of this thesis.
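To make this cost concrete, here is a minimal brute-force sketch that computes $q$. It assumes that "subsequence" means a contiguous, non-empty substring of the observed sequence (the empty string is excluded), and the function name `count_distinct_substrings` is hypothetical.

```python
def count_distinct_substrings(seq: str) -> int:
    """Hypothetical brute-force computation of q, assuming q is the number
    of distinct non-empty contiguous substrings occurring in seq.

    All N(N+1)/2 (start, end) pairs are enumerated; each slice costs up to
    O(N), and up to O(N^2) distinct substrings may be stored in the set.
    """
    seen = set()
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n + 1):
            seen.add(seq[i:j])  # identical substrings collapse into one entry
    return len(seen)
```

A sequence of $N$ pairwise distinct symbols has $N(N+1)/2$ distinct substrings (6 for `abc`), so $q$ itself can grow quadratically with the length of the observed sequence, while a highly repetitive sequence such as `aaaa` yields only 4.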

franck 2006-10-16