next up previous index
Next: Comparison of the probability Up: The estimation of observed Previous: Lidstone's law of succession   Index



The natural law of succession

The natural law of succession presented in [81] is a more elaborate law of succession. It rests on the principle that simple sequences are more probable than complex ones, and it estimates probabilities over a more appropriate subset of the alphabet: alphabets are usually large, so natural sequences do not contain every element of the alphabet. The number of distinct subsequences observed in the sequence is given by:

\begin{displaymath}q=\left\vert\left\{s\vert s \in \Sigma^*, n_s>0\right\}\right\vert\end{displaymath}
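As a concrete illustration, $q$ can be computed by enumerating every contiguous subsequence of the observed sequence and keeping only the distinct ones. The following sketch (the function name is ours) makes the quadratic nature of this enumeration explicit:

```python
def count_distinct_subsequences(seq):
    """Compute q = |{s : s in Sigma*, n_s > 0}|, the number of distinct
    contiguous subsequences occurring at least once in seq."""
    seen = set()
    n = len(seq)
    # Enumerate all N(N+1)/2 start/end position pairs.
    for i in range(n):
        for j in range(i + 1, n + 1):
            seen.add(tuple(seq[i:j]))
    return len(seen)

# Example: "abab" has the distinct subsequences
# a, b, ab, ba, aba, bab, abab, so q = 7.
q = count_distinct_subsequences("abab")
```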

The formula can then be derived as follows:

\begin{displaymath}P(s)=\left\{\begin{array}{ll}
\frac{n_s+1}{N_s+\vert\Sigma\vert} & {\rm if\ } q=\vert\Sigma\vert \\
\frac{(n_s+1)(N_s+1-q)}{N_s^2+N_s+2q} & {\rm if\ } q<\vert\Sigma\vert {\rm\ and\ } n_s>0 \\
\frac{q(q+1)}{(\vert\Sigma\vert-q)(N_s^2+N_s+2q)} & {\rm otherwise}
\end{array}\right.
\end{displaymath}
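Spelled out in code, the three cases read as follows. This is a sketch under our own naming: $N_s$ is the total number of observations, $q$ the number of distinct events with positive count, and the returned probabilities sum to one over the alphabet.

```python
def natural_law(n_s, N_s, q, alphabet_size):
    """Natural law of succession, following the three-case formula:
    n_s: count of the event, N_s: total count,
    q: number of distinct events observed, alphabet_size: |Sigma|."""
    if q == alphabet_size:
        # Every event has been seen: reduces to Laplace smoothing.
        return (n_s + 1) / (N_s + alphabet_size)
    if n_s > 0:
        # Observed event.
        return (n_s + 1) * (N_s + 1 - q) / (N_s**2 + N_s + 2 * q)
    # Unobserved event: the remaining mass is split among the
    # alphabet_size - q events never seen.
    return q * (q + 1) / ((alphabet_size - q) * (N_s**2 + N_s + 2 * q))
```

For example, with an alphabet of size 4 and observed counts $\{2, 1, 0, 0\}$ (so $N_s = 3$, $q = 2$), the four probabilities are $6/16$, $4/16$, $3/16$ and $3/16$, which sum to one.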

This law of succession has been shown, both theoretically and empirically, to outperform the previous ones.

Unfortunately, from a practical point of view, the natural law of succession is too computationally expensive: evaluating the formula requires the value of $q$, which is obtained by counting all distinct subsequences, of any length, occurring in the observed sequence. Moreover, the variable length Markov model learning algorithm makes extensive use of the estimation of observed probabilities. Since we cannot afford a loss of performance in this estimation, we will not consider this law of succession in the rest of this thesis.
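The cost argument can be made concrete: a sequence of length $N$ contains $N(N+1)/2$ contiguous subsequences that must be examined to obtain $q$, so even a modest sequence forces hundreds of thousands of candidate counts. A back-of-the-envelope sketch, with a helper name of our own:

```python
def candidate_subsequences(N):
    """Number of contiguous subsequences (not necessarily distinct) of a
    length-N sequence: one per start/end position pair, i.e. N(N+1)/2."""
    return N * (N + 1) // 2

candidate_subsequences(1000)  # 500500 candidates for a length-1000 sequence
```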



franck 2006-10-01