



The estimation of observed probabilities

One problem that arises when dealing with finite sequences is that the true underlying probabilities are unknown and must be estimated from the data. We investigated several ways of estimating probabilities from sequences of letters (or, respectively, of pathlet models). The details of the different laws of succession mentioned here can be found in [80].

We denote by $n_s$ the number of times the sequence $s$ is observed as a subsequence of the training sequence. The training sequence is assumed to be representative of the population of sequences we will have to deal with, that is, to consist of samples from the probability distribution we want to model. We denote by $N_s$ the number of possible subsequences of size $\vert s\vert$ in the training sequence, that is:

\begin{displaymath}N_s=\sum_{s' \in \Sigma^{\vert s\vert}}n_{s'}\end{displaymath}

Note that $N_s=L+1-\vert s\vert$, where $L$ is the size of the training sequence.
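As a concrete illustration (not part of the original text), the counts $n_s$ and the total $N_s$ can be computed by sliding a window of size $\vert s\vert$ over the training sequence. The function name and the example sequence below are purely illustrative:

```python
from collections import Counter

def subsequence_counts(training, k):
    """Count n_s for every subsequence s of size k observed in the
    training sequence, using a sliding window of width k."""
    return Counter(training[i:i + k] for i in range(len(training) - k + 1))

# Illustrative training sequence; counts over windows of size 2.
counts = subsequence_counts("abracadabra", 2)
# N_s is the sum of all n_s, and equals L + 1 - |s| as noted above.
N_s = sum(counts.values())
assert N_s == len("abracadabra") + 1 - 2
```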

Furthermore, the conditional probability $P(\sigma \vert s)$ is by definition:

\begin{displaymath}P(\sigma \vert s)=\frac{P(s \sigma)}{P(s)}\end{displaymath}

We therefore only need to estimate the probability of a subsequence $s$ within the training sequence; the different laws of succession provide such estimates.
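Before turning to the laws of succession, the simplest plug-in estimate replaces each probability in the definition above by the relative frequency $n_s / N_s$. This maximum-likelihood sketch (the function name and example sequence are assumptions for illustration; the subsections refine this estimate) reads:

```python
def ml_conditional(training, s, sigma):
    """Plug-in (maximum-likelihood) estimate of P(sigma | s) as
    P(s sigma) / P(s), where each P(x) is estimated by n_x / N_x."""
    def n(x):
        # Number of occurrences of x as a subsequence of the training sequence.
        return sum(1 for i in range(len(training) - len(x) + 1)
                   if training[i:i + len(x)] == x)

    def N(x):
        # Number of possible subsequences of size |x|: L + 1 - |x|.
        return len(training) + 1 - len(x)

    return (n(s + sigma) / N(s + sigma)) / (n(s) / N(s))

# Illustrative: estimate P("b" | "a") on the sequence "abracadabra".
p = ml_conditional("abracadabra", "a", "b")
```

Note that this estimate assigns probability zero to any continuation never observed in the training sequence, which is precisely the shortcoming the laws of succession address.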




franck 2006-10-01