



The estimation of observed probabilities

One problem that arises when dealing with finite sequences is that the true underlying probabilities are unknown and must be estimated from the data. We investigated several ways of estimating probabilities from sequences of letters (or, respectively, of pathlet models). The details of the different laws of succession mentioned here can be found in [80].

We denote by $n_s$ the number of times the sequence $s$ is observed as a subsequence of the training sequence. The training sequence is assumed to be representative of the population of sequences we will have to deal with, that is, to consist of samples from the probability distribution we want to model. We denote by $N_s$ the number of possible subsequences of size $\vert s\vert$ in the training sequence, that is:

\begin{displaymath}N_s=\sum_{s' \in \Sigma^{\vert s\vert}}n_{s'}\end{displaymath}

Note that $N_s=L+1-\vert s\vert$, where $L$ is the size of the training sequence.
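As a concrete illustration (not part of the original text), the counts $n_s$ and the total $N_s$ can be computed by sliding a window of size $\vert s\vert$ over the training sequence. The function name and the example sequence below are purely illustrative:

```python
from collections import Counter

def subsequence_counts(training, k):
    """Count n_s for every subsequence s of size k observed in the
    training sequence, using a sliding window of width k."""
    return Counter(training[i:i + k] for i in range(len(training) - k + 1))

# Illustrative training sequence; counts over windows of size 2.
counts = subsequence_counts("abracadabra", 2)
# N_s is the sum of all n_s, and equals L + 1 - |s| as noted above.
N_s = sum(counts.values())
assert N_s == len("abracadabra") + 1 - 2
```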

Furthermore, the conditional probability $P(\sigma \vert s)$ is by definition:

\begin{displaymath}P(\sigma \vert s)=\frac{P(s \sigma)}{P(s)}\end{displaymath}

We therefore only need to estimate the probability of a subsequence $s$ within the training sequence; the different laws of succession provide such estimates.
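Before turning to the laws of succession, the simplest plug-in estimate replaces each probability in the definition above by the relative frequency $n_s / N_s$. This maximum-likelihood sketch (the function name and example sequence are assumptions for illustration; the subsections refine this estimate) reads:

```python
def ml_conditional(training, s, sigma):
    """Plug-in (maximum-likelihood) estimate of P(sigma | s) as
    P(s sigma) / P(s), where each P(x) is estimated by n_x / N_x."""
    def n(x):
        # Number of occurrences of x as a subsequence of the training sequence.
        return sum(1 for i in range(len(training) - len(x) + 1)
                   if training[i:i + len(x)] == x)

    def N(x):
        # Number of possible subsequences of size |x|: L + 1 - |x|.
        return len(training) + 1 - len(x)

    return (n(s + sigma) / N(s + sigma)) / (n(s) / N(s))

# Illustrative: estimate P("b" | "a") on the sequence "abracadabra".
p = ml_conditional("abracadabra", "a", "b")
```

Note that this estimate assigns probability zero to any continuation never observed in the training sequence, which is precisely the shortcoming the laws of succession address.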




franck 2006-10-01