


Propagation and backpropagation

The gradient of the sum-of-squares error function of a multi-layer Perceptron is computed with the backpropagation algorithm. The propagation algorithm is a preliminary step of the backpropagation algorithm.

In order to explain how these two algorithms work, I have to number the neurons from left to right, beginning with the layer $l_1$, then the layer $l_2$, and so on. Let $w_{ij}$ denote the weight of the link between neuron $i$ and neuron $j$. Let $F_i$ denote the set of neurons whose output is an input of the neuron $i$. $\nu_i$ is the potential of the neuron $i$, and $z_i$ is the returned value of the neuron $i$. Finally, let $N_i$ denote the set of neurons whose set of inputs contains the output of the neuron $i$.

The propagation algorithm propagates an input vector $\{x_i\}$ through the network. It is used to compute the output of the neural network given the inputs and the weights:

For $i$ from $1$ to the number of neurons, do:

compute the potential of neuron $i$, $\nu_i = \sum_{j \in F_i} w_{ji}\cdot z_j$, then its output $z_i = f_i(\nu_i)$ (for a neuron of the first layer $l_1$, the inputs $z_j$ are the components of the input vector $\{x_i\}$).

Done.
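The following sketch illustrates how this propagation could be coded. It is only an illustration under assumed conventions, not the implementation used here: the network is stored as dictionaries F, w and f keyed by the neuron numbers defined above, and the components of the input vector are stored in a dictionary x under their own indices.

def propagate(x, F, w, f):
    """Propagation: compute the potential and the output of every neuron.

    x : dict, components of the input vector, keyed by input index
    F : dict, F[i] is the list of indices feeding neuron i (the set F_i)
    w : dict, w[(j, i)] is the weight w_ji of the link from j to neuron i
    f : dict, f[i] is the activation function f_i of neuron i
    Returns the potentials nu_i and the outputs z_i.
    """
    z = dict(x)          # outputs; the input components are known from the start
    nu = {}              # potentials
    # neurons are numbered from left to right, so visiting them in
    # increasing order guarantees that every z_j with j in F_i is known
    for i in sorted(F):
        nu[i] = sum(w[(j, i)] * z[j] for j in F[i])   # nu_i = sum_j w_ji z_j
        z[i] = f[i](nu[i])                            # z_i = f_i(nu_i)
    return nu, z

The output of the network is the value z[i] of the last neuron.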

This algorithm computes the value of the potential of each neuron. These values are used by the backpropagation algorithm, which computes the gradient of the sum-of-squares error. It computes $\frac{\partial E_k}{\partial w_{ji}}$ for each weight and for each example $k$ in the training set. The algorithm is based on the following factorization:


\begin{displaymath}\forall j\in F_i, \quad \frac{\partial E_k}{\partial w_{ji}} = \frac{\partial E_k}{\partial \nu_i}\cdot\frac{\partial \nu_i}{\partial w_{ji}} = \frac{\partial E_k}{\partial \nu_i}\cdot z_j\end{displaymath}

Recall that $d_k$ denotes the desired output of the neural network when $\{x_i\}$ is the input vector, so that the error on example $k$ is $E_k = \left(d_k - z\right)^2$, where $z$ is the output of the last neuron. For the last neuron, we have:


\begin{displaymath}\frac{\partial E_k}{\partial \nu_i}=\frac{\partial E_k}{\partial z}\cdot\frac{\partial z}{\partial \nu_i}=-2\left(d_k-z\right)\cdot f_i'(\nu_i)\end{displaymath}

where $i$ is the index of the last neuron, located on the layer $l_p$, and $z$ is its output, computed with the propagation algorithm. For the other neurons, we have:


\begin{displaymath}\frac{\partial E_k}{\partial \nu_i} = \sum_{j \in N_i}\frac{\partial E_k}{\partial \nu_j}\cdot \frac{\partial \nu_j}{\partial \nu_i}\end{displaymath}

but:


\begin{displaymath}\nu_j = \sum_{i \in F_j}w_{ij}\cdot z_i = \sum_{i \in F_j}w_{ij}\cdot f_i(\nu_i)\end{displaymath}

thus:


\begin{displaymath}\forall j\in N_i, \frac{\partial \nu_j}{\partial \nu_i}=w_{ij}\cdot f_i'(\nu_i)\end{displaymath}

finally:


\begin{displaymath}\frac{\partial E_k}{\partial \nu_i} = f_i'(\nu_i)\sum_{j \in N_i}w_{ij}\cdot \frac{\partial E_k}{\partial \nu_j}\end{displaymath}

These computations can be summarized with the following algorithm:

For each input/output pair in the training set, do:

run the propagation algorithm to obtain the potentials $\nu_i$ and the outputs $z_i$; compute $\frac{\partial E_k}{\partial \nu_i}$ for the last neuron, then for the other neurons from right to left using the recurrence above; finally, compute $\frac{\partial E_k}{\partial w_{ji}} = \frac{\partial E_k}{\partial \nu_i}\cdot z_j$ for every weight.

Done.
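As an illustration, the backpropagation recurrence above could be coded as follows, reusing the propagate function from the previous sketch. This is only a sketch under the same assumed data layout (dictionaries keyed by the neuron numbers, a hypothetical index last for the last neuron), not the implementation actually used. The gradient of the total error is obtained by summing the returned per-example gradients over the training set.

def backpropagate(x, d_k, F, N, w, f, fprime, last):
    """Gradient of E_k = (d_k - z)^2 for one training example.

    d_k    : desired output for this example
    N      : dict, N[i] is the list of neurons fed by neuron i (the set N_i)
    fprime : dict, fprime[i] is the derivative f_i' of the activation
    last   : index of the last neuron (on layer l_p)
    Returns a dict mapping (j, i) to dE_k/dw_ji.
    """
    nu, z = propagate(x, F, w, f)            # propagation algorithm first
    # last neuron: dE_k/dnu_i = -2 (d_k - z) f_i'(nu_i)
    dE_dnu = {last: -2.0 * (d_k - z[last]) * fprime[last](nu[last])}
    # visit the other neurons from right to left so that dE_k/dnu_j is
    # already known for every j in N_i
    for i in sorted(F, reverse=True):
        if i == last:
            continue
        dE_dnu[i] = fprime[i](nu[i]) * sum(w[(i, j)] * dE_dnu[j] for j in N[i])
    # dE_k/dw_ji = dE_k/dnu_i * z_j for every weight of the network
    return {(j, i): dE_dnu[i] * z[j] for (j, i) in w}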

Two methods were used to minimize the error function. In a first stage, we approach a minimum of the error function using gradient descent. In a second stage, we use a quasi-Newton algorithm to find the minimum. The quasi-Newton algorithm is only valid in a neighbourhood of the minimum: it has to be initialized with a point close to the minimum, which is why gradient descent is used first. The next section describes these two minimization algorithms.
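As a rough illustration of this two-stage scheme only, the sketch below chains a plain gradient descent with SciPy's BFGS quasi-Newton routine. The functions E and gradE (total error and its gradient, accumulated by backpropagation over the training set), the number of descent steps and the step size are hypothetical placeholders; the algorithms actually used are the ones described in the next section.

import numpy as np
from scipy.optimize import minimize

def train(E, gradE, w0, descent_steps=200, step_size=0.01):
    """Two-stage minimization of the total error E(w), starting from w0."""
    w = np.asarray(w0, dtype=float)
    # first stage: gradient descent to approach a minimum
    for _ in range(descent_steps):
        w = w - step_size * gradE(w)
    # second stage: quasi-Newton (BFGS), started close to the minimum
    result = minimize(E, w, jac=gradE, method='BFGS')
    return result.x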

