


General description

In mathematical terms, a neural network is simply a function with some parameters called weights. This function can be built from a simple element called a neuron. A neuron can be represented as in figure 2.8.

Figure 2.8: Model of a neuron. The output $z$ is calculated using some parameters $\left\{c_i\right\}_{i\in\left[1,p\right]}$ and a function $g$: $z=g\left(x_1,x_2,\ldots,x_n,c_1,c_2,\ldots,c_p\right)$.
\begin{figure}\begin{center}
\epsfbox{neuronef.ps}
\end{center}
\end{figure}

The function $g$ is the composition of a linear combination of the inputs, called the potential, with an activation function $f$. The potential $\nu$ of a neuron is defined by:


\begin{displaymath}\nu=c_0+\sum_{i=1}^nc_i\cdot x_i\end{displaymath}

where $c_0$ is a constant called the bias. The bias can be folded into the sum by adding a new input to the neuron whose value is always 1; the bias then becomes the weight associated with this input. So we can write:


\begin{displaymath}\nu=\sum_{i=0}^nc_i\cdot x_i\end{displaymath}

where $x_0=1$. We can then compute the output value $z$ using the formula:


\begin{displaymath}z=f(\nu)=f\left(\sum_{i=0}^nc_i\cdot x_i\right)\end{displaymath}

The activation function used here is the hyperbolic tangent, a sigmoid function. Other sigmoid functions, such as the logistic function $x\mapsto\frac{1}{1+e^{-x}}$, could also be used.
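As a concrete illustration, here is a minimal sketch of such a neuron in Python; the function name neuron_output and the example values are our own, not part of the original text.

\begin{verbatim}
import numpy as np

def neuron_output(x, c, f=np.tanh):
    # Prepend the constant input x0 = 1 so that the bias c[0]
    # is absorbed into the sum, as in the formula above.
    x = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    nu = np.dot(c, x)   # potential: nu = sum_{i=0}^{n} c_i * x_i
    return f(nu)        # activation: z = f(nu)

# A neuron with two inputs, bias c0 = 0.1 and weights 0.8, -0.3:
z = neuron_output([0.5, -1.2], np.array([0.1, 0.8, -0.3]))
\end{verbatim}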

These neurons can be combined into a network by feeding the outputs of some neurons in as inputs of others. Such a construction is called a neural network. A neural network can therefore have different numbers of inputs and outputs, and different architectures.

We are particularly interested in an architecture called the multi-layer Perceptron, shown in figure 2.9. It is composed of $p$ layers, $l_1$ to $l_p$, each fully connected to the next. The neurons of the output layer have a linear activation function; all other neurons have a sigmoidal activation function.

Figure 2.9: Model of a neural network. The output layer uses a linear activation function and the other neurons use a sigmoid activation function. Each layer is fully connected to the next layer.
\begin{figure}\begin{center}
\epsfbox{mlpfig.ps}
\end{center}
\end{figure}
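To make the architecture concrete, here is a sketch of the forward pass of such a multi-layer Perceptron; the representation of a layer as a weight matrix whose first column holds the biases is our own choice, not prescribed by the text.

\begin{verbatim}
import numpy as np

def mlp_forward(x, layers):
    # layers[k] is a weight matrix of shape (n_out, n_in + 1);
    # column 0 holds the biases (the weights of the input x0 = 1).
    a = np.asarray(x, dtype=float)
    for k, W in enumerate(layers):
        a = np.concatenate(([1.0], a))         # constant input x0 = 1
        nu = W @ a                             # potentials of the layer
        is_output = (k == len(layers) - 1)
        a = nu if is_output else np.tanh(nu)   # linear output layer
    return a

# Example: 2 inputs, one hidden layer of 3 neurons, 1 linear output.
rng = np.random.default_rng(0)
net = [rng.normal(size=(3, 3)), rng.normal(size=(1, 4))]
y = mlp_forward([0.5, -1.2], net)
\end{verbatim}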

We also need a way to assess the performance of the learned mapping. The error is evaluated using the sum-of-squares error function over a set $T$. The set $T$ is a set of pairs, each composed of an input and the corresponding desired output value (target) of the neural network. The sum-of-squares error is:

\begin{displaymath}E_T(w)=\frac{1}{N}\sum_{i=1}^NE_i(w)\end{displaymath}

where $w$ is a vector of weights, $N$ is the number of samples in the set $T$ and

\begin{displaymath}E_i(w)=\left(d_i-z_i(w)\right)^2\end{displaymath}

$z_i(w)$ is the output of the network for the $i^{th}$ input of the set $T$, computed with the weights $w$, and $d_i$ is the corresponding desired value. $E_i(w)$ is thus the squared difference between the result of the neural network for a particular input and the desired value.
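A direct transcription of this error into Python might look as follows; the names sum_of_squares_error and forward are illustrative, with forward playing the role of the forward-pass sketch above.

\begin{verbatim}
import numpy as np

def sum_of_squares_error(w, samples, forward):
    # samples is the set T: a list of (input, target) pairs.
    # forward(x, w) returns z_i(w), the network output for input x.
    N = len(samples)
    total = 0.0
    for x_i, d_i in samples:
        z_i = forward(x_i, w)
        # E_i(w) = (d_i - z_i(w))^2, summed over output components
        # if the output is a vector.
        total += float(np.sum((np.asarray(d_i) - z_i) ** 2))
    return total / N            # E_T(w) = (1/N) * sum_i E_i(w)
\end{verbatim}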

This error function has to be minimized with respect to $w$ over a learning set. The result of this minimization is a set of weights, which forms the trained network. For any sufficiently regular function, one can find a neural network that fits it with arbitrary precision (the universal approximation property).
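As a rough illustration of this minimization, here is a plain gradient-descent sketch. The gradient is estimated by finite differences purely for clarity; backpropagation, described in the next section, computes it far more efficiently. The callable error stands for $E_T$ viewed as a scalar function of a flat weight vector, an assumption of this sketch.

\begin{verbatim}
import numpy as np

def gradient_descent(error, w0, lr=0.05, steps=200, eps=1e-6):
    # Minimize error(w) for a flat weight vector w.
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        grad = np.empty_like(w)
        e0 = error(w)
        for j in range(w.size):
            w[j] += eps
            grad[j] = (error(w) - e0) / eps   # finite difference
            w[j] -= eps
        w -= lr * grad      # step against the gradient of E_T
    return w
\end{verbatim}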

We can assess the performance of the training using a test set that is distinct from the learning set. Computing the sum-of-squares error over this test set gives us a measure of the performance of the neural network on unseen data.
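In terms of the sketches above, this amounts to evaluating the same error function on held-out data; trained_w and test_set are hypothetical names standing for the minimization result and the held-out pairs.

\begin{verbatim}
# Same error function, evaluated on a set disjoint from the learning set.
test_error = sum_of_squares_error(trained_w, test_set, forward)
\end{verbatim}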

