


Gradient descent

The method of gradient descent is one of the most fundamental procedures for minimizing a function $E$ of several variables. To find a minimum, we begin with random values for the variables, which gives us a point $w_0$ in the search space. We then choose another point by moving in a direction $d$; the new point must have a smaller value of $E$ than the previous one. We then move again in another direction that decreases the value of the function, and so on.

Iterating this idea, we keep decreasing the value of the function until we find a minimum. At each iteration, the first task is to choose a descent direction. Since the opposite of the gradient is the steepest descent direction, we can choose the direction $d_k$ at each point $w_k$ as follows:

\begin{displaymath}d_k=-\nabla E\left(w_k\right)\end{displaymath}

The second choice, which is harder to make, is the length of the step along each direction $d_k$. If the step is too large, the algorithm can diverge; if it is too small, the algorithm takes a long time to converge. Some choices, such as a step of $\frac{1}{k\cdot \left\Vert\nabla E\left(w_k\right)\right\Vert}$, are proved to converge [18]. However, these methods are often slow. Here, our goal is only to get close to a minimum quickly, so we can take a step of $\frac{\eta}{\left\Vert\nabla E\left(w_k\right)\right\Vert}$, where $\eta$ is a constant. With this step, the distance between two consecutive points is exactly $\eta$, so the iterates cannot diverge in a small number of steps: after $n$ iterations, the point $w_n$ lies at distance at most $n\eta$ from the starting point.
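As a sanity check on this choice of step, the following minimal sketch (using NumPy; the function $E(w)=\Vert w\Vert^2$ and its gradient are hypothetical examples, not taken from the text) shows that the normalized update moves the point by exactly $\eta$, whatever the magnitude of the gradient:

```python
import numpy as np

# Hypothetical example: E(w) = ||w||^2, whose gradient is 2w.
def grad_E(w):
    return 2.0 * w

w = np.array([3.0, 4.0])   # current point w_k
eta = 0.1                  # constant step length

g = grad_E(w)
w_next = w - eta * g / np.linalg.norm(g)  # normalized gradient step

# The move has length exactly eta, regardless of the gradient's magnitude.
print(np.linalg.norm(w_next - w))  # 0.1 (up to floating-point rounding)
```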

So the algorithm becomes:

Choose a random point $w_0$.

For $k$ from $0$ to a fixed number of iterations:

\begin{displaymath}w_{k+1}=w_k-\eta\frac{\nabla E\left(w_k\right)}{\left\Vert\nabla E\left(w_k\right)\right\Vert}\end{displaymath}
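The algorithm above can be sketched as follows (a minimal sketch, assuming NumPy; the quadratic test function, its minimizer, and the parameter values are hypothetical examples chosen for illustration):

```python
import numpy as np

def gradient_descent(grad_E, w0, eta=0.05, n_iters=200):
    """Normalized gradient descent with a constant step length eta."""
    w = np.asarray(w0, dtype=float)
    for k in range(n_iters):
        g = grad_E(w)
        norm = np.linalg.norm(g)
        if norm == 0.0:      # already at a stationary point
            break
        w = w - eta * g / norm
    return w

# Hypothetical example: E(w) = ||w - (2, 2)||^2, minimized at (2, 2).
target = np.array([2.0, 2.0])
w_final = gradient_descent(lambda w: 2.0 * (w - target), w0=[5.0, 5.0])

# With a constant step length, the iterates end up oscillating within
# eta of the minimum rather than converging to it exactly.
print(np.linalg.norm(w_final - target))  # < 0.05
```

Note the behavior near the minimum: because every move has the same length $\eta$, the final iterate lands within a distance $\eta$ of the minimizer and then bounces around it, which is consistent with the stated goal of getting close to a minimum quickly rather than converging exactly.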



franck 2006-10-15