# 01 Jan backpropagation

An algorithm that uses errors in the output (the differences between the correct output and the actual output) to train an artificial neural network to produce the correct output automatically.

Here’s an excerpt from Stanford Research Institute International:

## The Backpropagation Algorithm

1. Propagates inputs forward in the usual way, i.e.
   - All outputs are computed using sigmoid thresholding of the inner product of the corresponding weight and input vectors.
   - All outputs at stage *n* are connected to all the inputs at stage *n*+1.
2. Propagates the errors backwards by apportioning them to each unit according to the amount of this error the unit is responsible for.
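As a rough sketch of the forward-propagation step (the layer sizes, weights, and `forward` helper below are illustrative assumptions, not from the excerpt):

```python
import numpy as np

def sigmoid(z):
    # sigmoid thresholding of the weighted sum
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights):
    # weights[n] fully connects the outputs at stage n to the inputs at stage n+1
    activations = [np.asarray(x, dtype=float)]
    for W in weights:
        z = W @ activations[-1]          # inner product of weight and input vectors
        activations.append(sigmoid(z))   # every output passes through the sigmoid
    return activations

# illustrative 2-3-1 network with random weights
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
acts = forward([0.5, -0.2], weights)
```

Because every unit applies the sigmoid, each activation lies strictly between 0 and 1.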

We now derive the stochastic Backpropagation algorithm for the general case. The derivation is simple, but unfortunately the book-keeping is a little messy.

- $\vec{x}_j$ = input vector for unit *j* ($x_{ji}$ = *i*th input to the *j*th unit)
- $\vec{w}_j$ = weight vector for unit *j* ($w_{ji}$ = weight on $x_{ji}$)
- $z_j = \sum_i w_{ji} x_{ji}$, the weighted sum of inputs for unit *j*
- $o_j$ = output of unit *j* ($o_j = \sigma(z_j)$, where $\sigma$ is the sigmoid function)
- $t_j$ = target for unit *j*
- *Downstream*(*j*) = set of units whose immediate inputs include the output of *j*
- *Outputs* = set of output units in the final layer
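To make the notation concrete, a tiny worked example for a single unit $j$ (the input and weight values are made up):

```python
import math

x_j = [1.0, 0.5, -0.3]   # x_{ji}: i-th input to unit j (illustrative values)
w_j = [0.2, -0.4, 0.1]   # w_{ji}: weight on x_{ji}

# z_j: the weighted sum of inputs for unit j
z_j = sum(w * x for w, x in zip(w_j, x_j))
# o_j: the output of unit j, sigmoid of z_j
o_j = 1.0 / (1.0 + math.exp(-z_j))
```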

Since we update after each training example, we can simplify the notation somewhat by imagining that the training set consists of exactly one example and so the error can simply be denoted by *E*.

We want to calculate $\frac{\partial E}{\partial w_{ji}}$ for each input weight $w_{ji}$ of each unit $j$. Note first that since $z_j$ is a function of $w_{ji}$ regardless of where in the network unit $j$ is located,

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial z_j}\,\frac{\partial z_j}{\partial w_{ji}} = \frac{\partial E}{\partial z_j}\, x_{ji}$$

Furthermore, $\frac{\partial E}{\partial z_j}$ is the same regardless of which input weight of unit $j$ we are trying to update. So we denote this quantity by $-\delta_j$ (the sign is chosen so that the weight update adds $\eta\,\delta_j x_{ji}$).

Consider the case when $j \in \textit{Outputs}$. We know

$$E = \frac{1}{2}\sum_{k \in Outputs}(t_k - o_k)^2$$

Since the outputs of all units $k \ne j$ are independent of $w_{ji}$, we can drop the summation and consider just the contribution to $E$ by $j$:

$$\frac{\partial E}{\partial z_j} = \frac{\partial}{\partial z_j}\,\frac{1}{2}(t_j - o_j)^2 = -(t_j - o_j)\,o_j(1 - o_j)$$

Thus

$$\delta_j = -\frac{\partial E}{\partial z_j} = o_j(1 - o_j)(t_j - o_j) \qquad\text{and}\qquad \frac{\partial E}{\partial w_{ji}} = -\,o_j(1 - o_j)(t_j - o_j)\,x_{ji}$$
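One way to sanity-check the output-unit formula is to compare it against a central finite-difference gradient of $E = \frac{1}{2}(t_j - o_j)^2$; the numbers below are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# illustrative values for a single output unit j (these numbers are assumptions)
t_j = 1.0          # target for unit j
x_j = [0.8, -0.5]  # inputs x_{ji}
w_j = [0.3, 0.7]   # weights w_{ji}

def error(w):
    # E = 1/2 (t_j - o_j)^2, the contribution to E by unit j alone
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x_j)))
    return 0.5 * (t_j - o) ** 2

# analytic gradient from the formula: dE/dw_ji = -(t_j - o_j) o_j (1 - o_j) x_ji
o_j = sigmoid(sum(wi * xi for wi, xi in zip(w_j, x_j)))
analytic = [-(t_j - o_j) * o_j * (1 - o_j) * xi for xi in x_j]

# central finite-difference check on the first weight
eps = 1e-6
numeric = (error([w_j[0] + eps, w_j[1]]) - error([w_j[0] - eps, w_j[1]])) / (2 * eps)
```

The two gradients agree to well within the finite-difference error.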


Now consider the case when *j* is a hidden unit. Like before, we make the following two important observations.

1. For each unit *k* downstream from *j*, $z_k$ is a function of $z_j$.
2. The contribution to error by all units in the same layer as *j* is independent of $w_{ji}$.

We want to calculate $\frac{\partial E}{\partial w_{ji}}$ for each input weight $w_{ji}$ of each hidden unit $j$. Note that $w_{ji}$ influences just $z_j$, which influences $o_j$, which influences $z_k$ for each $k \in \textit{Downstream}(j)$, each of which influences $E$. So we can write

$$\frac{\partial E}{\partial w_{ji}} = \sum_{k \in Downstream(j)} \frac{\partial E}{\partial z_k}\,\frac{\partial z_k}{\partial o_j}\,\frac{\partial o_j}{\partial z_j}\,\frac{\partial z_j}{\partial w_{ji}}$$

Again note that all the terms except $\frac{\partial z_j}{\partial w_{ji}} = x_{ji}$ in the above product are the same regardless of which input weight of unit $j$ we are trying to update. Like before, we denote this common quantity by $-\delta_j$.

Also note that $\frac{\partial z_k}{\partial o_j} = w_{kj}$, $\frac{\partial o_j}{\partial z_j} = o_j(1 - o_j)$, and $\frac{\partial E}{\partial z_k} = -\delta_k$. Substituting,

$$\delta_j = o_j(1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{kj}$$

Thus,

$$\frac{\partial E}{\partial w_{ji}} = -\delta_j\, x_{ji} = -\,o_j(1 - o_j)\Big(\sum_{k \in Downstream(j)} \delta_k\, w_{kj}\Big)\, x_{ji}$$
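The hidden-unit formula can be checked the same way on a minimal 2-1-1 network (one hidden unit $j$ with a single downstream output unit $k$; all values below are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-1-1 network: one hidden unit j, one output unit k downstream of j
x = [0.6, -0.4]    # network inputs = inputs x_{ji} to hidden unit j
w_j = [0.5, -0.2]  # weights into hidden unit j
w_kj = 0.9         # weight from j's output into output unit k
t_k = 1.0          # target for output unit k

def error(wj0):
    # E as a function of the first weight into hidden unit j
    z_j = wj0 * x[0] + w_j[1] * x[1]
    o_j = sigmoid(z_j)
    o_k = sigmoid(w_kj * o_j)
    return 0.5 * (t_k - o_k) ** 2

# analytic gradient via the backpropagated deltas
o_j = sigmoid(w_j[0] * x[0] + w_j[1] * x[1])
o_k = sigmoid(w_kj * o_j)
delta_k = o_k * (1 - o_k) * (t_k - o_k)      # output-unit delta
delta_j = o_j * (1 - o_j) * delta_k * w_kj   # hidden-unit delta
analytic = -delta_j * x[0]                   # dE/dw_{j0} = -delta_j x_{j0}

# central finite-difference check
eps = 1e-6
numeric = (error(w_j[0] + eps) - error(w_j[0] - eps)) / (2 * eps)
```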


We are now in a position to state the Backpropagation algorithm formally.

**Formal statement of the algorithm:**

Stochastic Backpropagation(*training examples*, $\eta$, $n_i$, $n_h$, $n_o$)

Each training example is of the form $\langle \vec{x}, \vec{t}\,\rangle$, where $\vec{x}$ is the input vector and $\vec{t}$ is the target vector. $\eta$ is the learning rate (e.g., .05). $n_i$, $n_h$ and $n_o$ are the number of input, hidden and output nodes respectively. The input from unit $i$ to unit $j$ is denoted $x_{ji}$ and its weight is denoted by $w_{ji}$.

- Create a feed-forward network with $n_i$ inputs, $n_h$ hidden units, and $n_o$ output units.
- Initialize all the weights to small random values (e.g., between -.05 and .05).
- Until the termination condition is met, Do
  - For each training example $\langle \vec{x}, \vec{t}\,\rangle$, Do
    1. Input the instance $\vec{x}$ and compute the output $o_u$ of every unit.
    2. For each output unit $k$, calculate $\delta_k = o_k(1 - o_k)(t_k - o_k)$.
    3. For each hidden unit $h$, calculate $\delta_h = o_h(1 - o_h) \sum_{k \in Downstream(h)} \delta_k\, w_{kh}$.
    4. Update each network weight $w_{ji}$ as follows: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\, \delta_j x_{ji}$.

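Putting the formal statement together, here is a minimal sketch of stochastic Backpropagation in Python; the toy task (learning NOT, with a constant bias input of 1) and the hyperparameters are assumptions for illustration, not from the original:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop(examples, eta, n_i, n_h, n_o, epochs=5000, seed=0):
    rng = random.Random(seed)
    # initialize all weights to small random values between -.05 and .05
    W_h = [[rng.uniform(-0.05, 0.05) for _ in range(n_i)] for _ in range(n_h)]
    W_o = [[rng.uniform(-0.05, 0.05) for _ in range(n_h)] for _ in range(n_o)]
    errors = []
    for _ in range(epochs):          # termination condition: fixed epoch budget
        sq_err = 0.0
        for x, t in examples:        # stochastic: update after each example
            # 1. propagate the input forward and compute every unit's output
            o_h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_h]
            o_o = [sigmoid(sum(w * h for w, h in zip(row, o_h))) for row in W_o]
            sq_err += sum(0.5 * (tk - ok) ** 2 for tk, ok in zip(t, o_o))
            # 2. deltas for the output units: o(1-o)(t-o)
            d_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_o, t)]
            # 3. deltas for the hidden units: o(1-o) * sum over downstream deltas
            d_h = [o_h[j] * (1 - o_h[j]) * sum(d_o[k] * W_o[k][j] for k in range(n_o))
                   for j in range(n_h)]
            # 4. weight updates: w_ji <- w_ji + eta * delta_j * x_ji
            for k in range(n_o):
                for j in range(n_h):
                    W_o[k][j] += eta * d_o[k] * o_h[j]
            for j in range(n_h):
                for i in range(n_i):
                    W_h[j][i] += eta * d_h[j] * x[i]
        errors.append(sq_err)
    return W_h, W_o, errors

# illustrative task: learn NOT on one input (second input is a bias fixed at 1)
data = [([0.0, 1.0], [1.0]), ([1.0, 1.0], [0.0])]
W_h, W_o, errors = backprop(data, eta=0.5, n_i=2, n_h=2, n_o=1)
```

Tracking the per-epoch squared error makes it easy to confirm the network is actually learning.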