
CS231n Lecture 4: Introduction to Neural Networks

Computation Graph


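Backpropagation: a recursive application of the chain rule along the computation graph, used to find the gradient of the output with respect to every input.

e.g. f(x, y, z) = (x + y)z, where x = -2, y = 5, z = -4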


Notations:

q = x + y, \quad f = qz

The goal is to find:

\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}

To get all of these, we start from the last node, where

\frac{\partial f}{\partial f} = 1

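Then find the following in sequence, working backward through the graph (here q = x + y = 3 and f = qz):

  1. \frac{\partial f}{\partial z} = q = 3
  2. \frac{\partial f}{\partial q} = z = -4
  3. \frac{\partial f}{\partial y} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial y} = -4 \cdot 1 = -4, using the chain rule
  4. \frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial x} = -4 \cdot 1 = -4, using the chain rule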

Using the chain rule: multiply the upstream gradient by the local gradient to obtain the desired partial derivative.

For each node, we only need its "local gradient". Multiplied by the upstream gradient, it becomes the upstream gradient for the next node back in the graph.
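The upstream-times-local rule can be traced by hand in a few lines of Python. This is a minimal sketch, assuming the example function is f(x, y, z) = (x + y)z with the intermediate node q = x + y; the variable names are illustrative and not taken from the lecture code.

```python
# Forward pass for f(x, y, z) = (x + y) * z at the example point.
x, y, z = -2.0, 5.0, -4.0
q = x + y          # intermediate node: q = 3
f = q * z          # output node: f = -12

# Backward pass, from the output back to the inputs.
df_df = 1.0                # gradient of the output with respect to itself
df_dz = q * df_df          # local gradient of f = q*z w.r.t. z is q
df_dq = z * df_df          # local gradient of f = q*z w.r.t. q is z
df_dx = 1.0 * df_dq        # chain rule: dq/dx = 1, upstream is df/dq
df_dy = 1.0 * df_dq        # chain rule: dq/dy = 1, upstream is df/dq

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```

Each backward line has the same shape: a local gradient multiplied by whatever arrived from the node above.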

Another Example


Sigmoid Function

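\sigma(x) = \frac{1}{1 + e^{-x}}, \quad \frac{d\sigma(x)}{dx} = (1 - \sigma(x))\,\sigma(x)

Because we already know the analytic gradient of the sigmoid function, we can treat the whole sigmoid portion of the graph as a single sigmoid gate and use this result directly, instead of backpropagating through each elementary operation.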
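As a rough illustration of the sigmoid-gate shortcut, the sketch below assumes a two-input neuron f(w, x) = \sigma(w_0 x_0 + w_1 x_1 + w_2). The input values and all variable names are assumptions chosen for illustration only.

```python
import math

def sigmoid(t):
    """Logistic sigmoid 1 / (1 + e^-t)."""
    return 1.0 / (1.0 + math.exp(-t))

# Illustrative weights and inputs (assumed values).
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass: a weighted sum followed by one sigmoid gate.
s = w0 * x0 + w1 * x1 + w2    # s = 1.0
f = sigmoid(s)                # about 0.73

# Backward pass: the sigmoid gate's local gradient is (1 - f) * f,
# so a single multiplication replaces the exp, +1 and 1/x gates.
ds = (1.0 - f) * f * 1.0      # upstream gradient is 1.0; ds is about 0.20

# The sum distributes ds; each product swaps in the other factor.
dw0, dx0 = x0 * ds, w0 * ds
dw1, dx1 = x1 * ds, w1 * ds
dw2 = ds
```

The result is the same as stepping through every elementary gate, with fewer intermediate values to keep track of.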


Patterns in backward flow

  1. add gate: gradient distributor (the upstream gradient is passed unchanged to both input branches)
  2. max gate (e.g. max(0, 1) = 1): gradient router (the larger input receives the full upstream gradient; the other input receives 0)
  3. mul gate: gradient switcher (each input's gradient is the upstream gradient multiplied by the value of the other input)
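The three patterns above can be written as small backward functions. This is only a sketch: the function names and the tie-breaking rule in the max gate are assumptions made for illustration.

```python
def add_backward(upstream):
    # add gate: both inputs receive the upstream gradient unchanged
    return upstream, upstream

def max_backward(a, b, upstream):
    # max gate: the larger input receives everything, the other gets zero
    # (ties go to the first input here, which is an arbitrary choice)
    return (upstream, 0.0) if a >= b else (0.0, upstream)

def mul_backward(a, b, upstream):
    # mul gate: each input's gradient is the other input times upstream
    return b * upstream, a * upstream

upstream = 2.0
print(add_backward(upstream))              # (2.0, 2.0)
print(max_backward(0.0, 1.0, upstream))    # (0.0, 2.0)
print(mul_backward(3.0, -4.0, upstream))   # (-8.0, 6.0)
```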

Using matrix representation (Vectorization)

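When inputs and outputs are vectors, the local gradient is a Jacobian matrix: the derivative of each output element with respect to each input element.

However, for an elementwise operation such as max(0, x) we don't need to build the full Jacobian matrix, because each element of x only affects the output element at the same position. The Jacobian is therefore diagonal, and multiplying by it reduces to an elementwise product.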
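To make the diagonal-Jacobian point concrete, the sketch below assumes the elementwise operation is max(0, x) and compares the full Jacobian product with the elementwise shortcut. All function names are hypothetical, and plain Python lists stand in for vectors.

```python
def relu_jacobian(x):
    # Full n x n Jacobian: entry (i, j) is d out_i / d x_j.
    # Only the diagonal can be nonzero for an elementwise operation.
    n = len(x)
    return [[1.0 if (i == j and x[i] > 0) else 0.0 for j in range(n)]
            for i in range(n)]

def backward_full(x, upstream):
    # dL/dx_j = sum_i upstream_i * J[i][j], using the explicit Jacobian
    J = relu_jacobian(x)
    n = len(x)
    return [sum(upstream[i] * J[i][j] for i in range(n)) for j in range(n)]

def backward_elementwise(x, upstream):
    # Same result without ever building the n x n matrix
    return [u if xi > 0 else 0.0 for xi, u in zip(x, upstream)]

x = [1.0, -2.0, 3.0, -0.5]
upstream = [0.1, 0.2, 0.3, 0.4]
print(backward_full(x, upstream))         # [0.1, 0.0, 0.3, 0.0]
print(backward_elementwise(x, upstream))  # [0.1, 0.0, 0.3, 0.0]
```

The full version does work proportional to n squared, while the shortcut is linear in n, which matters when the vector has thousands of elements.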


L2 norm:

f(q) = \|q\|^2 = q_1^2 + q_2^2 + q_3^2 + \dots + q_n^2

Partial derivative with respect to each q_i:

\frac{\partial f}{\partial q_i} = 2 q_i

or, in vector form,

\nabla_q f = 2q

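Then, to calculate the gradient for W, where q = W \cdot x, we can use the chain rule:

\frac{\partial f}{\partial W_{i,j}} = \sum_k \frac{\partial f}{\partial q_k} \frac{\partial q_k}{\partial W_{i,j}} = 2 q_i x_j, \quad \text{i.e.} \quad \nabla_W f = 2q \cdot x^T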
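A small numeric sketch of this gradient, assuming q = W \cdot x and f = \|q\|^2. The matrix and vector values are illustrative, the helper names are hypothetical, and nested lists are used in place of a matrix library.

```python
def matvec(W, x):
    # q = W x, computed row by row
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def forward(W, x):
    q = matvec(W, x)
    f = sum(qi * qi for qi in q)    # f = ||q||^2
    return q, f

def backward_W(x, q):
    dq = [2.0 * qi for qi in q]                 # df/dq = 2q
    return [[dqi * xj for xj in x] for dqi in dq]   # df/dW[i][j] = 2 q_i x_j

W = [[0.1, 0.5], [-0.3, 0.8]]   # illustrative values
x = [0.2, 0.4]
q, f = forward(W, x)            # q is about [0.22, 0.26], f about 0.116
dW = backward_W(x, q)           # about [[0.088, 0.176], [0.104, 0.208]]
```

Note that dW has the same shape as W, which is a quick sanity check worth doing on any gradient.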

Implementing forward and backward for different gates

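class MultiplyGate(object):
    def forward(self, x, y):
        # cache the inputs: backward needs them for the local gradients
        self.x = x
        self.y = y
        return x * y
    def backward(self, dz):
        dx = self.y * dz  # local gradient dz/dx = y, times upstream dz
        dy = self.x * dz  # local gradient dz/dy = x, times upstream dz
        return [dx, dy]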

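where [dx, dy] is the gradient of the final loss L with respect to the gate's inputs:

\left[\frac{\partial L}{\partial x}, \frac{\partial L}{\partial y}\right]

and dz is the upstream gradient, i.e. the gradient of L with respect to the gate's output z:

\frac{\partial L}{\partial z}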
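One possible way to chain gates through this forward/backward API is sketched below. `AddGate` is a hypothetical companion class written for this example, and `MultiplyGate` is repeated in a self-contained form that takes `self` and caches its inputs.

```python
class MultiplyGate(object):
    def forward(self, x, y):
        self.x, self.y = x, y      # cache inputs for the backward pass
        return x * y
    def backward(self, dz):
        return [self.y * dz, self.x * dz]

class AddGate(object):             # hypothetical gate, for illustration
    def forward(self, x, y):
        return x + y               # nothing to cache: local gradients are 1
    def backward(self, dz):
        return [dz, dz]

# Build f = (x + y) * z by calling forward in graph order.
add, mul = AddGate(), MultiplyGate()
q = add.forward(-2.0, 5.0)
f = mul.forward(q, -4.0)

# Call backward in reverse order, starting from df/df = 1.
dq, dz = mul.backward(1.0)
dx, dy = add.backward(dq)
print(f, dx, dy, dz)  # -12.0 -4.0 -4.0 3.0
```

Each gate only knows its own local gradients; the caller is responsible for passing the upstream gradient along in reverse order.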


Last update: February 19, 2022