CS231n Lecture 5: Convolutional Neural Networks
CNNs work well on data with spatial structure, such as images.
Filter
- stride (3 in this case): the number of positions the filter moves at each step
- input size: N * N
- filter size: F
Output size = (N - F)/stride + 1, which must be an integer. For example, a stride of 3 won't work here because N - F is not divisible by 3.
1. To make this case work, add a border around the input (padding) so that the padded N - F becomes divisible by the stride (here N is now 9 instead of 7).
2. Padding is also used to keep the output the same spatial size as the input.
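The output-size rule above can be checked with a small helper. This is a minimal sketch, not code from the lecture: the name `conv_output_size` is hypothetical, and the 3 x 3 filter (F = 3) is an assumption chosen for illustration, since these notes only give N = 7 and the stride.

```python
def conv_output_size(n, f, stride, pad=0):
    """Return the output width for an n x n input and an f x f filter.

    Raises ValueError when the filter does not tile the padded input evenly.
    """
    span = n + 2 * pad - f  # distance the filter can travel
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    if span % stride != 0:
        raise ValueError(
            f"stride {stride} does not fit: "
            f"({n} + 2*{pad} - {f}) is not divisible by {stride}"
        )
    return span // stride + 1


# N = 7 with an assumed 3 x 3 filter.
print(conv_output_size(7, 3, stride=1))         # 5
print(conv_output_size(7, 3, stride=2))         # 3
print(conv_output_size(7, 3, stride=1, pad=1))  # 7, same size as the input
print(conv_output_size(7, 3, stride=3, pad=1))  # 3, padded width is 9
```

Under the same assumptions, calling `conv_output_size(7, 3, stride=3)` without padding raises `ValueError`, which is the "won't work" case.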
Conv Layer
- Accepts a volume of size W_1 \times H_1 \times D_1
- Requires four hyperparameters:
  - the number of filters K,
  - their spatial extent F (the filter dimension),
  - the stride S,
  - the amount of zero padding P.
- Produces a volume of size W_2 \times H_2 \times D_2, where:
  - W_2 = (W_1 - F + 2P)/S + 1
  - H_2 = (H_1 - F + 2P)/S + 1 (width and height are computed the same way, by symmetry)
  - D_2 = K
- With parameter sharing, it introduces F \cdot F \cdot D_1 weights per filter, for a total of (F \cdot F \cdot D_1) \cdot K weights and K biases
- In the output volume, the d-th depth slice (of size W_2 \times H_2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offsetting by the d-th bias
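The bullet points above can be sketched in plain Python. This is an illustrative, unoptimized sketch under stated assumptions: the function names and the `[depth][row][col]` nested-list layout are hypothetical choices, and the "convolution" is computed without flipping the filter, as is conventional in CNN code.

```python
def conv_layer_shape(w1, h1, d1, k, f, s, p):
    """Output volume (W2, H2, D2) of a conv layer, per the formulas above."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k


def conv_param_count(d1, k, f):
    """Return (weights, biases): F*F*D1 weights per filter, one bias per filter."""
    return f * f * d1 * k, k


def conv_forward(x, filters, biases, s, p):
    """Naive forward pass. x is [depth][row][col]; filters is [k][depth][row][col]."""
    d1, h1, w1 = len(x), len(x[0]), len(x[0][0])
    f = len(filters[0][0])
    # Zero-pad every depth slice by p on each side.
    padded = [
        [[0.0] * (w1 + 2 * p) for _ in range(p)]
        + [[0.0] * p + list(row) + [0.0] * p for row in plane]
        + [[0.0] * (w1 + 2 * p) for _ in range(p)]
        for plane in x
    ]
    w2, h2, _ = conv_layer_shape(w1, h1, d1, len(filters), f, s, p)
    out = []
    for filt, bias in zip(filters, biases):  # one output depth slice per filter
        plane = []
        for i in range(h2):
            row = []
            for j in range(w2):
                total = bias  # offset by this filter's bias
                for c in range(d1):
                    for u in range(f):
                        for v in range(f):
                            total += filt[c][u][v] * padded[c][i * s + u][j * s + v]
                row.append(total)
            plane.append(row)
        out.append(plane)
    return out


# A 32 x 32 x 3 input with ten 5 x 5 filters, stride 1, padding 2.
print(conv_layer_shape(32, 32, 3, k=10, f=5, s=1, p=2))  # (32, 32, 10)
print(conv_param_count(3, k=10, f=5))                    # (750, 10)
```

The six nested loops make the cost of a conv layer explicit; real libraries compute the same result with vectorized kernels.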
Pooling Layer
The depth stays the same as the input, while the width and height are shrunk (downsampled) by a factor.
Max-Pooling Layer
Takes the maximum value within each filter window. Intuitively, this keeps the strongest activation in each region, i.e. how strongly a feature fired anywhere in that region.
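As a small illustration of taking the maximum in each window, here is a sketch for a single depth slice. The helper name `max_pool2d` and the list-of-rows layout are assumptions made for this example.

```python
def max_pool2d(plane, f, s):
    """Max-pool one depth slice (a list of rows) with an f x f window and stride s."""
    h, w = len(plane), len(plane[0])
    out_h = (h - f) // s + 1
    out_w = (w - f) // s + 1
    return [
        [
            # Largest activation inside the window whose top-left corner is (i*s, j*s).
            max(plane[i * s + u][j * s + v] for u in range(f) for v in range(f))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


activations = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(activations, f=2, s=2))  # [[6, 8], [3, 4]]
```

Each output value is the strongest response in its 2 x 2 region, so the 4 x 4 slice shrinks to 2 x 2.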
Summary
- Accepts a volume of size W_1×H_1×D_1
- Requires two hyperparameters:
  - the spatial extent F,
  - the stride S.
- Produces a volume of size W_2×H_2×D_2 where:
  - W_2=(W_1−F)/S+1
  - H_2=(H_1−F)/S+1
  - D_2=D_1
- Introduces zero parameters since it computes a fixed function of the input
- For Pooling layers, it is not common to pad the input using zero-padding.
It is worth noting that there are only two commonly seen variations of the max pooling layer found in practice: A pooling layer with F=3,S=2 (also called overlapping pooling), and more commonly F=2,S=2. Pooling sizes with larger receptive fields are too destructive.
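The pooling formulas in the summary can be applied to the two common settings as follows. This is a sketch only: `pool_output_shape` is a hypothetical helper, and the input volume sizes are made-up examples chosen so that each window tiles the input evenly.

```python
def pool_output_shape(w1, h1, d1, f, s):
    """Output volume (W2, H2, D2) of a pooling layer; the depth is unchanged."""
    if (w1 - f) % s or (h1 - f) % s:
        raise ValueError("window and stride do not tile the input evenly")
    return (w1 - f) // s + 1, (h1 - f) // s + 1, d1


# Hypothetical input volumes, one per common setting.
print(pool_output_shape(224, 224, 64, f=2, s=2))  # (112, 112, 64)
print(pool_output_shape(55, 55, 96, f=3, s=2))    # (27, 27, 96)
```

With F=2,S=2 the windows do not overlap and each spatial dimension is halved; with F=3,S=2 neighbouring windows share one row or column.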
Fully Connected Layer (FC)
Stretches the input volume out into a 1-D vector, with every element connected to every output unit. Usually used as the last layer(s) of the network.
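A minimal sketch of the stretch-then-connect idea, assuming a `[depth][row][col]` nested-list volume. The function names and the weight and bias values are hypothetical, picked only to make the arithmetic easy to follow.

```python
def flatten(volume):
    """Stretch a [depth][row][col] volume into a 1-D list."""
    return [v for plane in volume for row in plane for v in row]


def fully_connected(x, weights, biases):
    """Each output is the dot product of the whole input with one weight row, plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 x 2 x 2 volume
x = flatten(volume)                             # 8 values
weights = [[1] * 8, [0, 1] * 4]                 # 2 output units (hypothetical weights)
biases = [0, 10]
print(x)                                        # [1, 2, 3, 4, 5, 6, 7, 8]
print(fully_connected(x, weights, biases))      # [36, 30]
```

Unlike a conv filter, every output unit here has its own weight for every input value, so the spatial layout no longer matters after flattening.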