Now let us look at one of the crucial layers used in convolutional neural networks, which helps with optimization and scaling when working with images.
Suppose you have a very large image, say 4000 x 3000 pixels. For an image this large, there will be too many input nodes, and likely a large number of nodes in the hidden layers as well. So what do we do to reduce the image size? One option is to preprocess the image and shrink its dimensions. Now let's look at a layer that reduces image size without losing prominent features.
Max pooling
In max pooling, we slide a window of a given size over the image with a given stride and pick the maximum value inside the window as the output value.
Max pooling helps extract the more prominent features, such as edges. This is the difference between max pooling and resizing during preprocessing: resizing averages neighboring pixels together, while max pooling preserves the strongest responses.
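As a minimal sketch of the idea (the function name `max_pool` and the tiny 4 x 4 example image are hypothetical, chosen just for illustration):

```python
import numpy as np

def max_pool(image, window=2, stride=2):
    """Slide a window over the image and keep the max of each patch."""
    h, w = image.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + window,
                          j * stride:j * stride + window]
            out[i, j] = patch.max()  # keep only the strongest response
    return out

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 1],
                [3, 4, 5, 6]])
print(max_pool(img))
# [[6 4]
#  [7 9]]
```

With a 2 x 2 window and stride 2, the 4 x 4 input shrinks to 2 x 2, so each layer of pooling cuts the number of values to process by a factor of four.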
Average pooling
In average pooling, we iterate over the image in the same way as in max pooling, but take the average of the values in the window instead of the maximum.
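The sketch is the same as for max pooling, with `patch.max()` swapped for `patch.mean()` (again, `avg_pool` and the sample image are illustrative names, not part of any library):

```python
import numpy as np

def avg_pool(image, window=2, stride=2):
    """Slide a window over the image and keep the mean of each patch."""
    h, w = image.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + window,
                          j * stride:j * stride + window]
            out[i, j] = patch.mean()  # smooth: every pixel contributes
    return out

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 1],
                [3, 4, 5, 6]])
print(avg_pool(img))
# [[3.75 2.25]
#  [4.   5.25]]
```

Note how the averaged output smooths the patch rather than keeping its sharpest value, which is why max pooling is usually preferred when the goal is to retain edges.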
Pooling helps make the representation approximately invariant to small translations of the input. This means that even if a pixel value moves slightly from its expected position, pooling handles it: we take the maximum over a window, and the exact position of the pixel within that window does not matter.
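A small demonstration of this invariance (the helper `max_pool_2x2` is a hypothetical non-overlapping 2 x 2 max pool, written with a NumPy reshape trick):

```python
import numpy as np

def max_pool_2x2(image):
    """Non-overlapping 2x2 max pooling (stride 2) via reshape."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A single bright "feature" pixel...
a = np.zeros((4, 4), dtype=int)
a[0, 0] = 9
# ...and the same feature shifted one pixel to the right.
b = np.zeros((4, 4), dtype=int)
b[0, 1] = 9

# Both positions fall inside the same 2x2 window, so the
# pooled outputs are identical despite the shift.
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
```

The invariance is only approximate: a shift large enough to move the feature into a neighboring window does change the pooled output.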
Advantages
- Pooling reduces computational requirements by shrinking the representation passed to subsequent layers.
- It can handle inputs of varying sizes, since the pooling region can be adjusted to produce a fixed-size output.
Disadvantages
- It can complicate some kinds of neural network architectures that use top-down information, such as Boltzmann machines and autoencoders.