Computer Vision is an interdisciplinary field of science that aims to make computers process, analyze images and videos and extract details in the same way a human mind does. Earlier Computer Vision was meant only to mimic human visual systems until we realized how AI can augment its applications and vice versa
CNN
Introduction. A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from the other.
Work by Hubel and Wiesel in the 1950s and 1960s showed that cat and monkey visual cortexes contain neurons that individually respond to small regions of the visual field. Provided the eyes are not moving, the region of visual space within which visual stimuli affect the firing of a single neuron is known as its receptive field.[23] Neighboring cells have similar and overlapping receptive fields.[citation needed] Receptive field size and location varies systematically across the cortex to form a complete map of visual space.[citation needed] The cortex in each hemisphere represents the contralateral visual field.[citation needed]
Their 1968 paper identified two basic visual cell types in the brain:[9]
simple cells, whose output is maximized by straight edges having particular orientations within their receptive field complex cells, which have larger receptive fields, whose output is insensitive to the exact position of the edges in the field. Hubel and Wiesel also proposed a cascading model of these two types of cells for use in pattern recognition tasks
BASIC STEPS INVOLVED IN THE CREATION OF CNN
1 Convolution layer
Convolution is the first layer to extract features from an input image. Convolution preserves the relationship between pixels by learning image features using small squares of input data. It is a mathematical operation that takes two inputs such as image matrix and a filter or kernel.
2 Activation function
ReLU stands for Rectified Linear Unit for a non-linear operation. The output is ƒ(x) = max(0,x).
Why ReLU is important : ReLU’s purpose is to introduce non-linearity in our ConvNet. Since, the real world data would want our ConvNet to learn would be non-negative linear values.
3 Pooling
Pooling layers section would reduce the number of parameters when the images are too large. Spatial pooling also called subsampling or downsampling which reduces the dimensionality of each map but retains important information. Spatial pooling can be of different types:
Max Pooling
Average Pooling
Sum Pooling
4 Fully connected layer
The layer we call as FC layer, we flattened our matrix into vector and feed it into a fully connected layer like a neural network.
SUMMARY
Applying convolution to the selected image
Adding activation (commonly relu)
Feature extraction through pooling
Flattening the image and feeding to nueral network