Recent developments in face recognition using convolutional neural networks (CNNs) have surpassed conventional artificial intelligence techniques, achieving accuracies of up to 98%. With CNNs, face detection is performed using appearance-based techniques, while detection of features such as the eyes and mouth is handled with feature-invariant methods. Several techniques are used to address the problems that arise when implementing and training CNNs for face detection. Among the most widely used are data augmentation, which creates new images from existing ones by applying random modifications, and transfer learning, in which a network trained on one task serves as the foundation for training another CNN to detect faces.
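As a rough illustration of the data augmentation idea mentioned above, the following sketch derives several "new" training images from one image using a random horizontal flip and a small brightness shift. The function name and the specific transforms are illustrative choices, not part of any particular library:

```python
import numpy as np

def augment(image, rng):
    """Create a modified copy of a training image via a random horizontal
    flip and a small random brightness shift (illustrative only; real
    pipelines also use rotation, cropping, scaling, etc.)."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]              # horizontal flip
    shift = rng.uniform(-0.1, 0.1)      # brightness perturbation
    return np.clip(out + shift, 0.0, 1.0)

# Four augmented training samples derived from a single 4x4 grayscale image.
rng = np.random.default_rng(0)
img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
batch = [augment(img, rng) for _ in range(4)]
```

Each call yields a slightly different image, so a small labelled dataset can be stretched into a much larger effective training set.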
Overview:
Overall, face detection and face identification are the two phases that make up face recognition. A recognition model can also be built by integrating a CNN with Haar cascade features: the CNN serves as the facial recognition classifier, while the Haar cascade extracts features and reduces the dimensionality of the processed data. CNNs work especially well for tasks involving image recognition and processing. Convolutional, pooling, and fully connected layers are among the layers that make up the face recognition model.
The model first takes an input through an initial layer called the input layer. The result is passed to a max pooling procedure, where essential properties or features are extracted; max pooling is applied over divisions of the pixels. Then come the hidden units, and finally a fully connected layer combines the extracted information and produces the output. The convolutional layers of the model extract features from the input image, such as edges, textures, and shapes. Local features are collected from the local receptive field of the previous layer, which is connected to each neuron's input. The convolution process employs the following formula:

s(i, j) = Σ_{k=1}^{n_in} (X_k ∗ W_k)(i, j) + b
Here s(i, j) is the value of the corresponding element of the output matrix for the convolution kernel W, X_k is the kth input matrix, b is the offset (bias) of the neurons on the convolutional layer feature map, and n_in is the number of input matrices, i.e. the dimension of the last tensor. W_k denotes the kth sub-convolution kernel matrix of the convolution kernel W. Pooling layers in CNNs reduce spatial dimensions while retaining key features. They improve computational efficiency by decreasing the number of data points, compensate for positional differences in images, and help minimize overfitting. Max pooling, as used here, divides the feature map into 2×2 non-overlapping sections and selects the maximum value from each, streamlining the data for further processing. Fully connected layers then use this condensed information for image classification or prediction. Residual networks (ResNets) can be used to address the vanishing gradient problem.
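The multi-channel convolution and 2×2 max pooling described above can be sketched directly in NumPy. This is a minimal illustration, not an optimized implementation; like most deep-learning frameworks, it computes cross-correlation (no kernel flip):

```python
import numpy as np

def conv2d(X, W, b):
    """Valid convolution over n_in input channels:
    s(i, j) = sum_k (X_k * W_k)(i, j) + b."""
    n_in, H, Wd = X.shape        # channels, height, width
    _, kh, kw = W.shape          # one kh x kw sub-kernel per channel
    out = np.zeros((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum the elementwise products over all channels, then add bias.
            out[i, j] = np.sum(X[:, i:i + kh, j:j + kw] * W) + b
    return out

def max_pool2x2(fmap):
    """Divide the feature map into non-overlapping 2x2 sections and keep
    the maximum value of each (odd edges are trimmed)."""
    H, Wd = fmap.shape
    trimmed = fmap[:H - H % 2, :Wd - Wd % 2]
    return trimmed.reshape(H // 2, 2, Wd // 2, 2).max(axis=(1, 3))

X = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # 2 input channels
W = np.ones((2, 2, 2))                                  # one 2x2 sub-kernel per channel
s = conv2d(X, W, b=0.0)      # 3x3 feature map
p = max_pool2x2(s)           # pooled to 1x1
```

Each output element s(i, j) sums the local receptive field across every input channel, matching the formula above term by term.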
Methodology:
Now let's dive deeper into the four classic steps of face recognition: face detection, face alignment, feature extraction (face representation), and classification. Feature extraction is undoubtedly the most important stage, owing to its ability to represent faces in a way that minimizes variation due to factors such as lighting, pose, and expression while emphasizing the aspects most relevant for discrimination.
- 1) Face Detection: Attendees' general information and facial images are stored in the database for recognition.
- 2) Face Alignment: Facial landmarks are used to detect faces. The Viola-Jones method identifies bounding boxes for alignment before recognition.
- 3) Feature Extraction: Haar Cascade represents training images as weighted features, compacting them into key characteristics. Recognized attendees access facilities; unregistered ones are prompted to register.
- 4) Classification: After recognition, attendance is recorded in the database for program verification.
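The four steps above can be sketched as a pipeline of functions. Every implementation here is a deliberately simplified stand-in (a fixed central bounding box instead of a real Viola-Jones detector, quadrant means instead of Haar-like or CNN features, nearest-neighbour matching as the classifier), so the structure of the workflow is the point, not the bodies:

```python
import numpy as np

def detect_face(image):
    # Stand-in for a Haar cascade / Viola-Jones detector:
    # returns a fixed central bounding box (x, y, w, h).
    h, w = image.shape
    return (w // 4, h // 4, w // 2, h // 2)

def align_face(image, box):
    # Crop to the bounding box; real alignment would also rotate and
    # scale the face using detected facial landmarks.
    x, y, w, h = box
    return image[y:y + h, x:x + w]

def extract_features(face):
    # Toy feature vector: mean intensity of each quadrant. A real system
    # would use Haar-like features or a CNN embedding here.
    h, w = face.shape
    return np.array([face[:h//2, :w//2].mean(), face[:h//2, w//2:].mean(),
                     face[h//2:, :w//2].mean(), face[h//2:, w//2:].mean()])

def classify(features, database):
    # Nearest-neighbour match against enrolled attendees' feature vectors.
    names = list(database)
    dists = [np.linalg.norm(features - database[n]) for n in names]
    return names[int(np.argmin(dists))]

image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0                                   # bright central "face"
database = {"alice": np.ones(4), "bob": np.zeros(4)}    # enrolled attendees
box = detect_face(image)
match = classify(extract_features(align_face(image, box)), database)
```

Once a match is found, the attendance record for that attendee would be written to the database, as in step 4.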
Pattern Identification: Machine learning detects patterns using historical or statistical data, involving:
- 1) Classification: Assigning labels via preset features.
- 2) Clustering: Grouping similar data.
- 3) Regression: Predicting relationships and outcomes.
- 4) Characteristics: Analyzing variables (binary, discrete, or continuous).
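The first three pattern-identification tasks can each be shown in one line of illustrative code. The thresholds, centroids, and data below are arbitrary examples, not values from the proposed system:

```python
import numpy as np

# 1) Classification: assign a label from a preset feature threshold.
def classify_point(x, threshold=0.5):
    return "face" if x >= threshold else "non-face"

# 2) Clustering: group each point with its nearest centroid
#    (one assignment step of k-means).
def assign_clusters(points, centroids):
    return [int(np.argmin([abs(p - c) for c in centroids])) for p in points]

# 3) Regression: least-squares line predicting a relationship y = a*x + b.
def fit_line(xs, ys):
    slope, intercept = np.polyfit(xs, ys, 1)
    return slope, intercept

label = classify_point(0.8)
groups = assign_clusters([0.1, 0.9, 0.2], centroids=[0.0, 1.0])
slope, intercept = fit_line([0, 1, 2], [1, 3, 5])
```

The fourth item, analyzing variable characteristics, determines which of these treatments fits each feature (binary and discrete features suit classification or clustering; continuous ones suit regression).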
Conclusion:
Thus the results can be analysed with the help of confusion matrices and by plotting the training and validation loss curves for each epoch. Comparing the general workflow of the proposed methodology against conventional approaches, a method that combined Haar-like features with an AdaBoost-trained classifier cascade performed better on face recognition problems.
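A confusion matrix of the kind used for this analysis can be computed in a few lines. The labels below are made-up illustrative data, not results from the study:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

def accuracy(m):
    # Correct predictions lie on the diagonal.
    return m.trace() / m.sum()

# Example: 4 test faces, one person-0 face misrecognized as person 1.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
acc = accuracy(cm)
```

Off-diagonal entries reveal which identities the model confuses, which per-epoch loss curves alone cannot show.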