Unleashing CNNs for Object Detection, Facial Recognition, and Image Classification

Introduction

Convolutional neural networks (CNNs) are a class of deep learning models that have revolutionized image and video recognition tasks.

CNNs use a mathematical operation called convolution to extract features from images and then classify each image into one of several categories. Their architecture is inspired by the organization of neurons in the visual cortex of animals. The first layer of a CNN consists of multiple filters that convolve over the input image, each extracting a different feature.

These filters can detect edges, corners, or other patterns in the input image. As we move deeper into the network, each layer combines and recombines the features extracted by previous layers to form more complex representations. Finally, these representations are fed into fully connected layers for classification. CNNs have been used successfully in applications such as object detection, facial recognition, and natural language processing.
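To make the idea of a filter detecting an edge concrete, here is a minimal NumPy sketch. The function name `convolve2d_valid`, the toy 5x5 image, and the hand-picked vertical-edge filter are all illustrative assumptions; a trained CNN learns its own filter values rather than using fixed ones like these.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and a stride of 1.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped before it is applied).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the current image patch by the filter and sum the result
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x5 image: dark (0) on the left, bright (1) on the right
image = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

# Hand-picked filter that responds to dark-to-bright vertical edges
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = convolve2d_valid(image, vertical_edge)
print(feature_map)
```

The resulting 3x3 feature map is zero over the flat dark region and positive wherever the filter window overlaps the dark-to-bright boundary.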

Their ability to learn complex features from raw data has made them an essential tool for machine learning practitioners working with image or video data. The CNN architecture is designed to process images and other types of multidimensional data effectively. A typical CNN consists of multiple layers, including convolutional, pooling, and fully connected layers. Convolutional layers are the backbone of CNNs and use a set of learnable filters to extract features from input images.

Pooling layers downsample the resulting feature maps while retaining the most important information. Fully connected layers are used at the end of a CNN to perform classification or regression tasks based on the extracted features. These layers connect every neuron in one layer to every neuron in the next layer. Overall, the architecture and components of CNNs allow for efficient processing and analysis of complex visual data such as images and videos.
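As a rough illustration of "every neuron connected to every neuron", a fully connected classification layer can be sketched as a single matrix multiplication followed by softmax. The function name `dense_softmax`, the random weights, and the 13x13x8 input shape (chosen to match the sample code later in this article) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_softmax(features, weights, biases):
    """Fully connected layer followed by softmax.

    Every flattened input value contributes to every output class
    through one entry of the `weights` matrix.
    """
    flat = features.flatten()
    logits = flat @ weights + biases
    # Subtract the max logit before exponentiating for numerical stability
    exp = np.exp(logits - np.max(logits))
    return exp / np.sum(exp)

# Stand-in for the feature maps produced by earlier layers (hypothetical values)
features = rng.standard_normal((13, 13, 8))
num_inputs = features.size   # 13 * 13 * 8 = 1352
num_classes = 10

# Random, untrained weights: one column of weights per output class
weights = rng.standard_normal((num_inputs, num_classes)) / num_inputs
biases = np.zeros(num_classes)

probs = dense_softmax(features, weights, biases)
print(probs.shape, probs.sum())
```

The output is a vector of 10 class probabilities that sums to 1; training would adjust `weights` and `biases` so that the correct class receives the highest probability.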

Sample Python Code for Implementing a CNN

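# mnist is a third-party package that loads the MNIST dataset.
# conv, maxpool, and softmax are assumed to be local modules (not shown
# in this article) that define the three layer classes used below.
import mnist
import numpy as np
from conv import Conv3x3
from maxpool import MaxPool2
from softmax import Softmax

# Use only the first 1,000 training and test examples to keep the run short
train_images = mnist.train_images()[:1000]
train_labels = mnist.train_labels()[:1000]
test_images = mnist.test_images()[:1000]
test_labels = mnist.test_labels()[:1000]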

conv = Conv3x3(8)                  # 28x28x1 -> 26x26x8
pool = MaxPool2()                  # 26x26x8 -> 13x13x8
softmax = Softmax(13 * 13 * 8, 10) # 13x13x8 -> 10

def forward(image, label):
  '''
  Completes a forward pass of the CNN and calculates the accuracy and
  cross-entropy loss.
  - image is a 2d numpy array
  - label is a digit
  '''
  # We transform the image from [0, 255] to [-0.5, 0.5] to make it easier
  # to work with. This is standard practice.
  out = conv.forward((image / 255) - 0.5)
  out = pool.forward(out)
  out = softmax.forward(out)

  # Calculate cross-entropy loss and accuracy. np.log() is the natural log.
  loss = -np.log(out[label])
  acc = 1 if np.argmax(out) == label else 0

  return out, loss, acc

def train(im, label, lr=.005):
  '''
  Completes a full training step on the given image and label.
  Returns the cross-entropy loss and accuracy.
  - im is a 2d numpy array
  - label is a digit
  - lr is the learning rate
  '''
  # Forward
  out, loss, acc = forward(im, label)

  # Calculate initial gradient
  gradient = np.zeros(10)
  gradient[label] = -1 / out[label]

  # Backprop
  gradient = softmax.backprop(gradient, lr)
  gradient = pool.backprop(gradient)
  gradient = conv.backprop(gradient, lr)

  return loss, acc

print('MNIST CNN initialized!')

# Train the CNN for 3 epochs
for epoch in range(3):
  print('--- Epoch %d ---' % (epoch + 1))

  # Shuffle the training data
  permutation = np.random.permutation(len(train_images))
  train_images = train_images[permutation]
  train_labels = train_labels[permutation]

  # Train!
  loss = 0
  num_correct = 0
  for i, (im, label) in enumerate(zip(train_images, train_labels)):
    l, acc = train(im, label)
    loss += l
    num_correct += acc

    # Report and reset the running totals after every 100 training steps
    if i % 100 == 99:
      print(
        '[Step %d] Past 100 steps: Average Loss %.3f | Accuracy: %d%%' %
        (i + 1, loss / 100, num_correct)
      )
      loss = 0
      num_correct = 0

# Test the CNN
print('\n--- Testing the CNN ---')
loss = 0
num_correct = 0
for im, label in zip(test_images, test_labels):
  _, l, acc = forward(im, label)
  loss += l
  num_correct += acc

num_tests = len(test_images)
print('Test Loss:', loss / num_tests)
print('Test Accuracy:', num_correct / num_tests)
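The "initial gradient" step in train() sets a single entry to -1 / out[label], which is the derivative of the cross-entropy loss -ln(out[label]) with respect to the softmax output. The standalone sketch below checks that formula numerically. The three-class probabilities and the helper name `cross_entropy` are made up for illustration and are not part of the sample above.

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy loss for one example: -ln(probability of the true class)."""
    return -np.log(probs[label])

# Hypothetical softmax output for a 3-class problem; the true class is 2
probs = np.array([0.1, 0.2, 0.7])
label = 2

# Analytic gradient, built the same way as in train()
analytic = np.zeros(len(probs))
analytic[label] = -1 / probs[label]

# Numerical gradient by central differences, nudging one output at a time.
# Each entry of probs is treated as an independent variable here.
eps = 1e-6
numeric = np.zeros(len(probs))
for k in range(len(probs)):
    up, down = probs.copy(), probs.copy()
    up[k] += eps
    down[k] -= eps
    numeric[k] = (cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps)

print(analytic)
print(numeric)
```

Both vectors are zero everywhere except at the true class, where they agree closely. This is why the sample only needs to fill in one entry before handing the gradient to the softmax layer's backprop step.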

Applications

In autonomous vehicles, CNNs are used to identify objects on the road, such as other vehicles, pedestrians, and traffic lights. This helps the vehicle make informed decisions about its surroundings and navigate safely through traffic. More broadly, object recognition and classification using CNNs have applications across many industries, including security surveillance, healthcare diagnostics, and retail.
CNNs can identify patterns in medical images that are difficult for humans to detect. This is particularly useful in identifying tumors and other abnormalities. For instance, by analyzing mammograms, CNNs can help radiologists detect early signs of breast cancer. In addition to identifying diseases, CNNs can also help doctors plan treatments. They can analyze CT scans to determine the size and location of a tumor, which helps doctors plan radiation therapy or surgery. By supporting more accurate diagnoses and treatment planning, these algorithms can help improve patient outcomes while reducing the burden on healthcare providers.
CNNs have also proven effective for analyzing and classifying natural language data. They are used in natural language processing (NLP) applications such as sentiment analysis and automatic translation. Sentiment analysis involves identifying the emotional tone of a piece of text or speech, which helps businesses understand their customers' feedback. Automatic translation uses NLP techniques to translate from one language to another, which is essential for global communication. CNNs are also used for text classification tasks such as spam filtering and topic modeling. Topic modeling automatically discovers hidden topics in large collections of documents.

Conclusion

CNNs are loosely modeled on how the brain processes visual information, which makes them well suited to object detection, facial recognition, and image classification. One of the most important applications of CNNs in image recognition is object detection: by analyzing an image with multiple layers of filters, a CNN can identify specific objects and locate them within the frame.

FAQs

1. Why do we use a Pooling Layer in a CNN?

ANS: –

Pooling layers reduce the spatial size of the feature maps, which speeds up computation in the rest of the network.
Pooling is applied after the convolution and ReLU operations.
It reduces the dimensions of each feature map while retaining the most important information.
Without pooling, the number of parameters needed to learn the complex relations present in the image would be large.
Max pooling also adds some tolerance to small shifts or tilts in the picture: the largest value in each region of the feature map is still recorded.
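The points above can be sketched with a small NumPy example of 2x2 max pooling. The helper name `max_pool_2x2` and the toy 4x4 feature maps are assumptions for illustration. Note that the tolerance shown here only holds when the shift keeps the strong activation inside the same pooling window.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 region."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Drop any odd trailing row or column, then group pixels into 2x2 blocks
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

# A strong activation in the top-left corner of a 4x4 feature map
original = np.zeros((4, 4))
original[0, 0] = 9.0

# The same activation moved one pixel down and to the right
shifted = np.zeros((4, 4))
shifted[1, 1] = 9.0

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)

print(pooled_original)
print(pooled_shifted)
```

Both inputs pool to the same 2x2 output, with the feature map reduced to a quarter of its original number of values.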

2. What is the feature map size for a given input image size, filter size, stride, and padding amount?

ANS: – Stride tells us how many pixels the filter jumps at each step of the convolution. If the input image has size n x n, the filter has size f x f, p is the padding amount, and s is the stride, then the dimension of the feature map is given by: Dimension = floor[ ((n-f+2p)/s)+1] x floor[ ((n-f+2p)/s)+1]
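The formula can be checked with a few lines of Python. The function name `feature_map_size` is hypothetical; the first example uses the 28x28 input and 3x3 filter from the sample code in this article, and the other inputs are arbitrary illustrations.

```python
def feature_map_size(n, f, p=0, s=1):
    """Side length of the feature map for an n x n input and f x f filter.

    p is the padding amount and s is the stride. Integer floor division
    implements the floor in the formula.
    """
    return (n - f + 2 * p) // s + 1

# 28x28 MNIST image, 3x3 filter, no padding, stride 1 (as in the sample code)
print(feature_map_size(28, 3))            # 26
# Padding of 1 keeps the output the same size as the input
print(feature_map_size(28, 3, p=1))       # 28
# A stride of 2 roughly halves the output size
print(feature_map_size(32, 5, p=2, s=2))  # 16
```

The first result matches the 28x28x1 -> 26x26x8 comment next to Conv3x3 in the sample code.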

WRITTEN BY Neetika Gupta

Neetika Gupta works as a Senior Research Associate at CloudThat and has experience deploying multiple data science projects on multiple cloud frameworks. She has deployed end-to-end AI applications for business requirements on cloud frameworks such as AWS, Azure, and GCP, and has deployed scalable applications using CI/CD pipelines.