Improve Accuracy by tuning PSM values of Tesseract – Part 1

Introduction

In today’s digital age, the ability to process large amounts of data quickly and accurately is essential for businesses to stay competitive. However, many businesses still rely heavily on paper-based records, forms, and documents, which are time-consuming and error-prone to process by hand. This is where OCR (Optical Character Recognition) technology comes in. OCR helps businesses convert paper-based records into digital form, making the data easier to manage. It is particularly beneficial for businesses dealing with large volumes of records, like those in the legal, healthcare, and finance industries.

OCR reduces time and resource consumption, minimizes the risk of data loss or damage, and enables businesses to extract insights from data contained within documents. Adopting OCR solutions helps businesses improve efficiency, reduce costs, and gain a competitive edge in today’s digital marketplace.

In this blog, I will discuss improving accuracy by adjusting the PSM (Page Segmentation Mode) in Tesseract, an open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google.

Tesseract vs Pytesseract

Tesseract and Pytesseract are two ways of using the same underlying Tesseract OCR engine. Tesseract is a command-line tool that can be called from any programming language or platform able to run an external program, while Pytesseract is a Python wrapper around that tool, designed for use in Python code.
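To make the distinction concrete, here is a minimal sketch of both routes to the same engine. The helper names (build_cli_command, ocr_with_cli, ocr_with_pytesseract) are illustrative and not part of either tool. Both routes assume the Tesseract binary is installed, and the wrapper route also assumes the pytesseract and Pillow packages are available.

```python
import subprocess


def build_cli_command(image_path, psm=3):
    """Build the argument list for the Tesseract CLI, sending text to stdout."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def ocr_with_cli(image_path, psm=3):
    """Run the Tesseract binary directly, as any language that can spawn a process could."""
    result = subprocess.run(
        build_cli_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_with_pytesseract(image_path, psm=3):
    """Run the same OCR through the Python wrapper, which calls the binary for us."""
    # Imported lazily so this module still loads where the wrapper is not installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), config=f"--psm {psm}")
```

Either function should return the recognized text as a string. The only PSM-specific part is the --psm value passed along to the engine.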

Different PSM Modes

According to the official documentation, Tesseract by default expects a full page of text when it segments an input image. If you are running OCR on a single character, word, line, or paragraph, choose a different PSM (Page Segmentation Mode). You can list all the PSM modes with the tesseract --help-psm command.

Current Tesseract releases (4.x and 5.x) define 14 PSM modes, numbered 0 to 13.
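For reference in Python code, the 14 modes can be kept in a small lookup table. The descriptions below follow the text printed by tesseract --help-psm in Tesseract 4.x and 5.x. The psm_config helper and the NOT_IMPLEMENTED set are assumptions added for illustration, not part of Tesseract or Pytesseract.

```python
# Mode descriptions as listed by `tesseract --help-psm` (Tesseract 4.x/5.x).
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD, or OCR",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    5: "Assume a single uniform block of vertically aligned text",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    9: "Treat the image as a single word in a circle",
    10: "Treat the image as a single character",
    11: "Sparse text. Find as much text as possible in no particular order",
    12: "Sparse text with OSD",
    13: "Raw line. Treat the image as a single text line",
}

# Hypothetical guard: mode 2 is listed by Tesseract as not implemented.
NOT_IMPLEMENTED = {2}


def psm_config(mode):
    """Return the config string for a PSM mode, rejecting unusable values."""
    if mode not in PSM_MODES:
        raise ValueError(f"Unknown PSM mode: {mode}")
    if mode in NOT_IMPLEMENTED:
        raise ValueError(f"PSM mode {mode} is not implemented by Tesseract")
    return f"--psm {mode}"
```

Validating the mode up front gives a clear Python error instead of a failure from the external process.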


PSM Mode 0: Orientation and script detection (OSD) only

The --psm 0 mode does not perform character recognition; it performs only orientation and script detection. It analyzes the input image to determine how the page is oriented and which script it is written in, along with a confidence score for each, but it does not output any recognized text.

OSD returns two main things:

  • Script: the writing system of the detected text (Latin, Han, etc.).
  • Page orientation: the angle of the page in degrees (0, 90, 180, or 270).
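The OSD report arrives as plain "key: value" lines, so it can be turned into a dictionary with a few lines of Python. This is a minimal sketch: parse_osd and run_osd are illustrative names, and the values in SAMPLE_OSD are made up for the example rather than taken from a real run. The run_osd helper assumes pytesseract and Pillow are installed, along with Tesseract's OSD data file.

```python
def parse_osd(osd_text):
    """Turn the 'key: value' lines of an OSD report into a dictionary of strings."""
    info = {}
    for line in osd_text.splitlines():
        key, separator, value = line.partition(":")
        if separator:  # skip lines that are not key/value pairs
            info[key.strip()] = value.strip()
    return info


def run_osd(image_path):
    """Run orientation and script detection on an image and parse the report."""
    # Imported lazily so parse_osd can be used without the OCR packages installed.
    import pytesseract
    from PIL import Image

    return parse_osd(pytesseract.image_to_osd(Image.open(image_path)))


# Illustrative report only; the numbers are invented, not real output.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 5.10
Script: Latin
Script confidence: 2.40"""

osd = parse_osd(SAMPLE_OSD)
print(osd["Script"], osd["Orientation in degrees"])
```

All values are kept as strings here; convert the orientation and confidence fields to numbers if you need to compare them.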

[Figures: Input Images 1 to 3 and their PSM 0 outputs]

PSM Mode 1: Automatic page segmentation with OSD

PSM 1 runs OSD and uses the result to guide page segmentation. The end user sees only the OCR text, not the OSD information.

[Figures: Input Image 1 and its PSM 1 output]

PSM Mode 2: Automatic page segmentation, but no OSD or OCR

As of this writing, Tesseract does not implement this mode. You can confirm this by running the command tesseract --help-psm, which lists mode 2 as not implemented.


PSM Mode 3: Fully automatic page segmentation, but no OSD (Default)

The default PSM, known as PSM 3, automatically segments the input text into multiple words, lines, and paragraphs, treating it as a proper page.

Note: There is no OSD performed in PSM 3.

[Figures: Input Image 1 and its PSM 3 output]

PSM Mode 4: Assume a single column of text of variable sizes

PSM 4 is particularly useful for columnar data such as tables, receipts, tax invoices, and spreadsheets.
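Columnar output is easiest to use when each row stays on one line, because a label and its amount can then be paired with a simple pattern. The sketch below assumes the OCR text keeps each receipt row on a single line and that amounts have two decimal places. The parse_rows helper and the sample text are hypothetical and do not come from a real receipt.

```python
import re

# A label, some whitespace, then an amount with two decimals at the end of the line.
ROW_PATTERN = re.compile(r"^(?P<label>.+?)\s+(?P<amount>\d+\.\d{2})$")


def parse_rows(ocr_text):
    """Extract (label, amount) pairs from OCR text that has one row per line."""
    rows = []
    for line in ocr_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:  # lines without a trailing amount are ignored
            rows.append((match["label"], float(match["amount"])))
    return rows


# Made-up text standing in for the OCR output of a small receipt.
SAMPLE_TEXT = "Coffee     3.50\nBagel  2.25\nThank you!\nTOTAL   5.75"

print(parse_rows(SAMPLE_TEXT))
```

If a segmentation mode splits a row across lines or merges columns out of order, this kind of pairing breaks down, which is one reason the choice of PSM matters for receipts and invoices.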

[Figures: input image and its PSM 4 output]

PSM Mode 5: Assume a single uniform block of vertically aligned text

This mode assumes the input image consists of a single block of vertically aligned text and extracts that text in a readable horizontal form.

[Figures: Input Image 1 and its PSM 5 output]

Conclusion

We have covered PSM 0 through PSM 5 in Tesseract, each of which can improve accuracy for the right kind of input. Each Page Segmentation Mode is designed for a particular type of image and text arrangement, so choosing the appropriate mode for a given image can yield better recognition accuracy. In Part 2 of this blog, we will take a deep dive into PSM 6 to 13.
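One practical way to choose a mode is to run the same image through several PSM values and score each result against text you already know to be correct. The following is a rough sketch under that assumption: best_psm, similarity, and default_ocr are illustrative names, the character-level similarity from difflib is just one possible accuracy measure, and default_ocr assumes pytesseract and Pillow are installed.

```python
from difflib import SequenceMatcher


def similarity(expected, actual):
    """Character-level similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, expected, actual).ratio()


def default_ocr(image_path, psm):
    """OCR an image with the given PSM via the Python wrapper."""
    # Imported lazily so the scoring logic can be tested without the OCR packages.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), config=f"--psm {psm}")


def best_psm(image_path, expected_text, modes=(3, 4, 5), ocr=default_ocr):
    """Score each PSM mode against known-good text and return (best_mode, scores)."""
    scores = {}
    for psm in modes:
        text = ocr(image_path, psm)
        scores[psm] = similarity(expected_text.strip(), text.strip())
    best = max(scores, key=scores.get)
    return best, scores
```

The ocr parameter is injectable so the comparison can be tried with a stand-in function before pointing it at real images. A small set of hand-transcribed sample documents is usually enough to see which mode suits a given layout.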


FAQs

1. What is OCR?

ANS: – OCR stands for Optical Character Recognition, the technology that allows a computer to recognize and convert printed or handwritten text into machine-encoded text that can be edited, searched, and analyzed.

2. What is Tesseract?

ANS: – Tesseract is a free and open-source optical character recognition (OCR) engine, originally developed at Hewlett-Packard and later sponsored by Google.

3. What preprocessing steps can I take to improve Tesseract accuracy?

ANS: – Preprocessing steps to improve Tesseract accuracy include thresholding, binarization, deskewing, and denoising the image.
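As a small illustration of the thresholding and binarization steps, here is a pure-Python sketch of Otsu's method, a common way to pick a threshold automatically. It works on a flat list of grayscale values from 0 to 255. The function names are illustrative, and in practice an image library such as OpenCV or Pillow would normally do this work.

```python
def otsu_threshold(pixels):
    """Pick the 0-255 threshold that best separates dark and light pixels (Otsu's method)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        weight_light = total - weight_dark
        if weight_dark == 0 or weight_light == 0:
            continue  # one class is empty, so there is nothing to separate
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels, threshold):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [255 if value > threshold else 0 for value in pixels]
```

With Pillow, a flat pixel list of this kind can be obtained from list(Image.open(path).convert("L").getdata()); the binarized values would then be written back to an image before passing it to Tesseract.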

WRITTEN BY Ganesh Raj

Ganesh Raj V works as a Sr. Research Associate at CloudThat. He is a highly analytical, creative, and passionate individual experienced in Data Science, Machine Learning algorithms, and Cloud Computing. In a quest to learn and work with recent technologies, he strives to stay updated on advanced technologies while solving problems efficiently and analytically.
