Improve Accuracy by tuning PSM values of Tesseract

Introduction

This post continues from Part 1 of Improve Accuracy by tuning the PSM values of Tesseract, which covered Page Segmentation Modes (PSM) 0 to 5. Here, I cover PSM 6 to PSM 13 with examples.

This post discusses improving OCR accuracy by adjusting the PSM (Page Segmentation Mode) in Tesseract, an open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google.

Let’s dive into the remaining PSM modes with examples.

PSM Mode 6: Assume a single uniform block of text

If your input image contains a single block of text in a consistent font, for example a scanned page from a novel, book, or newspaper, PSM 6 is likely to give you the most accurate results.

Input Image:

mode6

Output:

mode6b
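As a minimal sketch of how a PSM value is passed in practice, the snippet below uses the pytesseract wrapper. It assumes pytesseract, Pillow, and the Tesseract binary are installed. The helper names `psm_config` and `ocr_with_psm` and the file name `page.png` are illustrative and do not come from the tests shown in this post.

```python
def psm_config(psm: int) -> str:
    """Build the Tesseract config string for a page segmentation mode."""
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def ocr_with_psm(image_path: str, psm: int = 6) -> str:
    """Run OCR on an image file with the given PSM (defaults to 6)."""
    # Imported lazily so psm_config can be used without the OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=psm_config(psm))


# Example call (hypothetical file name):
# print(ocr_with_psm("page.png", psm=6))
```

The same `config` string can be reused for every other mode in this post by changing the number.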

PSM Mode 7: Treat the image as a single text line

In this mode, Tesseract assumes that the input image consists of a single line of uniform text. This is useful when scanning number plates, titles, and similar one-line inputs, depending on the use case.

Input Image:

mode7

Output:

mode7b
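The same mode can also be selected from the command line with `--psm 7`. The sketch below builds that command in Python and runs it through `subprocess`. It assumes the `tesseract` executable is on the PATH and that the English trained data is installed; `tesseract_command`, `read_single_line`, and `plate.png` are hypothetical names.

```python
import subprocess


def tesseract_command(image_path: str, psm: int, lang: str = "eng") -> list[str]:
    """Return the tesseract CLI call that prints recognized text to stdout."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def read_single_line(image_path: str) -> str:
    """OCR an image that contains exactly one line of text (PSM 7)."""
    result = subprocess.run(
        tesseract_command(image_path, psm=7),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout.strip()


# Example call (hypothetical file name):
# print(read_single_line("plate.png"))
```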

PSM Mode 8: Treat the image as a single word

If your image contains a single word of uniform text, PSM 8 can give better accuracy. For short, single-word images, PSM 7 and PSM 8 often produce similar results, so it is worth trying both.

Input Image:

mode8

Output:

mode8b

With PSM 3 (the default), the output is as follows:

mode8c
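A comparison like the one above (PSM 8 versus PSM 3) is easy to script. The sketch below runs one image through several modes and collects the outputs side by side. The function name `compare_psms` and the `ocr_fn` hook are illustrative; by default the code assumes pytesseract is installed.

```python
from typing import Callable, Iterable


def _pytesseract_ocr(image, config: str) -> str:
    """Default OCR hook; imported lazily so the helper works without it."""
    import pytesseract

    return pytesseract.image_to_string(image, config=config)


def compare_psms(
    image,
    psms: Iterable[int] = (3, 7, 8),
    ocr_fn: Callable[[object, str], str] = _pytesseract_ocr,
) -> dict[int, str]:
    """Map each PSM to the stripped text returned for the same image."""
    return {psm: ocr_fn(image, f"--psm {psm}").strip() for psm in psms}


# Example call (hypothetical file name):
# from PIL import Image
# for psm, text in compare_psms(Image.open("word.png")).items():
#     print(psm, repr(text))
```

Passing a custom `ocr_fn` makes it possible to plug in preprocessing or a different OCR call without changing the comparison loop.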

PSM Mode 9: Treat the image as a single word in a circle

PSM mode 9 is used in Tesseract when you want to recognize text arranged in a circular pattern. In this mode, Tesseract treats the image as a single word in a circle and tries to recognize the characters in that circular arrangement. It can be useful when extracting text from logos, emblems, or circular graphics containing text. However, it may not be as accurate as other modes designed for standard text arrangements.

Note: I tried PSM 9 on many images with circularly arranged text, but the accuracy was poor.

Input Image:

mode9

Output:

Because the confidence was very low, Tesseract produced no OCR text.

mode9b
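One way to check the low confidence mentioned above is to ask Tesseract for per-word confidence values. The sketch below assumes pytesseract's `image_to_data` with dictionary output, where `conf` is -1 for entries that are not words. `mean_confidence` and `confidence_for_psm` are helper names chosen here for illustration.

```python
def mean_confidence(data: dict) -> float:
    """Average confidence (0-100) over recognized words; 0.0 if there are none."""
    scores = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # some versions return strings, others numbers
        if conf >= 0 and text.strip():
            scores.append(conf)
    return sum(scores) / len(scores) if scores else 0.0


def confidence_for_psm(image, psm: int = 9) -> float:
    """Run OCR with the given PSM and return the mean word confidence."""
    import pytesseract  # lazy import; requires the Tesseract binary as well

    data = pytesseract.image_to_data(
        image, config=f"--psm {psm}", output_type=pytesseract.Output.DICT
    )
    return mean_confidence(data)
```

A result of 0.0 here means that no word was recognized at all, which matches the empty output described above.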

PSM Mode 10: Treat the image as a single character

This mode works when the input image contains just one character. It can be useful when you want to recognize each character of a word separately after cropping the individual regions of interest (ROIs).

Input Image:

mode10

Output:

mode10b

When no PSM is specified (so the default, PSM 3, is used), the output is as follows:

mode10c
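The per-character use case can be sketched as follows: crop each character region from the image and recognize it with PSM 10. The bounding boxes are assumed to come from an earlier step (for example, contour detection), which this post does not cover. `recognize_characters` and the `ocr_fn` hook are hypothetical names, and `image` is assumed to be a Pillow image or any object with a compatible `crop` method.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), as used by Pillow's crop()


def _ocr_single_char(char_image) -> str:
    """Default OCR hook: treat the crop as a single character (PSM 10)."""
    import pytesseract

    return pytesseract.image_to_string(char_image, config="--psm 10")


def recognize_characters(
    image,
    boxes: Sequence[Box],
    ocr_fn: Callable[[object], str] = _ocr_single_char,
) -> str:
    """Recognize one character per box, reading the boxes left to right."""
    chars = []
    for box in sorted(boxes, key=lambda b: b[0]):
        chars.append(ocr_fn(image.crop(box)).strip())
    return "".join(chars)
```

Sorting by the left edge assumes the characters sit on one horizontal line; multi-line input would need a different ordering.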

PSM Mode 11: Sparse text. Find as much text as possible in no particular order

When dealing with images in which text is scattered across the page rather than laid out in regular blocks, the sparse text mode can be advantageous. This mode focuses solely on finding and extracting the text, not on its organization or arrangement within the image. It is therefore useful when the primary goal is to capture as much text as possible without being concerned with its structure.

Note: OSD (orientation and script detection) is not performed in this mode.

Input Image:

mode11

Output:

mode11b

mode11c
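Because sparse text mode returns text in no particular order, it can help to retrieve word positions as well and sort them afterwards. The sketch below assumes pytesseract's `image_to_data` dictionary output (keys `text`, `left`, `top`). The row-bucketing heuristic and the names `words_in_reading_order` and `sparse_words` are illustrative choices, not part of Tesseract.

```python
def words_in_reading_order(data: dict, row_height: int = 20) -> list[str]:
    """Sort recognized words top to bottom, then left to right.

    Words whose top edges fall in the same `row_height`-pixel band are
    treated as one row; this is only a rough heuristic.
    """
    words = [
        (int(top) // row_height, int(left), text.strip())
        for text, left, top in zip(data["text"], data["left"], data["top"])
        if text.strip()
    ]
    return [text for _, _, text in sorted(words)]


def sparse_words(image) -> list[str]:
    """Run PSM 11 and return the words in approximate reading order."""
    import pytesseract  # lazy import; requires the Tesseract binary as well

    data = pytesseract.image_to_data(
        image, config="--psm 11", output_type=pytesseract.Output.DICT
    )
    return words_in_reading_order(data)
```

The `row_height` value would need tuning to the font size in the image.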

PSM Mode 12: Sparse text with OSD

PSM 12 works the same way as PSM 11, except that orientation and script detection (OSD) is performed first and sparse text extraction follows.

Note: In my tests, the result was the same as with PSM 11 above.
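To look at the OSD step on its own, pytesseract exposes `image_to_osd`, which returns a short text report made of `key: value` lines. The sketch below parses such a report into a dictionary. It assumes the `osd` trained data is installed; `parse_osd` and `detect_orientation` are illustrative helper names.

```python
def parse_osd(report: str) -> dict[str, str]:
    """Turn 'Key: value' lines from an OSD report into a dictionary."""
    result = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            result[key.strip()] = value.strip()
    return result


def detect_orientation(image) -> dict[str, str]:
    """Run orientation and script detection and return the parsed report."""
    import pytesseract  # lazy import; requires the Tesseract binary as well

    return parse_osd(pytesseract.image_to_osd(image))
```

If the report shows a non-zero rotation, rotating the image before running PSM 11 is roughly what PSM 12 is meant to save you from doing by hand.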

PSM Mode 13: Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks

This mode bypasses Tesseract's internal preprocessing and segmentation heuristics and treats the input image as a single raw text line.

Input Image:

mode13

Output Image:

When PSM = 13

When PSM = 3

mode13c

Conclusion

We have seen the different PSM modes of Tesseract (PSM 6 to PSM 13) that you can use to improve accuracy when doing OCR. Each mode suits a different use case, and picking the one that matches your input image leads to more accurate results.

Note: Start with PSM 3, the default, before turning to the other segmentation modes. If the results are not promising, give PSM 13 a try. PSM is not the only way to increase accuracy; you will also have to pay attention to image processing techniques for better results.
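The advice in this note can be sketched as a small fallback routine: try PSM 3 first and move on to PSM 13 only if the result looks unusable. What counts as "not promising" is an assumption here (fewer than `min_chars` non-whitespace characters). `ocr_with_fallback` and `ocr_fn` are hypothetical names, and the default hook assumes pytesseract is installed.

```python
from typing import Callable, Sequence, Tuple


def _pytesseract_ocr(image, config: str) -> str:
    """Default OCR hook; imported lazily so the helper works without it."""
    import pytesseract

    return pytesseract.image_to_string(image, config=config)


def ocr_with_fallback(
    image,
    psms: Sequence[int] = (3, 13),
    min_chars: int = 1,
    ocr_fn: Callable[[object, str], str] = _pytesseract_ocr,
) -> Tuple[int, str]:
    """Return (psm, text) for the first mode that yields enough characters.

    If no mode qualifies, the result of the last mode tried is returned.
    """
    if not psms:
        raise ValueError("at least one PSM is required")
    psm, text = psms[0], ""
    for psm in psms:
        text = ocr_fn(image, f"--psm {psm}").strip()
        # Count non-whitespace characters only.
        if len("".join(text.split())) >= min_chars:
            break
    return psm, text
```

A stricter check, such as a minimum mean confidence, could replace the character count if your images often produce garbage text rather than empty output.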

FAQs

1. What is PyTesseract?

ANS: – PyTesseract is a Python wrapper for the Tesseract OCR engine. It allows you to use Tesseract’s OCR functionality in your Python code, making it easier to extract text from images and scanned documents. PDFs need to be converted to images first.

2. Can I use Tesseract to recognize text in multiple languages?

ANS: – Yes, Tesseract supports the recognition of text in multiple languages. You can specify the language using the “lang” parameter.
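As a short sketch of this answer: pytesseract's `lang` parameter takes Tesseract language codes, and several codes can be joined with `+`. This assumes the corresponding trained data files are installed; `join_languages` and `ocr_multilingual` are illustrative helper names.

```python
def join_languages(codes: list[str]) -> str:
    """Join Tesseract language codes, e.g. ['eng', 'fra'] becomes 'eng+fra'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_multilingual(image, codes: list[str], psm: int = 3) -> str:
    """Run OCR with one or more languages and the given PSM."""
    import pytesseract  # lazy import; requires the Tesseract binary as well

    return pytesseract.image_to_string(
        image, lang=join_languages(codes), config=f"--psm {psm}"
    )
```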

WRITTEN BY Ganesh Raj

Ganesh Raj V works as a Sr. Research Associate at CloudThat. He is a highly analytical, creative, and passionate individual experienced in Data Science, Machine Learning algorithms, and Cloud Computing. In a quest to learn and work with recent technologies, he strives to stay updated on advanced technologies while solving problems efficiently and analytically.