Improve Accuracy by tuning PSM values of Tesseract – Part 2

Introduction

This blog continues from Part 1 of Improve Accuracy by tuning the PSM values of Tesseract, where I discussed the Page Segmentation Modes PSM 0 to PSM 5. Here I will elaborate on PSM 6 to PSM 13 with examples.

This blog discusses improving accuracy by adjusting the PSM (Page Segmentation Mode) in Tesseract, an open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google.

Let’s dive deep into the remaining PSM modes with examples.

PSM Mode 6: Assume a single uniform block of text

If your input image uses a consistent font throughout, for example when you are running OCR on scanned novels, books, or newspapers, PSM mode 6 will usually give you the most accurate results.
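As a minimal sketch of how this mode might be selected from Python, assuming the pytesseract wrapper and Pillow are installed: the PSM value is passed to Tesseract through the `--psm` option in the `config` string. The helper names `psm_config` and `ocr_image` and the file name `page.png` are illustrative assumptions, not part of pytesseract.

```python
def psm_config(psm: int) -> str:
    """Build the Tesseract config string for a page segmentation mode."""
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def ocr_image(image_path: str, psm: int = 6) -> str:
    """Run OCR on an image file with the given PSM (defaults to 6)."""
    # Imported lazily so psm_config stays usable without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, config=psm_config(psm))


# Example (hypothetical file name):
# print(ocr_image("page.png", psm=6))
```

The same `ocr_image` call can be reused for the other modes in this blog by changing the `psm` argument.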

Input Image:

mode6

Output:

mode6b

PSM Mode 7: Treat the image as a single text line

In this mode, Tesseract assumes that the input image consists of a single line of uniform text. Depending on the use case, this is useful when scanning number plates, titles, and similar single-line inputs.

Input Image:

mode7

Output:

mode7b

PSM Mode 8: Treat the image as a single word

If you have a single word of uniform text, PSM mode 8 could help with better accuracy. For such short inputs, PSM modes 7 and 8 often behave similarly, so it is worth trying both.

Input Image:

mode8

Output:

mode8b

With PSM value 3, the output is as below:

mode8c
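A comparison like the one above (PSM 8 against the default PSM 3) can be scripted. The following is a rough sketch, assuming pytesseract is installed; `compare_psm_modes`, `tesseract_ocr`, and the file name `word.png` are hypothetical names chosen for illustration. The OCR step is passed in as a callback so the comparison logic can be tried without Tesseract.

```python
from typing import Callable, Dict, Iterable


def compare_psm_modes(
    image: object,
    modes: Iterable[int],
    ocr: Callable[[object, int], str],
) -> Dict[int, str]:
    """Run the same image through several PSM values and collect the text."""
    return {mode: ocr(image, mode).strip() for mode in modes}


def tesseract_ocr(image: object, psm: int) -> str:
    """OCR callback backed by pytesseract (imported lazily)."""
    import pytesseract

    return pytesseract.image_to_string(image, config=f"--psm {psm}")


# Example (hypothetical file name):
# from PIL import Image
# results = compare_psm_modes(Image.open("word.png"), [8, 3], tesseract_ocr)
# for mode, text in results.items():
#     print(f"PSM {mode}: {text!r}")
```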

PSM Mode 9: Treat the image as a single word in a circle

PSM mode 9 is used in Tesseract when you want to recognize text arranged in a circular pattern. In this mode, Tesseract treats the image as a single word in a circle and tries to recognize the characters in that circular arrangement. It can be useful when extracting text from logos, emblems, or circular graphics containing text. However, it may not be as accurate as other modes designed for standard text arrangements.

Note: I tried PSM value 9 on many images of circularly oriented text, but the accuracy was poor.

Input Image:

mode9

Output:

Because the confidence was very low, Tesseract produced no OCR text.

mode9b
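The confidence mentioned above can be inspected programmatically. Here is a hedged sketch, assuming pytesseract is installed: `image_to_data` with `Output.DICT` returns parallel lists that include `text` and `conf`, and the hypothetical helper `mean_confidence` averages the confidence of the recognized words. The type of the `conf` values varies between versions, so the sketch converts them with `float`.

```python
from typing import Dict, List, Optional


def mean_confidence(data: Dict[str, List]) -> Optional[float]:
    """Average word confidence from an image_to_data dictionary.

    Rows with a confidence of -1 carry no recognized text, so they are
    skipped. Returns None when no word was recognized at all.
    """
    scores = []
    for text, conf in zip(data["text"], data["conf"]):
        value = float(conf)  # conf may be int, float, or str
        if value >= 0 and str(text).strip():
            scores.append(value)
    return sum(scores) / len(scores) if scores else None


def ocr_confidence(image: object, psm: int = 9) -> Optional[float]:
    """Mean confidence of the words Tesseract finds with the given PSM."""
    import pytesseract

    data = pytesseract.image_to_data(
        image,
        config=f"--psm {psm}",
        output_type=pytesseract.Output.DICT,
    )
    return mean_confidence(data)
```

A result of `None` corresponds to the case shown here, where no text is produced.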

PSM Mode 10: Treat the image as a single character

This mode works when the input image contains just one character. It could be useful when you want to recognize each character in a word individually after extracting regions of interest (ROIs).
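The per-character idea can be sketched as follows, assuming the character boxes have already been found by some earlier ROI step (not shown) and that the image is a Pillow image. `read_characters` and `tesseract_single_char` are hypothetical helper names; the boxes use Pillow's `(left, upper, right, lower)` convention.

```python
from typing import Callable, Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), as in Pillow


def read_characters(
    image,
    boxes: Iterable[Box],
    ocr_char: Callable[[object], str],
) -> str:
    """Crop each character box and recognize it on its own."""
    chars = []
    for box in boxes:
        roi = image.crop(box)  # Image.crop takes a 4-tuple box
        chars.append(ocr_char(roi).strip())
    return "".join(chars)


def tesseract_single_char(roi) -> str:
    """Recognize one cropped character with PSM 10 (lazy import)."""
    import pytesseract

    return pytesseract.image_to_string(roi, config="--psm 10")


# Example (hypothetical file name and boxes):
# from PIL import Image
# boxes = [(0, 0, 20, 30), (20, 0, 40, 30)]
# print(read_characters(Image.open("word.png"), boxes, tesseract_single_char))
```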

Input Image:

mode10

Output:

mode10b

When no PSM is specified (so the default, PSM 3, is used), the output is as below.

mode10c

PSM Mode 11: Sparse text. Find as much text as possible in no particular order

When dealing with images where text is scattered across the image rather than arranged in paragraphs or columns, the sparse text mode can be advantageous. The mode focuses solely on finding and extracting text, not on its organization or arrangement within the image. It is therefore useful when the primary goal is to capture as much text as possible without being concerned with its structure or reading order.

Note: OSD (Orientation and Script Detection) is not performed in this mode.
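Since sparse text comes back in no particular order, it can help to keep each word's position alongside its text. This is a minimal sketch, assuming pytesseract is installed; `collect_words` and `sparse_words` are hypothetical helper names, while `text`, `left`, and `top` are keys of the dictionary returned by `image_to_data`.

```python
from typing import Dict, List, Tuple


def collect_words(data: Dict[str, List]) -> List[Tuple[str, int, int]]:
    """Return (text, left, top) for every non-empty word in the data."""
    words = []
    for text, left, top in zip(data["text"], data["left"], data["top"]):
        cleaned = str(text).strip()
        if cleaned:
            words.append((cleaned, int(left), int(top)))
    return words


def sparse_words(image) -> List[Tuple[str, int, int]]:
    """Run PSM 11 and return the words found with their positions."""
    import pytesseract

    data = pytesseract.image_to_data(
        image,
        config="--psm 11",
        output_type=pytesseract.Output.DICT,
    )
    return collect_words(data)
```

With the positions available, the caller can sort or group the words in whatever way suits the use case.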

Input Image:

mode11

Output:

mode11b

mode11c

PSM Mode 12: Sparse text with OSD

PSM mode 12 works the same way as mode 11, except that OSD (Orientation and Script Detection) is performed first.

Note: On the image tested above with PSM 11, the result with PSM 12 was the same.
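To see what the OSD step reports before running the sparse text mode, something like the following might be used. It is a sketch under assumptions: pytesseract is installed, the `osd` trained data is available, and the image contains enough text for OSD to succeed. `parse_osd` and `sparse_text_with_osd` are hypothetical helper names; `image_to_osd` is the pytesseract call that returns the report as "Key: value" lines.

```python
from typing import Dict


def parse_osd(osd_text: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines of an OSD report into a dictionary."""
    report = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            report[key.strip()] = value.strip()
    return report


def sparse_text_with_osd(image) -> str:
    """Print the OSD findings, then run sparse text recognition (PSM 12)."""
    import pytesseract

    report = parse_osd(pytesseract.image_to_osd(image))
    print("Rotate:", report.get("Rotate"), "Script:", report.get("Script"))
    return pytesseract.image_to_string(image, config="--psm 12")
```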

PSM Mode 13: Raw line. Treat the image as a single text line, bypassing Tesseract-specific hacks

This mode bypasses all of Tesseract's performance functions, attributes, and segmentation methods and treats the input image as a single text line.

Input Image:

mode13

Output:

When PSM = 13

mode13b

When PSM = 3

mode13c

Conclusion

We have seen different PSM modes of Tesseract (PSM mode 6 to PSM mode 13) with which you can improve accuracy when doing OCR. Different PSM modes suit different use cases, and choosing the one that matches the layout of the input improves the accuracy of the extracted text.

Note: Stick to PSM 3, the default, as your baseline, even after trying all the segmentation modes. If the results are not promising, give PSM 13 a try. PSM is not the only way to increase accuracy; you will also have to pay attention to various image processing techniques for better results.
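The advice in this note can be sketched as a small fallback routine. This is only an illustration: what counts as a "not promising" result is an assumption here (fewer than `min_chars` non-whitespace characters), and `ocr_with_fallback` is a hypothetical helper name. The OCR step is passed in as a callback taking an image and a PSM value.

```python
from typing import Callable, Sequence, Tuple


def ocr_with_fallback(
    image,
    ocr: Callable[[object, int], str],
    modes: Sequence[int] = (3, 13),
    min_chars: int = 1,
) -> Tuple[int, str]:
    """Try each PSM in order and return the first usable (mode, text).

    A result is treated as usable when it has at least min_chars
    non-whitespace characters (an assumed, simplistic criterion).
    If no mode qualifies, the last attempt is returned.
    """
    if not modes:
        raise ValueError("At least one PSM value is required")
    result = (modes[0], "")
    for mode in modes:
        text = ocr(image, mode).strip()
        result = (mode, text)
        if len("".join(text.split())) >= min_chars:
            break
    return result


# Example with pytesseract (hypothetical file name):
# import pytesseract
# from PIL import Image
# run = lambda img, psm: pytesseract.image_to_string(img, config=f"--psm {psm}")
# mode, text = ocr_with_fallback(Image.open("line.png"), run)
```

In practice, a stricter check such as a confidence threshold may be a better signal than the character count.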

About CloudThat

CloudThat is an official AWS (Amazon Web Services) Advanced Consulting Partner and Training partner and Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.

Drop a query if you have any questions regarding Tesseract, and I will get back to you quickly.

To get started, go through our Consultancy page and Managed Services Package, CloudThat’s offerings.

FAQs

1. What is PyTesseract?

ANS: – PyTesseract is a Python wrapper for the Tesseract OCR engine. It lets you use Tesseract’s OCR functionality in your Python code, making it easier to extract text from images and scanned documents. PDFs generally need to be converted to images first.

2. Can I use Tesseract to recognize text in multiple languages?

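ANS: – Yes, Tesseract supports the recognition of text in multiple languages. You can specify the language using the “lang” parameter of PyTesseract (the -l option on the command line), provided the corresponding language data is installed.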
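As a brief sketch of the “lang” parameter, assuming pytesseract and the relevant language data are installed: Tesseract accepts several language codes joined with `+`. The helper names `lang_argument` and `ocr_multilingual` are hypothetical and used only for illustration.

```python
from typing import Iterable


def lang_argument(codes: Iterable[str]) -> str:
    """Join Tesseract language codes with '+' for the lang parameter."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("At least one language code is required")
    return "+".join(cleaned)


def ocr_multilingual(image, codes: Iterable[str], psm: int = 3) -> str:
    """Run OCR with one or more languages and the given PSM."""
    import pytesseract

    return pytesseract.image_to_string(
        image,
        lang=lang_argument(codes),
        config=f"--psm {psm}",
    )


# Example (hypothetical file name; needs eng and deu trained data):
# from PIL import Image
# print(ocr_multilingual(Image.open("page.png"), ["eng", "deu"]))
```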

WRITTEN BY Ganesh Raj

Ganesh Raj V works as a Sr. Research Associate at CloudThat. He is a highly analytical, creative, and passionate individual experienced in Data Science, Machine Learning algorithms, and Cloud Computing. In a quest to learn and work with recent technologies, he strives hard to stay updated on advanced technologies along with efficiently solving problems analytically.
