{"id":59391,"date":"2024-09-16T10:33:36","date_gmt":"2024-09-16T10:33:36","guid":{"rendered":"https:\/\/www.cloudthat.com\/resources\/?post_type=resources&#038;p=59391"},"modified":"2025-02-19T05:07:27","modified_gmt":"2025-02-19T05:07:27","slug":"building-an-ai-ml-solution-on-aws-for-automated-ocr-document-extraction-with-over-90-accuracy","status":"publish","type":"resources","link":"https:\/\/www.cloudthat.com\/resources\/case-study\/building-an-ai-ml-solution-on-aws-for-automated-ocr-document-extraction-with-over-90-accuracy","title":{"rendered":"Building an AI\/ML Solution on AWS for Automated OCR Document Extraction with Over 90% Accuracy"},"content":{"rendered":"<p>NAVNEET TOPTECH is a rapidly growing digital education company in India, offering innovative eLearning solutions for schools and students. They aim to enrich teaching and learning by integrating technology to create an engaging, effective educational experience beyond traditional methods.<\/p>\n","protected":false},"author":325,"featured_media":59392,"parent":0,"template":"","cat_resources":[6],"technology":[32],"published_by":"325","primary-authors":["917","283","637"],"secondary-authors":["325"],"acf":{"banner_image":59393,"resources_label":"","download_url":"https:\/\/content.cloudthat.com\/resources\/wp-content\/uploads\/2024\/09\/NTT_OCR_2_pager_Casestudy.pdf","client_logo":59398,"highlights":{"first_part":{"icon":336,"title":"Data Extraction and OCR ","subtitle":"Smart Automation with AI\/ML"},"second_part":{"icon":335,"title":"90%","subtitle":"Precision at Scale"},"third_part":{"icon":334,"title":"Seamless Workflow Efficiency","subtitle":"Enhanced operations and boosted productivity"}},"the_challenge":"The client relied on physical documents containing book lists in tabular format, which were distributed to teachers and management for selecting required course books. 
To maintain records and enable analysis, the client had to manually enter these details into Excel, a time-consuming, labor-intensive, and error-prone process. This manual effort led to inefficiencies, delays, and inconsistencies in data entry. To eliminate these challenges and streamline the workflow, the client sought an automated solution for accurate and efficient data extraction.","client_testimonial":{"image":"","description":"","author":""},"solutions":"\u2022 Designed a structured architecture for scalability and seamless integration.\r\n\u2022 Stored raw PDFs and images in Amazon S3 as a data lake.\r\n\u2022 Used Amazon EC2 for containerization, Amazon ECR for container image storage, AWS Fargate for deployment, and Amazon SageMaker for development.\r\n\u2022 Processed data with Amazon Textract and stored structured output in Amazon DynamoDB.\r\n\u2022 Integrated Amazon Athena for querying and Amazon QuickSight for analytics.\r\n\u2022 Created AWS IAM users for controlled access and permissions.\r\n\u2022 Implemented Amazon CloudWatch for monitoring and performance optimization.","the_results":"A scalable AI\/ML-powered OCR solution on AWS automating data extraction with 90% accuracy, streamlining workflows and boosting productivity.","about_client_left_side":[{"field_63315a4dc06e1":"15085","field_63315a5bc06e2":"Industry\u00a0","field_63315a61c06e3":"EdTech"},{"field_63315a4dc06e1":"15083","field_63315a5bc06e2":"Expertise\u00a0","field_63315a61c06e3":"Amazon S3, Amazon DynamoDB, Amazon Athena, Amazon QuickSight, Amazon Textract"},{"field_63315a4dc06e1":"15084","field_63315a5bc06e2":"Offerings\/solutions\u00a0","field_63315a61c06e3":"AI\/ML-powered OCR on AWS for accurate data extraction and 
automation"}]},"_links":{"self":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/resources\/59391"}],"collection":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/resources"}],"about":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/types\/resources"}],"author":[{"embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/users\/325"}],"version-history":[{"count":5,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/resources\/59391\/revisions"}],"predecessor-version":[{"id":64270,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/resources\/59391\/revisions\/64270"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/media\/59392"}],"wp:attachment":[{"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/media?parent=59391"}],"wp:term":[{"taxonomy":"cat_resources","embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/cat_resources?post=59391"},{"taxonomy":"technology","embeddable":true,"href":"https:\/\/www.cloudthat.com\/resources\/wp-json\/wp\/v2\/technology?post=59391"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}