Automate File Conversion with AWS Batch and S3

Overview

AWS Batch is a fully managed service provided by Amazon Web Services (AWS) that enables developers to run batch computing workloads in the cloud.

It simplifies provisioning and managing the infrastructure required to execute batch jobs, allowing users to focus on their applications rather than infrastructure management.

Key Functionalities of AWS Batch

  1. Job Scheduling and Orchestration: AWS Batch provides a robust job scheduling and orchestration system that allows you to define dependencies and priorities for batch jobs.
  2. Scalable Compute Resources: With AWS Batch, you can easily scale your compute resources up or down based on the demand of your batch jobs.
  3. Docker Container Support: AWS Batch integrates seamlessly with Docker containers, enabling you to package your batch job applications and their dependencies.
  4. Cost Optimization: AWS Batch helps optimize costs by allowing you to define and manage the allocation of compute resources based on specific requirements and workload patterns.

Use Cases for AWS Batch

  1. Data Processing and ETL (Extract, Transform, Load): AWS Batch is well-suited for large-scale data processing tasks like ETL pipelines.
  2. Scientific and Research Computing: Researchers and scientists often need to perform computationally intensive simulations or data analysis, which AWS Batch can run as batch jobs on compute resources that scale with the workload.
  3. Media Processing and Encoding: Media companies can leverage AWS Batch to process and encode large volumes of media files.
  4. Financial Analytics: Financial institutions can benefit from AWS Batch for running financial analytics and risk modeling computations.

Example

Suppose you work for a publishing company that needs to convert many HTML documents into PDF format for digital distribution. Performing this conversion manually can be time-consuming and tedious.

Here’s how you can leverage AWS Batch to automate the HTML to PDF conversion process:

  1. Gather the HTML files and store them in an Amazon S3 bucket.
  2. Define a Docker-based job for HTML to PDF conversion in AWS Batch.
  3. Configure a compute environment for job execution, sized for the workload.
  4. Submit the job to AWS Batch for automatic scheduling and resource allocation.
  5. AWS Batch executes the job, converting HTML files to PDF in parallel.
  6. Monitor job progress, troubleshoot with logging, and set up notifications.
  7. Store the converted PDF files in the desired output location for distribution or archiving.

By utilizing AWS Batch for HTML to PDF conversion, you benefit from the managed infrastructure, scalability, and automation the service provides. It lets you focus on the content and conversion logic rather than the underlying infrastructure management.
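One detail the workflow above leaves open is how each converted file is named in the output location. The sketch below shows one possible convention, not necessarily the one used by the article's code: keep the key and swap the extension for .pdf, optionally under an output prefix. The function names and the "pdf" prefix are hypothetical.

```python
from pathlib import PurePosixPath


def pdf_key_for(html_key, output_prefix=""):
    """Map an S3 key such as 'docs/index.html' to 'docs/index.pdf'."""
    pdf_key = PurePosixPath(html_key).with_suffix(".pdf")
    if output_prefix:
        # Place the converted file under the (hypothetical) output prefix.
        pdf_key = PurePosixPath(output_prefix) / pdf_key
    return str(pdf_key)


def plan_conversions(html_keys, output_prefix=""):
    """Pair every HTML key with the key its converted PDF would use."""
    return {
        key: pdf_key_for(key, output_prefix)
        for key in html_keys
        if key.lower().endswith((".html", ".htm"))  # skip non-HTML objects
    }


print(plan_conversions(["index.html", "docs/guide.html", "logo.png"], "pdf"))
```

Keeping the mapping in a small pure function makes it easy to check before any job is submitted.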

Configure the Conversion Code

The code for this walkthrough is available in the following GitHub repository. Clone or fork it to follow along.

https://github.com/heistprofessor/aws-batch.git

Step-by-Step Guide

Step 1: Uploading Source HTML Files to S3:

First, upload the HTML files you want to convert to PDF to an Amazon S3 bucket. If you haven’t already created a bucket, navigate to Amazon S3 in the AWS Management Console and create one.

Upload your HTML files to the S3 bucket, ensuring each file has a unique key or name. Note down the bucket name and the keys/names of the HTML files, as we will need them later.
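If you prefer to script the upload instead of using the console, it could look roughly like the sketch below. It assumes boto3 is installed and AWS credentials are configured. The bucket name, local folder, and helper names are placeholders, not values from the repository.

```python
from pathlib import Path


def plan_uploads(local_dir, key_prefix=""):
    """Return (local_path, s3_key) pairs for every .html file under local_dir."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*.html")):
        # The path relative to the folder becomes the key, so keys stay unique.
        relative = path.relative_to(root).as_posix()
        pairs.append((path, f"{key_prefix}{relative}"))
    return pairs


def upload_html_files(s3_client, bucket, local_dir, key_prefix=""):
    """Upload each planned file and return the keys that were written."""
    uploaded = []
    for path, key in plan_uploads(local_dir, key_prefix):
        s3_client.upload_file(
            str(path), bucket, key, ExtraArgs={"ContentType": "text/html"}
        )
        uploaded.append(key)
    return uploaded


# Example usage (placeholder bucket and folder):
#   import boto3
#   keys = upload_html_files(boto3.client("s3"), "my-source-bucket", "./html")
#   print(keys)  # note these keys down for the later steps
```

The S3 client is passed in as an argument so the function can be tried with a stand-in object before it touches a real bucket.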

Step 2: Building the Docker image.

Fork the repository and edit the Python file named ‘app.py’ to set the following parameters:

  • AWS access key
  • AWS secret access key
  • S3 Source bucket
  • S3 Destination bucket
  • S3 Source key
  • S3 Destination key
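As an alternative to editing these values directly in the file, the bucket and key settings can be read from environment variables. The sketch below is not the repository's app.py, and the variable names are hypothetical. It leaves out the access keys on purpose, on the assumption that boto3 resolves credentials through its default chain (for example, an IAM role attached to the job), so they do not need to appear in the code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionConfig:
    source_bucket: str
    source_key: str
    destination_bucket: str
    destination_key: str


# Hypothetical environment variable names, one per setting.
ENV_NAMES = {
    "source_bucket": "SOURCE_BUCKET",
    "source_key": "SOURCE_KEY",
    "destination_bucket": "DESTINATION_BUCKET",
    "destination_key": "DESTINATION_KEY",
}


def load_config(env=None):
    """Build the configuration from environment variables, failing early if any is unset."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return ConversionConfig(**{field: env[var] for field, var in ENV_NAMES.items()})
```

With this approach the same image can be reused for different buckets by changing only the environment passed to the container.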

Now, in a terminal, run the docker build command from the repository root to build the Docker image.
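The exact build command is not reproduced here, so the following is only a sketch of a typical invocation, wrapped in Python so it can be scripted. The image name html-to-pdf and the latest tag are assumptions, and the build context is assumed to be the folder containing the Dockerfile.

```python
import subprocess


def docker_build_command(image_name, tag="latest", context_dir="."):
    """Return the argument list for: docker build -t <image_name>:<tag> <context_dir>"""
    return ["docker", "build", "-t", f"{image_name}:{tag}", context_dir]


def build_image(image_name, tag="latest", context_dir="."):
    """Run the build; raises CalledProcessError if Docker reports a failure."""
    subprocess.run(docker_build_command(image_name, tag, context_dir), check=True)


# Show the command that build_image("html-to-pdf") would run.
print(" ".join(docker_build_command("html-to-pdf")))
```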


Step 3: Tag the docker image and push it into the AWS ECR repository.

To tag and push a Docker image to an AWS ECR repository:

  1. Tag the Docker image with the ECR repository URI.
  2. Log in to the ECR registry using the AWS CLI.
  3. Push the tagged Docker image to the ECR repository.

Ensure you replace <image-id>, <aws-account-id>, <region>, <repository-name>, and <tag> with the appropriate values for your setup.
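The three steps above can be sketched as follows, using the same placeholders as function parameters. This assumes the Docker CLI and AWS CLI v2 are installed and that the ECR repository already exists. The function names are illustrative only.

```python
import subprocess


def ecr_registry(account_id, region):
    """Registry host for an account and Region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id, region, repository, tag):
    """Full image URI, for example <registry>/<repository-name>:<tag>."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


def push_to_ecr(image_id, account_id, region, repository, tag):
    """Tag the local image, log in to ECR, and push. Returns the image URI."""
    registry = ecr_registry(account_id, region)
    uri = ecr_image_uri(account_id, region, repository, tag)

    # 1. Tag the local image with the repository URI.
    subprocess.run(["docker", "tag", image_id, uri], check=True)

    # 2. Fetch a temporary password from the AWS CLI and pipe it to docker login.
    password = subprocess.run(
        ["aws", "ecr", "get-login-password", "--region", region],
        check=True, capture_output=True, text=True,
    ).stdout
    subprocess.run(
        ["docker", "login", "--username", "AWS", "--password-stdin", registry],
        input=password, text=True, check=True,
    )

    # 3. Push the tagged image.
    subprocess.run(["docker", "push", uri], check=True)
    return uri
```

The returned URI is the value to paste into the job definition later.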

Once pushed, the image appears in the Amazon ECR repository.

Step 4: Open the AWS Management Console and navigate to AWS Batch.

Step 5: Configure AWS Batch Environment:

Next, we must set up an AWS Batch environment to run our conversion job. Follow these steps:

  1. Configure AWS Batch compute environment: Set up desired compute resources.
  2. Define AWS Batch job queue: Create a queue for conversion job requests.
  3. Create AWS Batch job definition: Specify container image, command, and parameters.
  4. Fill in the required details for the compute environments section and click ‘Create compute environments’.

Environment configuration – Fargate

Name – html-to-pdf

Service role – AWSServiceRoleForBatch (Default role)

Maximum vCPUs – 2

Select appropriate VPC, subnets, and security group.

Review the details and click Create.
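For reference, the same compute environment could be described in code. The sketch below builds a request for the boto3 Batch client's create_compute_environment call using the settings listed above. The subnet and security group IDs are placeholders. The service role is left out on the assumption that AWS Batch then falls back to the AWSServiceRoleForBatch service-linked role.

```python
def fargate_compute_environment(name, subnet_ids, security_group_ids, max_vcpus=2):
    """Build the request for a managed Fargate compute environment."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "FARGATE",
            "maxvCpus": max_vcpus,
            "subnets": list(subnet_ids),
            "securityGroupIds": list(security_group_ids),
        },
    }


# Example usage (placeholder network IDs; requires boto3 and credentials):
#   import boto3
#   request = fargate_compute_environment(
#       "html-to-pdf", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]
#   )
#   boto3.client("batch").create_compute_environment(**request)
```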

5. Next, navigate to Job queues in the left pane and click Create. Select Fargate as the orchestration type, provide a name, set the priority to 100, and select the compute environment created earlier. Click Create job queue.
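A scripted equivalent of this step might look like the sketch below, which prepares a request for the boto3 Batch client's create_job_queue call. The queue name is a made-up example. The priority of 100 and the html-to-pdf compute environment come from the steps above.

```python
def job_queue_request(name, compute_environment, priority=100):
    """Build the request for a job queue backed by a single compute environment."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": [
            # order 1 means this environment is tried first.
            {"order": 1, "computeEnvironment": compute_environment}
        ],
    }


# Example usage (hypothetical queue name; requires boto3 and credentials):
#   import boto3
#   request = job_queue_request("html-to-pdf-queue", "html-to-pdf")
#   boto3.client("batch").create_job_queue(**request)
```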

6. Next, navigate to Job definitions and click Create. Choose Fargate as the orchestration type, provide a name, enable Assign public IP, choose the execution role, and click Next.

On the next page, paste the image URI copied earlier into the image field. Under command syntax, enter the command below in JSON format and go to the next page.

Command syntax:

For logging, select awslogs and go to the next page.

Review the details and create the job definition.
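The same job definition could be registered through the boto3 Batch client's register_job_definition call. The sketch below only builds the request. The command shown in the usage example is a guess at how the container might be started, not the command from the article. The vCPU and memory values are an assumed small Fargate size, and the role ARN and image URI are placeholders.

```python
def fargate_job_definition(name, image_uri, execution_role_arn, command,
                           vcpu="0.25", memory_mib="512"):
    """Build the request for a Fargate container job definition."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "containerProperties": {
            "image": image_uri,
            "command": list(command),
            "executionRoleArn": execution_role_arn,
            "resourceRequirements": [
                {"type": "VCPU", "value": vcpu},
                {"type": "MEMORY", "value": memory_mib},
            ],
            # Matches the "assign public IP" option enabled in the console.
            "networkConfiguration": {"assignPublicIp": "ENABLED"},
            # Sends container output to Amazon CloudWatch Logs.
            "logConfiguration": {"logDriver": "awslogs"},
        },
    }


# Example usage (all values are placeholders; the command is an assumption):
#   import boto3
#   request = fargate_job_definition(
#       "html-to-pdf-job",
#       "<aws-account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>:<tag>",
#       "arn:aws:iam::<aws-account-id>:role/<execution-role-name>",
#       ["python", "app.py"],
#   )
#   boto3.client("batch").register_job_definition(**request)
```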

7. Navigate to Jobs and submit a new job. Provide a name, select the job definition and job queue, and go to the next page. Check the vCPUs and memory, then go to the next page. Review the details and create the job.
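Submitting the job and waiting for the result can also be scripted. The sketch below uses the boto3 Batch client's submit_job and describe_jobs calls and polls until the job reaches SUCCEEDED or FAILED. The job, queue, and definition names in the usage example are placeholders, and the polling interval is an arbitrary choice.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def submit_and_wait(batch_client, job_name, job_queue, job_definition,
                    poll_seconds=15, sleep=time.sleep):
    """Submit a job, then poll its status until it succeeds or fails."""
    job_id = batch_client.submit_job(
        jobName=job_name, jobQueue=job_queue, jobDefinition=job_definition
    )["jobId"]
    while True:
        job = batch_client.describe_jobs(jobs=[job_id])["jobs"][0]
        if job["status"] in TERMINAL_STATES:
            return job_id, job["status"]
        sleep(poll_seconds)  # wait before asking again


# Example usage (placeholder names; requires boto3 and credentials):
#   import boto3
#   job_id, status = submit_and_wait(
#       boto3.client("batch"), "convert-index", "html-to-pdf-queue", "html-to-pdf-job"
#   )
#   print(job_id, status)
```

The sleep function is a parameter so the loop can be exercised with a stand-in client without waiting.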

The job completes with the status SUCCEEDED. The source bucket contains the index.html file, and the destination bucket contains the index.pdf file uploaded after conversion.

Conclusion

AWS Batch provides a streamlined solution for automating batch computing workloads like HTML to PDF conversion. It simplifies job scheduling, resource allocation, and infrastructure management, reducing processing time. Monitoring and logging features ensure visibility and troubleshooting. Organizations leveraging AWS Batch optimize workflows, save time, and enhance productivity.

FAQs

1. How does AWS Batch differ from other compute services provided by AWS?

ANS: – AWS Batch is specifically designed for batch computing workloads, focusing on efficient job scheduling, resource allocation, and scalability.

2. What are the key benefits of using AWS Batch?

ANS: – Some key benefits of using AWS Batch include simplified infrastructure management, automatic job scheduling and resource allocation, and scalability to handle varying workloads.

3. How does AWS Batch handle job scheduling and resource allocation?

ANS: – AWS Batch provides a job scheduling and orchestration system that allows you to define dependencies and priorities for batch jobs.

4. Can I customize the compute environment in AWS Batch?

ANS: – Yes, you can customize the compute environment in AWS Batch. You can define compute resources based on your specific requirements.

WRITTEN BY Jeet Patel
