Overview
If you are building modern data pipelines on AWS, you have likely come across Amazon EMR, a managed service for running Apache Spark, Apache Hive, Presto, and other big data frameworks at scale.
However, the way you deploy Amazon EMR has evolved. It’s no longer just about launching clusters on Amazon EC2. AWS now offers multiple deployment options, including Amazon EMR on Amazon EKS and Amazon EMR Serverless.
In this post, I’ll explain these newer options, how they work behind the scenes, and when you might choose one over another based on your workload patterns and infrastructure preferences.
Amazon EMR on Amazon EKS
At a high level, Amazon EMR on Amazon EKS lets you run Apache Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS). Instead of managing long-running EMR clusters on Amazon EC2 instances, you run your jobs as containerized applications in your existing Kubernetes environment.
Why it matters:
If your organization already runs applications on Amazon EKS, you can consolidate data processing and application workloads on the same infrastructure, simplifying operations and modernizing your data platform.
How It Works:
- Set up an Amazon EKS cluster (or use an existing one).
- Register an Amazon EMR virtual cluster that maps to a namespace in your Amazon EKS cluster.
- Submit Spark jobs to this virtual cluster.
- Amazon EMR automatically runs those jobs in containers on Amazon EKS.
- The Spark driver and executor pods start for the job and terminate when it finishes, so no dedicated EMR cluster sits idle between jobs.
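The steps above map onto two calls in the `emr-containers` API used by Amazon EMR on Amazon EKS. The sketch below is a minimal illustration, assuming the Amazon EKS cluster has already been prepared for Amazon EMR (namespace access and an IAM job execution role); the cluster name, namespace, role ARN, S3 path, and release label are hypothetical placeholders. The request bodies are built as plain dictionaries so they can be inspected before anything is sent to AWS.

```python
from typing import Any, Dict


def virtual_cluster_request(name: str, eks_cluster: str, namespace: str) -> Dict[str, Any]:
    """Body for emr-containers CreateVirtualCluster: maps EMR to one EKS namespace."""
    return {
        "name": name,
        "containerProvider": {
            "id": eks_cluster,  # name of the existing EKS cluster
            "type": "EKS",
            "info": {"eksInfo": {"namespace": namespace}},
        },
    }


def spark_job_request(virtual_cluster_id: str, role_arn: str,
                      entry_point: str, release_label: str) -> Dict[str, Any]:
    """Body for emr-containers StartJobRun: one Spark job on the virtual cluster."""
    return {
        "name": "example-spark-job",
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": role_arn,  # IAM role the job's pods run with
        "releaseLabel": release_label,  # selects the EMR (and Spark) runtime version
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "sparkSubmitParameters": "--conf spark.executor.instances=2",
            }
        },
    }


def submit_example() -> str:
    """Creates a virtual cluster and submits one job. Needs boto3 and AWS credentials.

    All identifiers below are hypothetical placeholders.
    """
    import boto3  # imported here so the builders above work without boto3 installed

    client = boto3.client("emr-containers")
    vc = client.create_virtual_cluster(
        **virtual_cluster_request("analytics-vc", "my-eks-cluster", "spark-jobs"))
    run = client.start_job_run(**spark_job_request(
        virtual_cluster_id=vc["id"],
        role_arn="arn:aws:iam::111122223333:role/emr-eks-job-role",
        entry_point="s3://my-bucket/scripts/etl.py",
        release_label="emr-6.15.0-latest",
    ))
    return run["id"]  # ID of the submitted job run
```

Calling `submit_example()` requires boto3, valid AWS credentials, and your own identifiers in place of the placeholders.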
Why You Might Choose It:
- If you are already managing workloads on Amazon EKS.
- When you need multiple versions of Spark running side by side.
- For organizations looking to consolidate infrastructure while retaining control over configuration and scaling.
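In Amazon EMR on Amazon EKS, the release label is chosen per job run rather than per cluster, which is what allows jobs on different Amazon EMR (and therefore Spark) versions to share one virtual cluster. The sketch below assumes a hypothetical virtual cluster ID, role ARN, script path, and pair of release labels; each request also carries its own `spark-defaults` overrides, illustrating per-job configuration control.

```python
from typing import Any, Dict, List


def job_for_release(virtual_cluster_id: str, role_arn: str, entry_point: str,
                    release_label: str, spark_defaults: Dict[str, str]) -> Dict[str, Any]:
    """StartJobRun body pinned to one EMR release, with per-job Spark settings."""
    return {
        "name": f"etl-{release_label}",
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": role_arn,
        "releaseLabel": release_label,  # chosen per job, not per cluster
        "jobDriver": {"sparkSubmitJobDriver": {"entryPoint": entry_point}},
        "configurationOverrides": {
            "applicationConfiguration": [
                {"classification": "spark-defaults", "properties": spark_defaults}
            ]
        },
    }


def side_by_side(virtual_cluster_id: str, role_arn: str, entry_point: str,
                 release_labels: List[str]) -> List[Dict[str, Any]]:
    """One request per release label, all targeting the same virtual cluster."""
    return [
        job_for_release(virtual_cluster_id, role_arn, entry_point, label,
                        {"spark.sql.shuffle.partitions": "200"})
        for label in release_labels
    ]


# Hypothetical identifiers; each dict could be passed to
# boto3.client("emr-containers").start_job_run(**request) to submit it.
requests = side_by_side(
    "vc-example-id",
    "arn:aws:iam::111122223333:role/emr-eks-job-role",
    "s3://my-bucket/scripts/etl.py",
    ["emr-6.15.0-latest", "emr-7.0.0-latest"],
)
```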
Amazon EMR Serverless
If you want to take infrastructure management completely off your plate, Amazon EMR Serverless is the answer. It’s a fully managed, serverless platform for running Apache Spark and Apache Hive workloads without provisioning or managing clusters, containers, or infrastructure.
How It Works:
- You define an application in Amazon EMR Serverless.
- Submit your Spark or Hive job.
- Amazon EMR Serverless automatically provisions compute and memory resources.
- The job runs, scales up or down as needed, and the resources terminate when the job completes.
- You can track metrics and logs through Amazon CloudWatch and the native Spark UI (for Spark jobs) or Tez UI (for Hive jobs).
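The lifecycle above can be sketched with the `emr-serverless` API: create an application, wait for it to be created, start a job run, and poll the run until it reaches a terminal state. This is a minimal sketch, assuming a hypothetical role ARN, S3 entry point, and release label; the polling helper takes the state lookup as a function so it can be exercised without AWS access.

```python
import time
from typing import Any, Callable, Dict, FrozenSet

# Job run states after which the run will not change state again.
JOB_TERMINAL_STATES = frozenset({"SUCCESS", "FAILED", "CANCELLED"})


def application_request(name: str, release_label: str) -> Dict[str, Any]:
    """CreateApplication body; type is "SPARK" or "HIVE"."""
    return {"name": name, "releaseLabel": release_label, "type": "SPARK"}


def job_request(application_id: str, role_arn: str, entry_point: str) -> Dict[str, Any]:
    """StartJobRun body; note that no cluster sizing is specified here."""
    return {
        "applicationId": application_id,
        "executionRoleArn": role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
    }


def wait_for_state(get_state: Callable[[], str], done_states: FrozenSet[str],
                   poll_seconds: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Polls get_state() until it returns one of done_states, then returns it."""
    while True:
        state = get_state()
        if state in done_states:
            return state
        sleep(poll_seconds)


def run_example() -> str:
    """End-to-end run; needs boto3 and AWS credentials. Identifiers are hypothetical."""
    import boto3

    client = boto3.client("emr-serverless")
    app_id = client.create_application(
        **application_request("example-app", "emr-7.0.0"))["applicationId"]
    # Wait until the application has finished creating before submitting work.
    wait_for_state(lambda: client.get_application(
        applicationId=app_id)["application"]["state"], frozenset({"CREATED"}))
    run_id = client.start_job_run(**job_request(
        app_id,
        "arn:aws:iam::111122223333:role/emr-serverless-job-role",
        "s3://my-bucket/scripts/etl.py",
    ))["jobRunId"]
    return wait_for_state(lambda: client.get_job_run(
        applicationId=app_id, jobRunId=run_id)["jobRun"]["state"], JOB_TERMINAL_STATES)
```

Injecting `sleep` keeps the polling loop testable; in real use the default `time.sleep` applies.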
Why You Might Choose It:
- When you need to focus solely on your data processing jobs and pipelines.
- For unpredictable, intermittent, or bursty workloads.
- To avoid managing cluster sizing, scaling, and patching.
- When you require a pay-as-you-go model based on actual vCPU and memory used during job runtime.
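To make the pay-as-you-go model concrete, the sketch below adds up vCPU-hours and GB-hours across workers, each of which may run for a different length of time as the job scales. The per-unit rates are hypothetical placeholders, not AWS prices, and the model deliberately ignores storage charges and any billing minimums; check the Amazon EMR Serverless pricing page for actual rates and rules.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WorkerRun:
    vcpu: float       # vCPUs allocated to the worker
    memory_gb: float  # memory allocated to the worker, in GB
    hours: float      # time from worker start to worker stop


def estimate_job_cost(workers: Iterable[WorkerRun], usd_per_vcpu_hour: float,
                      usd_per_gb_hour: float) -> float:
    """Rough compute cost: aggregate vCPU-hours and GB-hours times placeholder rates.

    Ignores storage charges and billing minimums; rates are hypothetical inputs.
    """
    runs = list(workers)  # allow generators, which can only be iterated once
    vcpu_hours = sum(w.vcpu * w.hours for w in runs)
    gb_hours = sum(w.memory_gb * w.hours for w in runs)
    return vcpu_hours * usd_per_vcpu_hour + gb_hours * usd_per_gb_hour


# One worker for the whole hour, plus four more added for 15 minutes each during a burst.
job = [WorkerRun(4, 16, 1.0)] + [WorkerRun(4, 16, 0.25) for _ in range(4)]
cost = estimate_job_cost(job, usd_per_vcpu_hour=0.05, usd_per_gb_hour=0.005)  # placeholder rates
```

In this example the job consumes 8 vCPU-hours and 32 GB-hours; between jobs there are no running workers, so nothing accrues.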
When Should You Use Each Option?
Choose Amazon EMR on Amazon EC2 when:
- You need full control over infrastructure and cluster configuration.
- You’re migrating from on-premises Hadoop or Spark deployments.
- You run clusters with consistently high, predictable utilization (around 80% or more), where paying for always-on capacity is cost-effective.
Go with Amazon EMR on Amazon EKS when:
- You already operate applications on Amazon EKS.
- You need multiple Spark versions in parallel.
- You want fast Spark runtime upgrade cycles and containerized job workflows.
Opt for Amazon EMR Serverless when:
- You don’t want to manage infrastructure.
- Your workloads are variable, bursty, or unpredictable.
- You prefer a simple pay-as-you-go model without upfront provisioning.
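The criteria above can be condensed into a rough heuristic. The sketch below is illustrative only: the field names are hypothetical, the 0.8 threshold comes from the 80%+ utilization rule of thumb above, and the order of the checks is one reasonable reading of the guidance rather than an official decision rule. The helper `cost_per_busy_hour` shows why utilization matters for provisioned clusters: you pay for every hour the cluster runs, so the effective cost of each busy hour is the hourly cost divided by utilization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    needs_full_cluster_control: bool
    runs_on_eks_already: bool
    needs_multiple_spark_versions: bool
    expected_utilization: float  # busy fraction of provisioned capacity, 0.0 to 1.0


def cost_per_busy_hour(cluster_cost_per_hour: float, utilization: float) -> float:
    """An always-on cluster bills every hour, so idle time inflates the cost of busy hours."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cluster_cost_per_hour / utilization


def suggest_deployment(w: Workload, high_utilization: float = 0.8) -> str:
    """Rough heuristic following the criteria above; not an official AWS rule."""
    if w.needs_full_cluster_control and w.expected_utilization >= high_utilization:
        return "Amazon EMR on Amazon EC2"
    if w.runs_on_eks_already or w.needs_multiple_spark_versions:
        return "Amazon EMR on Amazon EKS"
    return "Amazon EMR Serverless"  # default for variable or bursty workloads
```

With a hypothetical $10/hour cluster, 80% utilization works out to $12.50 per busy hour, while 20% utilization works out to $50.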
Conclusion
If you are already invested in Amazon EKS, Amazon EMR on Amazon EKS is a natural fit for running data jobs alongside application containers.
If you want pure simplicity, Amazon EMR Serverless lets you focus on the data processing logic, while AWS handles scaling and infrastructure behind the scenes.
Either way, modern data teams have flexible, efficient ways to run big data frameworks on AWS without being tied to heavyweight clusters.
Drop a query if you have any questions regarding Amazon EKS or Amazon EMR and we will get back to you quickly.
About CloudThat
CloudThat is a leading provider of Cloud Training and Consulting services with a global presence in India, the USA, Asia, Europe, and Africa. Specializing in AWS, Microsoft Azure, GCP, VMware, Databricks, and more, the company serves mid-market and enterprise clients, offering comprehensive expertise in Cloud Migration, Data Platforms, DevOps, IoT, AI/ML, and more.
CloudThat is the first Indian Company to win the prestigious Microsoft Partner 2024 Award and is recognized as a top-tier partner with AWS and Microsoft, including the prestigious ‘Think Big’ partner award from AWS and the Microsoft Superstars FY 2023 award in Asia & India. Having trained 650k+ professionals in 500+ cloud certifications and completed 300+ consulting projects globally, CloudThat is an official AWS Advanced Consulting Partner, Microsoft Gold Partner, AWS Training Partner, AWS Migration Partner, AWS Data and Analytics Partner, AWS DevOps Competency Partner, AWS GenAI Competency Partner, Amazon QuickSight Service Delivery Partner, Amazon EKS Service Delivery Partner, AWS Microsoft Workload Partners, Amazon EC2 Service Delivery Partner, Amazon ECS Service Delivery Partner, AWS Glue Service Delivery Partner, Amazon Redshift Service Delivery Partner, AWS Control Tower Service Delivery Partner, AWS WAF Service Delivery Partner, Amazon CloudFront Service Delivery Partner, Amazon OpenSearch Service Delivery Partner, AWS DMS Service Delivery Partner, AWS Systems Manager Service Delivery Partner, Amazon RDS Service Delivery Partner, AWS CloudFormation Service Delivery Partner and many more.
FAQs
1. What is the difference between Amazon EMR on Amazon EKS and Amazon EMR Serverless?
ANS: – Amazon EMR on Amazon EKS runs Spark jobs in containers on an existing Amazon EKS cluster, while Amazon EMR Serverless automatically provisions compute without requiring you to manage a cluster.
2. When should I use Amazon EMR Serverless over other deployment options?
ANS: – Use Amazon EMR Serverless to run big data jobs without managing infrastructure, especially for variable or on-demand workloads.

WRITTEN BY Bineet Singh Kushwah
Bineet Singh Kushwah works as an Associate Architect at CloudThat. His work revolves around data engineering, analytics, and machine learning projects. He is passionate about providing analytical solutions for business problems and deriving insights to enhance productivity. In a quest to learn and work with recent technologies, he spends most of his time exploring upcoming data science trends and cloud platform services, and keeps up with the latest advancements.