Quikr migrates from Google Cloud to AWS Cloud Platform

About Client

Quikr operates leading transaction marketplaces built on top of India’s largest classifieds platform for online buying and rentals, which is used by over 30 million unique users a month. The Quikr platform runs on desktops, laptops, and mobile phones, and lets consumers as well as small businesses sell, buy, rent, and find things across its multiple categories with ease.

Problem Statement

The client needed to migrate 47965 BigQuery tables and 82 GCS buckets, totaling 200 TB of data, from GCP to the AWS cloud platform in order to increase the availability of its online marketplace platform and serve its 30 million unique monthly users. All the BigQuery data had to be converted to Parquet format to improve analytics performance and reduce storage cost.

Business Objectives

  • Use the analytics services of the AWS cloud platform.
  • Reduce data storage cost.
  • Optimize data analysis performance by using the Apache Parquet format.

Technical Objectives

  • Migrate 47965 BigQuery tables, totaling 150 TB of analytics data, from GCP BigQuery to Amazon S3.
  • Migrate 82 GCS buckets, totaling 50 TB of data, from GCP Coldline storage to Amazon S3.
  • Configure 8000+ Glue Crawlers to create the databases and tables for Athena.
  • Verify the data with a script that counts the rows and columns of each table before and after migration and compares the schemas, including the data types.
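
The verification objective above can be illustrated with a minimal Python sketch. It is not the script used in the project: the TableSnapshot type, the compare function, and the TYPE_MAP entries are hypothetical names and assumptions introduced here, and the sketch assumes the row counts and schemas have already been fetched from BigQuery and from Athena by some other means.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    """Row count and ordered (column, type) pairs for one table."""
    row_count: int
    schema: tuple  # e.g. (("id", "INT64"), ("name", "STRING"))


# Illustrative (assumed) mapping from BigQuery types to Athena types.
# A real migration would need a complete, verified mapping.
TYPE_MAP = {
    "INT64": "bigint",
    "FLOAT64": "double",
    "STRING": "string",
    "BOOL": "boolean",
    "TIMESTAMP": "timestamp",
}


def compare(source: TableSnapshot, target: TableSnapshot) -> list:
    """Return a list of human-readable mismatches (empty if none)."""
    problems = []
    if source.row_count != target.row_count:
        problems.append(
            f"row count: {source.row_count} != {target.row_count}")
    if len(source.schema) != len(target.schema):
        problems.append(
            f"column count: {len(source.schema)} != {len(target.schema)}")
    # Compare column by column, ignoring case in column names.
    target_types = {name.lower(): typ for name, typ in target.schema}
    for name, src_type in source.schema:
        expected = TYPE_MAP.get(src_type, src_type.lower())
        actual = target_types.get(name.lower())
        if actual is None:
            problems.append(f"missing column: {name}")
        elif actual != expected:
            problems.append(
                f"type mismatch on {name}: expected {expected}, got {actual}")
    return problems


if __name__ == "__main__":
    before = TableSnapshot(3, (("id", "INT64"), ("name", "STRING")))
    after = TableSnapshot(3, (("id", "bigint"), ("name", "string")))
    print(compare(before, after) or "table verified")
```

Returning a list of mismatches rather than a single boolean makes it easier to report every problem for a table in one pass, which matters when tens of thousands of tables are checked.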

Design Factors

  • GCP Dataproc clusters were used to migrate the data from GCP BigQuery to Amazon S3.
  • Converting the data to Parquet format during migration, before storing it in Amazon S3, reduced the client's storage cost.
  • For the GCS Coldline data, a Dataproc cluster migrated the data from all 82 buckets into a single Amazon S3 bucket, as per the client's requirements.
  • Athena tables were set up for all the migrated data.
  • All the Glue Crawlers were deployed using a Terraform script.
  • Custom Python scripts verified the row and column counts, along with the schema, of every table before and after migration.
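
Deploying thousands of Glue Crawlers with Terraform usually means generating the resource definitions rather than writing them by hand. The following Python sketch shows one possible way to do that by emitting Terraform's JSON configuration syntax for aws_glue_crawler resources. It is an assumption for illustration, not the project's actual Terraform script: the crawler_resources function, the one-crawler-per-dataset layout, the role ARN, the bucket name, and the dataset names are all hypothetical.

```python
import json


def crawler_resources(role_arn: str, bucket: str, datasets: list) -> dict:
    """Build a Terraform JSON document with one aws_glue_crawler per dataset.

    Assumes each dataset lives under its own S3 prefix and that dataset
    names are valid Terraform identifiers (letters, digits, underscores).
    """
    crawlers = {}
    for dataset in datasets:
        crawlers[f"crawler_{dataset}"] = {
            "name": f"crawler-{dataset}",
            "role": role_arn,
            "database_name": dataset,
            # Nested blocks are written as a list of objects in JSON syntax.
            "s3_target": [{"path": f"s3://{bucket}/{dataset}/"}],
        }
    return {"resource": {"aws_glue_crawler": crawlers}}


if __name__ == "__main__":
    doc = crawler_resources(
        role_arn="arn:aws:iam::123456789012:role/example-glue-role",  # placeholder
        bucket="example-migrated-data",                               # placeholder
        datasets=["orders", "listings"],                              # placeholder
    )
    # Write the output to a file such as crawlers.tf.json for Terraform to load.
    print(json.dumps(doc, indent=2))
```

Generating the configuration from a list of datasets keeps the crawler definitions uniform and lets the same list drive later steps, such as checking that every dataset has a corresponding Athena table.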

Cloud Services Used

  • GCP Dataproc
  • GCP Compute Engine
  • Google Cloud Storage 
  • Amazon Athena
  • AWS Glue
  • Amazon S3

Architecture Diagram and Designs

Outcomes

  • We migrated all the data from GCP BigQuery and the GCS buckets to Amazon S3 successfully in 5 days.
  • We verified the data files in AWS over the next 5 days.
  • The client now uses AWS services such as Amazon Athena for its analytics workloads.
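
To put the migration window in perspective, the short calculation below estimates the average sustained rate implied by moving 200 TB in 5 days. It is a rough sketch: it assumes decimal terabytes and a transfer that runs continuously, and it uses the source data size, whereas the bytes actually written to S3 after Parquet conversion may have differed. The sustained_rate function is a hypothetical helper.

```python
def sustained_rate(total_tb: float, days: float) -> tuple:
    """Return the average rate as (megabytes per second, gigabits per second)."""
    total_bytes = total_tb * 10**12        # decimal terabytes (assumption)
    seconds = days * 24 * 60 * 60          # continuous transfer (assumption)
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 10**6, bytes_per_second * 8 / 10**9


mb_s, gbit_s = sustained_rate(200, 5)
print(f"{mb_s:.0f} MB/s, {gbit_s:.2f} Gbit/s")  # prints "463 MB/s, 3.70 Gbit/s"
```

An average of roughly 40 TB per day at this rate suggests why the work was spread across Dataproc clusters rather than run from a single machine.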

Lessons Learned

  • Apache Parquet is a flat columnar storage format designed for efficient and performant storage of data.
  • Converting the massive volume of data to Parquet format reduced the client's storage cost as well as the cost of querying the data on S3 with Amazon Athena.