Course Overview

In this course, you learn about data engineering on Google Cloud, the roles and responsibilities of data engineers, and how those map to offerings provided by Google Cloud. You also learn about ways to address data engineering challenges.

After completing this course, students will be able to:

  • Understand the role of a data engineer.
  • Identify data engineering tasks and core components used on Google Cloud.
  • Understand how to create and deploy data pipelines of varying patterns on Google Cloud.
  • Identify and utilize various automation techniques on Google Cloud.

Upcoming Batches

Enroll Online
Start Date: To be decided
End Date: To be decided

Key features of Introduction to Data Engineering on Google Cloud:

  • Overview of Google Cloud Platform (GCP) for Data Engineering

    • Introduction to Google Cloud services and products relevant to data engineering, such as Google Cloud Storage, BigQuery, Cloud Pub/Sub, Dataflow, and Cloud Dataproc.
    • Understanding the concept of serverless computing with tools like BigQuery (for querying) and Dataflow (for data processing).
  • Data Storage Solutions

    • Google Cloud Storage: Exploring scalable object storage for unstructured data.
    • BigQuery: Leveraging Google’s fully managed data warehouse for running large-scale SQL queries on petabytes of data.
    • Cloud Spanner: A fully managed, scalable relational database with strong consistency across regions.
    • Cloud SQL: Managed relational databases (MySQL, PostgreSQL, and SQL Server) on Google Cloud.
  • Data Ingestion and Integration

    • Using Cloud Pub/Sub for real-time messaging and event-driven architectures.
    • Cloud Dataflow: Streamlining ETL (Extract, Transform, Load) processes with serverless data processing.
    • Cloud Data Fusion: Building and managing ETL workflows with a fully managed, cloud-native integration tool.
  • Data Processing and Transformation

    • Dataflow: Creating and managing data pipelines for batch and stream processing using Apache Beam.
    • Cloud Dataproc: Running big data workloads using Apache Hadoop, Spark, and Hive on Google Cloud.
    • Dataprep: A tool for preparing and cleaning data with a visual interface before loading it into BigQuery.
  • Data Analytics and Visualization

    • BigQuery ML: Machine learning capabilities integrated into BigQuery, allowing users to build models using SQL syntax.
    • Looker: Data exploration and business intelligence platform for creating dashboards and visualizing data.
    • Google Data Studio: Creating reports and dashboards from various data sources.
  • Data Security and Governance

    • Identity and Access Management (IAM): Implementing access control for users and services to secure data.
    • Cloud Key Management Service (Cloud KMS): Protecting sensitive data by creating and managing encryption keys.
    • Audit Logs: Tracking and monitoring data usage and activity across GCP resources.
  • Data Orchestration and Workflow Management

    • Understanding Cloud Composer, which is based on Apache Airflow, for managing complex workflows and scheduling tasks.
  • Cost Management

    • Cost Optimization: Learning how to monitor and optimize resource usage to control costs while working with GCP data services.
    • BigQuery Pricing: Managing costs for storage, querying, and streaming data.

Who Should Attend the Training?

  • Data engineers
  • Database administrators
  • System administrators

Prerequisites:

  • Fundamental-level Google Cloud experience, including using Cloud Shell and accessing products from the Google Cloud console.
  • Basic proficiency with a common query language such as SQL.
  • Experience with data modeling and extract, transform, load (ETL) activities.
  • Experience developing applications using a common programming language such as Python.

Why choose CloudThat as your training partner?

    • Specialized GCP Focus: CloudThat specializes in cloud technologies, offering focused and specialized training programs. We are Authorized Trainers for the Google Cloud Platform. This specialization ensures in-depth coverage of GCP services, use cases, best practices, and hands-on experience tailored specifically for GCP.
    • Industry-Recognized Trainers: CloudThat has a strong pool of industry-recognized trainers certified by GCP. These trainers bring real-world experience and practical insights into the training sessions, giving learners a comprehensive understanding of how GCP is applied in different industries and scenarios.
    • Hands-On Learning Approach: CloudThat emphasizes a hands-on learning approach. Learners can access practical labs, real-world projects, and case studies that simulate actual GCP environments. This approach allows learners to apply theoretical knowledge in practical scenarios, enhancing their understanding and skill set.
    • Customized Learning Paths: CloudThat understands that learners have different levels of expertise and varied learning objectives. We offer customized learning paths, catering to beginners, intermediate learners, and professionals seeking advanced GCP skills.
    • Interactive Learning Experience: CloudThat's training programs are designed to be interactive and engaging. We utilize various teaching methodologies like live sessions, group discussions, quizzes, and mentorship to keep learners engaged and motivated throughout the course.
    • Placement Assistance and Career Support: CloudThat often provides placement assistance and career support services. This includes resume building, interview preparation, and connecting learners with job opportunities through our network of industry partners and companies looking for GCP-certified professionals.
    • Continuous Learning and Updates: CloudThat ensures that our course content is regularly updated to reflect the latest trends, updates, and best practices within the GCP ecosystem. This commitment to keeping the content current enables learners to stay ahead in their GCP knowledge.
    • Positive Reviews and Testimonials: Reviews and testimonials from past learners can strongly indicate the quality of training provided. You can check feedback and reviews of our GCP courses for insight into the effectiveness and value of the training.

    Course Outline

    Topics

    • The role of a data engineer
    • Data sources versus data sinks
    • Data formats
    • Storage solution options on Google Cloud
    • Metadata management options on Google Cloud
    • Sharing datasets using Analytics Hub

    Objectives

    • Explain the role of a data engineer.
    • Understand the differences between a data source and a data sink.
    • Explain the different types of data formats.
    • Explain the storage solution options on Google Cloud.
    • Learn about the metadata management options on Google Cloud.
    • Understand how to share datasets with ease using Analytics Hub.
    • Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.

    Activities

    • Lab 1 Quiz

    Topics

    • Replication and migration architecture
    • The gcloud command-line tool
    • Moving datasets
    • Datastream

    Objectives

    • Explain the baseline Google Cloud data replication and migration architecture.
    • Understand the options and use cases for the gcloud command-line tool.
    • Explain the functionality and use cases for Storage Transfer Service.
    • Explain the functionality and use cases for Transfer Appliance.
    • Understand the features and deployment of Datastream.

    Activities

    • Lab 1 Quiz

    Topics

    • Extract and load architecture
    • The bq command-line tool
    • BigQuery Data Transfer Service
    • BigLake

    Objectives

    • Explain the baseline extract and load architecture diagram.
    • Understand the options of the bq command-line tool.
    • Explain the functionality and use cases for BigQuery Data Transfer Service.
    • Explain the functionality and use cases for BigLake as a non-extract-load pattern.

    Activities

    • Lab 1 Quiz

    Topics

    • Extract, load, and transform (ELT) architecture
    • SQL scripting and scheduling with BigQuery
    • Dataform

    Objectives

    • Explain the baseline extract, load, and transform architecture diagram.
    • Understand a common ELT pipeline on Google Cloud.
    • Learn about BigQuery’s SQL scripting and scheduling capabilities.
    • Explain the functionality and use cases for Dataform.

    Activities

    • 1 Lab, 1 Quiz
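
    As an illustration of the ELT pattern named in this module, here is a minimal sketch; it is not part of the official course material. It assumes Python's built-in sqlite3 module as a local stand-in for a warehouse such as BigQuery, and the table names (raw_orders, orders_clean), the function name run_elt, and the sample rows are all hypothetical. The point it shows is the ordering: raw records are loaded unchanged first, and cleaning and typing happen afterwards as SQL inside the database.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "0.01"),
]


def run_elt(rows):
    """Load raw rows unchanged, then transform them with SQL in the database."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse

    # Extract + Load: store the data as-is, every column as text.
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: clean and type the data after loading, using SQL.
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        """
    )

    # Query the transformed table: total spend per customer.
    result = conn.execute(
        "SELECT customer, ROUND(SUM(amount), 2) FROM orders_clean "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_elt(RAW_ORDERS))
```

    In an ETL pipeline, by contrast, the trimming and type conversion would run in the pipeline code before the insert, so only cleaned rows would ever reach the database.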

    Topics

    • Extract, transform, and load (ETL) architecture
    • Google Cloud GUI tools for ETL data pipelines
    • Batch data processing using Dataproc
    • Streaming data processing options
    • Bigtable and data pipelines

    Objectives

    • Explain the baseline extract, transform, and load architecture diagram.
    • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
    • Explain batch data processing using Dataproc.
    • Learn how to use Dataproc Serverless for Spark for ETL.
    • Explain streaming data processing options.
    • Explain the role Bigtable plays in data pipelines.

    Activities

    • Lab 1 Quiz

    Topics

    • Automation patterns and options for pipelines
    • Cloud Scheduler and Workflows
    • Cloud Composer
    • Cloud Run functions
    • Eventarc

    Objectives

    • Explain the automation patterns and options available for pipelines.
    • Learn about Cloud Scheduler and Workflows.
    • Learn about Cloud Composer.
    • Learn about Cloud Run functions.
    • Explain the functionality and automation use cases for Eventarc.

    Activities

    • Lab 1 Quiz

    Certification

    • CloudThat Course Completion Certificate

    Course Fee

    Course ID: 23678

    Course Price: $749 + 0% tax