Course Overview of Data Integration with Cloud Data Fusion

This course introduces learners to Google Cloud’s data integration capabilities. It examines the challenges of modern data integration and demonstrates how Cloud Data Fusion serves as middleware that connects diverse data sources and destinations. You will explore visual pipeline design, metadata tracking, and data lineage while learning to deploy pipelines on different execution environments for both batch and real-time streaming data.

After completing Data Integration with Cloud Data Fusion, participants will be able to:

  • Identify data integration challenges and how Cloud Data Fusion solves them.
  • Navigate and utilize the core components of the Cloud Data Fusion architecture.
  • Design, execute, and monitor both batch and real-time data pipelines.
  • Use Wrangler to clean and transform data through UI and CLI methods.
  • Integrate diverse data sources using specialized connectors.
  • Implement data governance using metadata and data lineage tracking.
  • Troubleshoot and optimize pipeline execution environments.

Key Features of Data Integration with Cloud Data Fusion

  • Visual Pipeline Design: Build complex ETL/ELT workflows with a drag-and-drop interface, without writing code.
  • Wrangler Integration: Prepare and transform data interactively with the Wrangler UI and directives.
  • Hybrid & Multi-Cloud Support: Use a broad library of connectors to integrate data from on-premises and cloud sources.
  • Advanced Observability: Track metadata and data lineage comprehensively for governance and troubleshooting.
  • Streaming & Batch: Design both high-volume batch jobs and real-time streaming pipelines on a unified platform.
  • Security & Compliance: Handle sensitive data through integration with the Cloud Data Loss Prevention (DLP) API.

Who Should Attend This Course on Data Integration with Cloud Data Fusion

  • Data Engineers, Data Architects, and Data Analysts responsible for building and managing data integration pipelines.

Pre-requisites of Data Integration with Cloud Data Fusion

  • Basic understanding of data integration (ETL/ELT) concepts.
  • Familiarity with Google Cloud storage and database services is recommended.

Learning Objectives of Data Integration with Cloud Data Fusion

  • Platform Proficiency: Understand the capabilities and core components of Cloud Data Fusion.
  • Pipeline Development: Design and execute end-to-end data processing pipelines.
  • Data Transformation: Master the use of Wrangler for building complex data transformations.
  • Connectivity: Use connectors to integrate data from various formats and sources.
  • Operational Management: Configure execution environments and monitor pipeline health.
  • Governance: Differentiate between business, technical, and operational metadata, and understand data lineage.

Why choose CloudThat as a training partner for Data Integration with Cloud Data Fusion

  • Specialized GCP Focus: CloudThat specializes in cloud technologies and offers focused training programs. We are Authorized Trainers for the Google Cloud Platform. This specialization ensures in-depth coverage of GCP services, use cases, and best practices, along with hands-on experience tailored specifically to GCP.
  • Industry-Recognized Trainers: CloudThat has a strong pool of industry-recognized, GCP-certified trainers. They bring real-world experience and practical insights into the sessions, helping learners understand how GCP is applied across different industries and scenarios.
  • Hands-On Learning Approach: CloudThat emphasizes hands-on learning. Learners can access practical labs, real-world projects, and case studies that simulate actual GCP environments, which lets them apply theoretical knowledge in practical scenarios and strengthen their skills.
  • Customized Learning Paths: CloudThat recognizes that learners have different levels of expertise and varied learning objectives. We offer customized learning paths for beginners, intermediate learners, and professionals seeking advanced GCP skills.
  • Interactive Learning Experience: CloudThat's training programs are designed to be interactive and engaging. We use teaching methods such as live sessions, group discussions, quizzes, and mentorship to keep learners engaged and motivated throughout the course.
  • Placement Assistance and Career Support: CloudThat often provides placement assistance and career support services, including resume building, interview preparation, and connections to job opportunities through our network of industry partners and companies looking for GCP-certified professionals.
  • Continuous Learning and Updates: CloudThat regularly updates course content to reflect the latest trends, updates, and best practices in the GCP ecosystem, helping learners keep their GCP knowledge current.
  • Positive Reviews and Testimonials: Reviews and testimonials from past learners are a strong indicator of training quality. Feedback on our GCP courses gives prospective learners insight into the effectiveness and value of the training.

Course Outline of Data Integration with Cloud Data Fusion

Module 01

Lecture Content

  • Challenges of traditional ETL/ELT data integration architectures
  • The role and business necessity of modern cloud middleware solutions
  • Centralized data engineering workflows and ecosystem bottlenecks

Learning Objectives

  • Identify the core challenges of data integration within enterprise architectures
  • Articulate the functional necessity of middleware in modern multi-cloud data landscapes

Lab Content

  • NA

Module 02

Lecture Content

  • Core architectural principles of Cloud Data Fusion (built on CDAP)
  • Navigating the primary components: Hub, Control Center, and Pipeline Studio
  • Under-the-hood engine mapping (Cloud Dataproc, Cloud Storage, Compute Engine)

Learning Objectives

  • Explain the design framework and logical components of Cloud Data Fusion
  • Describe how the underlying infrastructure operates during pipeline provisioning

Lab Content

  • NA

Module 03

Lecture Content

  • Visual design patterns within the Pipeline Studio graphical interface
  • Configuring pipeline nodes: Sources, Transforms, Sinks, and Actions
  • Best practices for managing pipeline configurations, variables, and macros

Learning Objectives

  • Construct, deploy, and visually orchestrate end-to-end batch data pipelines
  • Incorporate parameterization techniques to create reusable pipeline templates

Lab Content

  • Lab: Building and Executing Your First Batch Pipeline in Cloud Data Fusion
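
Macros are the mechanism behind the parameterization covered in this module: a plugin property such as a source path can hold a placeholder like `${bucket}` that is filled in, typically from runtime arguments, when the pipeline runs. The Python sketch below only illustrates that substitution idea under simplifying assumptions (flat `${key}` macros, no nested macros or macro functions). It is not Cloud Data Fusion's implementation, and the property names, bucket name, and `resolve_macros` helper are hypothetical.

```python
import re

# Matches a flat macro such as ${bucket}. Nested macros and macro
# functions are out of scope for this sketch.
MACRO = re.compile(r"\$\{([^${}]+)\}")


def resolve_macros(properties: dict[str, str], runtime_args: dict[str, str]) -> dict[str, str]:
    """Return a copy of properties with each ${key} replaced by runtime_args[key].

    A macro with no matching runtime argument raises KeyError; failing fast
    is a simplification chosen for this sketch.
    """

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in runtime_args:
            raise KeyError(f"missing runtime argument: {key}")
        return runtime_args[key]

    return {name: MACRO.sub(substitute, value) for name, value in properties.items()}


# Usage: one reusable property template, resolved with per-run arguments.
template = {"path": "gs://${bucket}/raw/${date}/orders.csv", "format": "csv"}
print(resolve_macros(template, {"bucket": "example-bucket", "date": "2024-01-31"}))
# {'path': 'gs://example-bucket/raw/2024-01-31/orders.csv', 'format': 'csv'}
```

Keeping hard-coded values out of the template is what lets one deployed pipeline serve several environments or dates; only the arguments change between runs.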

Module 04

Lecture Content

  • Configuring runtime compute environments (Ephemeral vs. Static Cloud Dataproc clusters)
  • Tracking execution lifecycles, understanding resource allocation, and optimizing performance
  • Troubleshooting failed pipeline runs using execution logs and error-handling strategies

Learning Objectives

  • Configure and provision optimal underlying compute environments for pipeline execution
  • Debug, trace, and resolve common pipeline operational execution failures

Lab Content

  • Lab: Monitoring Performance and Troubleshooting Pipeline Execution Errors

Module 05

Lecture Content

  • Introduction to visual data profiling and schema validation using Cloud Data Fusion Wrangler
  • Applying UI-based data manipulations (parsing, splitting, cleansing, and datatype casting)
  • Generating and managing directive-based data cleaning steps (JEXL expressions and CLI transformations)

Learning Objectives

  • Leverage the Wrangler interface to visually profile and map unstructured/semi-structured data
  • Standardize and clean messy operational datasets using advanced transformation directives

Lab Content

  • Lab: Data Preparation and Cleansing Using Cloud Data Fusion Wrangler

Module 06

Lecture Content

  • Integrating diverse data sources via the Hub (relational databases, ERP systems, SaaS platforms)
  • Architecting real-time event ingestion using streaming pipeline frameworks (Pub/Sub integrations)
  • Securing sensitive information in transit by injecting Cloud Data Loss Prevention (DLP) API transformations

Learning Objectives

  • Build high-availability real-time streaming pipelines to capture continuous event data
  • Implement programmatic masking and data protection rules using native DLP security extensions

Lab Content

  • Lab: Building Real-Time Streaming Ingestion Pipelines with DLP Redaction

Module 07

Lecture Content

  • Activating the automated discovery of enterprise dataset metadata
  • Tracking business, technical, and operational metadata alongside field-level and dataset-level lineage paths
  • Auditing schema changes, validating compliance, and maintaining accurate data histories

Learning Objectives

  • Implement end-to-end data lineage tracking to maintain compliance and impact analysis visibility
  • Discover and audit structural data dependencies across multiple enterprise source components

Lab Content

  • Lab: Auditing Data History and Visualizing Lineage in Cloud Data Fusion
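
Field-level lineage can be thought of as a directed graph in which each output field points back to the fields it was derived from. The minimal Python sketch below walks such a graph upstream to find where a field originated, the question lineage answers during an audit. It is an illustration under stated assumptions, not how Cloud Data Fusion stores or exposes lineage; the dataset names, field names, operations, and the `trace_origins` helper are all hypothetical.

```python
from collections import deque

# A field is identified by (dataset, field name).
Field = tuple[str, str]

# Hypothetical field-level lineage: each output field maps to the upstream
# fields it was derived from, tagged with the operation applied.
LINEAGE: dict[Field, list[tuple[Field, str]]] = {
    ("bq.orders_clean", "customer_id"): [(("gcs.orders_raw", "cust"), "rename")],
    ("bq.orders_clean", "total"): [(("staging.orders", "line_total"), "sum")],
    ("staging.orders", "line_total"): [
        (("gcs.orders_raw", "price"), "multiply"),
        (("gcs.orders_raw", "qty"), "multiply"),
    ],
}


def trace_origins(field: Field, lineage: dict[Field, list[tuple[Field, str]]]) -> set[Field]:
    """Walk lineage upstream breadth-first; return fields with no recorded parents."""
    origins: set[Field] = set()
    seen = {field}
    queue = deque([field])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents:
            origins.add(current)  # nothing recorded upstream: treat as an origin
        for parent, _operation in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return origins


print(sorted(trace_origins(("bq.orders_clean", "total"), LINEAGE)))
# [('gcs.orders_raw', 'price'), ('gcs.orders_raw', 'qty')]
```

Reversing the edges and walking downstream instead gives the impact-analysis view named in this module's objectives: which outputs would be affected if a source field changed.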

Module 08

Lecture Content

  • Review of key concepts covered across all modules
  • Comprehensive reference architecture analysis for production data meshes
  • Google Cloud-recommended data fabric architectural guidelines

Learning Objectives

  • Synthesize core capabilities of Cloud Data Fusion to engineer scalable enterprise data infrastructure

Lab Content

  • NA

Certification details of Data Integration with Cloud Data Fusion

A CloudThat Course Completion Certificate will be awarded to all learners who complete the training.

Course ID: 19476

FAQs of Data Integration with Cloud Data Fusion

Cloud Data Fusion is a fully managed, cloud-native data integration service for quickly building and managing data pipelines.

No, the course focuses on the visual, code-free interface of Cloud Data Fusion.

Wrangler is an interactive tool within Cloud Data Fusion for data cleaning and preparation, covered in Module 05.

Yes, Module 06 specifically focuses on building and executing streaming pipelines.

Data lineage is the tracking of data's origins and the transformations it undergoes, which is essential for data governance.

Yes, the course teaches how to use various connectors for diverse data sources.

Yes, the course includes graded labs and activities for every major module.

The course is a 2-day instructor-led session.

Yes, the course covers the use of the Cloud Data Loss Prevention (DLP) API.

The course discusses deploying pipelines on the various execution environments supported by Cloud Data Fusion.
