Streamline Your Data Integration with Azure Data Factory (ADF)

Introduction

Azure Data Factory (ADF) is a cloud-based ETL (extract, transform, load) and data integration service that orchestrates and automates the movement and transformation of data. It lets users move data between on-premises systems and cloud systems.

ADF also integrates with SSIS (SQL Server Integration Services): existing SSIS packages can run in ADF, which helps when moving on-premises data integration workloads to the cloud.

Using Data Factory, we can create and schedule data-driven workflows, called pipelines, that ingest data from disparate data stores.

Why use Data Factory?

  • Scheduling and Orchestration
  • Continuous Integration and Delivery
  • Security
  • Scalability

Where to use Data Factory?

For example, a gaming company collects huge volumes of game logs, activity records, and user performance details in the cloud every day. The company wants to analyze these logs to predict customer preferences and usage behavior, track its growth, and provide a better customer experience.

ADF fits this scenario well: it lets you create workflows that orchestrate and automate data movement and transform the data on a schedule.

How does it work?

Azure Data Factory is made up of a series of interconnected systems that together provide a complete end-to-end platform. The overview diagram at the link below shows how they fit together.

https://learn.microsoft.com/en-us/azure/data-factory/media/data-flow/overview.svg 

Data Factory Components

Azure Data Factory is built from the following key components.

Datasets – These represent data structures within the data stores. A dataset points to the data you want to use as an input or output of an activity.

Activity – An activity is a single processing step, such as a data transfer or a control flow operation. It can take zero or more datasets as input and produce one or more datasets as output. ADF supports three types of activities: data movement, data transformation, and control activities.

Pipeline – A data factory can have one or more pipelines. A pipeline is a group of activities that together perform a task.

Linked Services – A linked service is similar to a connection string: it defines the connection information Data Factory needs to connect to an external resource. It can represent either a data store or a compute resource.

Triggers – These define when a pipeline runs, with settings such as start and end dates and execution frequency. They are required only if you want pipelines to run automatically on a schedule.

Variables – These are used inside pipelines to store temporary values.
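
To make the relationship between these components more concrete, here is a minimal sketch that models a linked service, two datasets, and a pipeline with one copy activity as plain Python dictionaries, shaped like the JSON definitions ADF works with. All names (`BlobStorageLS`, `RawGameLogs`, `CuratedGameLogs`, `CopyGameLogsPipeline`) and the placeholder connection string are hypothetical, the definitions are simplified, and the helper `unresolved_references` is our own local check, not part of ADF. Verify the exact field names against the ADF documentation before relying on them.

```python
import json

# Linked service: connection information for an external resource.
# The connection string is a placeholder, not a real credential.
linked_service = {
    "name": "BlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<connection-string>"},
    },
}


def make_dataset(name, linked_service_name, folder_path):
    """Build a simplified dataset definition that points at data in a linked store."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"folderPath": folder_path},
        },
    }


datasets = [
    make_dataset("RawGameLogs", linked_service["name"], "logs/raw"),
    make_dataset("CuratedGameLogs", linked_service["name"], "logs/curated"),
]

# Pipeline: one copy activity with an input dataset, an output dataset,
# and a pipeline variable that holds a temporary value.
pipeline = {
    "name": "CopyGameLogsPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyRawToCurated",
                "type": "Copy",
                "inputs": [{"referenceName": "RawGameLogs", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "CuratedGameLogs", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "BlobSink"},
                },
            }
        ],
        "variables": {"runLabel": {"type": "String", "defaultValue": "daily"}},
    },
}


def unresolved_references(pipeline, datasets):
    """Return dataset names used by activities but missing from `datasets`."""
    defined = {d["name"] for d in datasets}
    used = set()
    for activity in pipeline["properties"]["activities"]:
        for ref in activity.get("inputs", []) + activity.get("outputs", []):
            used.add(ref["referenceName"])
    return sorted(used - defined)


print(json.dumps(pipeline["properties"]["activities"][0]["inputs"], indent=2))
print("Unresolved dataset references:", unresolved_references(pipeline, datasets))
```

The chain to notice is pipeline → activity → dataset → linked service: each level refers to the next one by name, which is why a quick reference check like the one above can catch a misspelled dataset before deployment.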

Pipelines in Azure Data Factory

A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a specific task, so the activities can be managed as a set instead of individually.
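
As a rough illustration of how a pipeline groups activities, the sketch below chains three activities with `dependsOn` entries, the field ADF pipeline definitions use to express that one activity runs after another, and computes a valid run order locally. The activity names are hypothetical, the activity type strings are illustrative, and `execution_order` is our own helper built on Python's standard library; it does not describe how the ADF service schedules work internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical activities in one pipeline. Each "dependsOn" entry names an
# upstream activity and the condition it must end in.
activities = [
    {"name": "CopyLogs", "type": "Copy", "dependsOn": []},
    {
        "name": "TransformLogs",
        "type": "ExecuteDataFlow",
        "dependsOn": [{"activity": "CopyLogs", "dependencyConditions": ["Succeeded"]}],
    },
    {
        "name": "NotifyTeam",
        "type": "WebActivity",
        "dependsOn": [{"activity": "TransformLogs", "dependencyConditions": ["Succeeded"]}],
    },
]


def execution_order(activities):
    """Return activity names in an order that respects every dependsOn entry."""
    # Map each activity to the set of activities it depends on.
    graph = {
        a["name"]: {dep["activity"] for dep in a.get("dependsOn", [])}
        for a in activities
    }
    # static_order() yields dependencies before the activities that need them
    # and raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(activities))
```

Activities with no dependency between them have no fixed relative order in this sketch, which mirrors the idea that independent steps of a pipeline do not need to wait for each other.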

Types of Scheduling Pipelines

A pipeline can be run on demand or started by a trigger. This article covers two options (ADF also offers tumbling window and event-based triggers):

  • Manual Trigger: Runs the pipeline on demand.
  • Schedule Trigger: Runs the pipeline on a wall-clock schedule.
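
A schedule trigger boils down to a start time, an end time, and a recurrence. The sketch below builds a trigger definition shaped like ADF's schedule trigger JSON and previews the run times it implies. The trigger and pipeline names are hypothetical, `preview_runs` is our own local helper, and treating the end time as inclusive is an assumption of this preview, not a statement about the service's behavior.

```python
from datetime import datetime, timedelta, timezone

# Step size for the recurrence frequencies this sketch supports.
_STEP = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
}


def make_schedule_trigger(name, pipeline_name, start, end, frequency="Hour", interval=1):
    """Build a simplified schedule trigger definition for one pipeline."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,
                    "interval": interval,
                    "startTime": start.strftime(fmt),
                    "endTime": end.strftime(fmt),
                    "timeZone": "UTC",
                }
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "referenceName": pipeline_name,
                        "type": "PipelineReference",
                    }
                }
            ],
        },
    }


def preview_runs(start, end, frequency, interval):
    """List the run times between start and end (end inclusive, by assumption)."""
    step = _STEP[frequency] * interval
    runs = []
    current = start
    while current <= end:
        runs.append(current)
        current += step
    return runs


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
end = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)

trigger = make_schedule_trigger("EveryTwoHours", "CopyGameLogsPipeline", start, end, "Hour", 2)
for run in preview_runs(start, end, "Hour", 2):
    print(run.isoformat())
```

A manual (on-demand) run needs none of this: it is simply a request to run the pipeline now.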

Activities in Azure Data Factory

An activity defines a specific action to be performed on your data. Activities fall into three groups: data movement activities, data transformation activities, and control activities. The diagram below illustrates the relationship between pipeline, activity, and dataset.

[Diagram: relationship between pipeline, activity, and dataset]

Data Factory Pricing

Pricing is calculated based on usage of the following:

  • Pipeline execution and orchestration.
  • Data flow execution and debugging.
  • Data Factory operations, such as pipeline monitoring.
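
The three usage dimensions above can be combined into a rough monthly estimate. The sketch below shows only the arithmetic. Every rate in it is a PLACEHOLDER chosen to make the numbers easy to follow, not a real Azure price, and the billing units (per 1,000 activity runs, per vCore-hour, per 50,000 monitoring operations) are assumptions to check against the current Azure pricing page.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical monthly usage figures for one data factory."""
    activity_runs: int
    data_flow_vcore_hours: float
    monitoring_operations: int


# PLACEHOLDER rates for illustration only. These are NOT real Azure prices.
PLACEHOLDER_RATES = {
    "per_1000_activity_runs": 1.0,
    "per_data_flow_vcore_hour": 0.25,
    "per_50000_monitoring_operations": 0.5,
}


def estimate_cost(usage, rates):
    """Break an estimate down by the three usage dimensions and total it."""
    orchestration = usage.activity_runs / 1000 * rates["per_1000_activity_runs"]
    data_flow = usage.data_flow_vcore_hours * rates["per_data_flow_vcore_hour"]
    operations = (
        usage.monitoring_operations / 50000 * rates["per_50000_monitoring_operations"]
    )
    return {
        "orchestration": orchestration,
        "data_flow": data_flow,
        "operations": operations,
        "total": orchestration + data_flow + operations,
    }


usage = Usage(activity_runs=2000, data_flow_vcore_hours=8.0, monitoring_operations=100000)
print(estimate_cost(usage, PLACEHOLDER_RATES))
```

Keeping the rates in a separate dictionary makes it easy to swap in the actual prices for your region once you have looked them up.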

Integration Runtime in ADF

ADF uses the integration runtime (IR) as its compute infrastructure. The IR provides the following data integration capabilities across different network environments:

  • Data Flow: Executes data flows in a managed compute environment.
  • Data Movement: Copies data across data stores located in public or private networks.
  • Activity Dispatch: Dispatches and monitors activities running on compute services such as Azure HDInsight and SQL Server.
  • SSIS Package Execution: Executes SQL Server Integration Services (SSIS) packages in a managed compute environment.

ADF supports three types of IR. Choose the type that matches the data integration capabilities and network environment you need:

  • Azure – supports data flow, data movement, and activity dispatch, on both public and private networks.
  • Self-hosted – supports only data movement and activity dispatch, on both public and private networks.
  • Azure-SSIS – supports only SSIS package execution, on both public networks and private links.
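
The list above is effectively a lookup table from IR type to capabilities. As a small sketch, the mapping can be written down in Python and queried for the IR types that cover a given set of requirements. The capability labels and the `candidate_runtimes` helper are our own naming for illustration. Network placement (for example, whether the data sits behind a firewall) is not modeled here and still has to be considered separately.

```python
# Capabilities per integration runtime type, taken from the list above.
IR_CAPABILITIES = {
    "Azure": {"data_flow", "data_movement", "activity_dispatch"},
    "Self-hosted": {"data_movement", "activity_dispatch"},
    "Azure-SSIS": {"ssis_package_execution"},
}


def candidate_runtimes(required):
    """Return the IR types whose capabilities cover everything in `required`."""
    required = set(required)
    unknown = required - set().union(*IR_CAPABILITIES.values())
    if unknown:
        raise ValueError(f"Unknown capabilities: {sorted(unknown)}")
    return [name for name, caps in IR_CAPABILITIES.items() if required <= caps]


print(candidate_runtimes({"data_flow"}))
print(candidate_runtimes({"data_movement", "activity_dispatch"}))
print(candidate_runtimes({"ssis_package_execution"}))
```

When more than one IR type qualifies, as with data movement, the deciding factor is usually where the data lives and how it can be reached.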

Conclusion

ADF is a powerful Azure service that lets developers move, transform, and orchestrate data. It is built from a small set of components, each with a distinct role in a data workflow. Its cost depends on usage: pipeline execution and orchestration, data flow execution, and other Data Factory operations.

FAQs

1. What is the use of the wait activity?

ANS: – The Wait activity pauses the pipeline for the specified time before it continues with the subsequent activities.

2. Does Azure Synapse Analytics follow the concept of the pipeline?

ANS: – Yes, both Azure Data Factory and Synapse Analytics use pipelines.
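
For the Wait activity from the first question, a definition can be sketched as a small Python helper that returns a dictionary shaped like the activity's JSON. The helper name `make_wait_activity`, the activity names, and the optional `after` argument are hypothetical. The `waitTimeInSeconds` field follows the Wait activity's documented shape as we understand it, so confirm it against the ADF documentation.

```python
import json


def make_wait_activity(name, wait_seconds, after=None):
    """Build a Wait activity that pauses the pipeline for `wait_seconds`.

    If `after` is given, the wait starts only once that activity has succeeded.
    """
    if wait_seconds < 0:
        raise ValueError("wait_seconds must be non-negative")
    depends_on = []
    if after is not None:
        depends_on.append({"activity": after, "dependencyConditions": ["Succeeded"]})
    return {
        "name": name,
        "type": "Wait",
        "dependsOn": depends_on,
        "typeProperties": {"waitTimeInSeconds": wait_seconds},
    }


wait = make_wait_activity("PauseBeforeTransform", 30, after="CopyLogs")
print(json.dumps(wait, indent=2))
```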

WRITTEN BY Sridhar Andavarapu

Sridhar Andavarapu is a Senior Research Associate at CloudThat, specializing in AWS, Python, SQL, data analytics, and Generative AI. With extensive experience in building scalable data pipelines, interactive dashboards, and AI-driven analytics solutions, he helps businesses transform complex datasets into actionable insights. Passionate about emerging technologies, Sridhar actively researches and shares insights on AI, cloud analytics, and business intelligence. Through his work, he aims to bridge the gap between data and strategy, helping enterprises unlock the full potential of their analytics infrastructure.
