Azure Data Factory (ADF) is a cloud-based ETL (Extract, Transform, and Load) and data integration service that orchestrates and automates the movement and transformation of data. It allows users to move data between on-premises systems and cloud systems.
ADF can also run SSIS (SQL Server Integration Services) packages, which makes it a natural path for moving existing on-premises SSIS data integration workloads to the cloud.
Why use Data Factory?
- Scheduling and Orchestration
- Continuous Integration and Delivery
Where to use Data Factory?
For example, a top-rated gaming company collects huge amounts of game logs, activity records, and user performance details in the cloud every day. The company wants to analyze these logs to predict future customer preferences and usage behavior, measure its growth, and provide a better customer experience.
ADF is a good fit for this scenario: as a cloud-based ETL and data integration service, it lets you create workflows that orchestrate and automate data movement and data transformation.
How does it work?
Azure Data Factory is a series of interconnected systems that together provide a complete end-to-end data integration platform.
Data Factory Components
Azure Data Factory is composed of the following key components:
Datasets – These represent the data structures within the data stores, which point to the data you want to use in your activities as inputs and outputs.
Activity – It includes data transfer and control flow operations. It can take more than one dataset as input and more than one as output. It mainly supports three different types of activities: data movement, data transformation, and control activities.
Pipeline – ADF can have one or more pipelines. A pipeline is a combination of activities that performs its relevant action.
Linked Services – These are much like connection strings: they define the connection information Data Factory needs to connect to external resources. A linked service can represent either a data store or a compute resource.
Triggers – These are pipeline-scheduling configurations that contain settings such as start/end dates and execution frequency. They are only required if you want pipelines to run automatically on a schedule.
Variables – These are used inside pipelines to store temporary values.
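To make the components above concrete: linked services and datasets are authored as JSON documents. The sketch below shows their typical shape as Python dicts; all names (AzureBlobStorageLS, GameLogsDataset, the container) are hypothetical placeholders, not values from this article.

```python
# Sketch of the JSON shapes ADF uses for a linked service and a dataset.
# All names here are hypothetical; real definitions are authored in
# ADF Studio or through the Data Factory APIs.

linked_service = {
    "name": "AzureBlobStorageLS",
    "properties": {
        "type": "AzureBlobStorage",  # the kind of external resource
        "typeProperties": {
            # connection details (e.g. a connection string or a
            # Key Vault reference) would go here -- omitted in this sketch
        },
    },
}

dataset = {
    "name": "GameLogsDataset",
    "properties": {
        "type": "DelimitedText",  # points at CSV data in the store
        "linkedServiceName": {
            "referenceName": "AzureBlobStorageLS",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "gamelogs",
            }
        },
    },
}

# A dataset references its linked service by name, not by value.
print(dataset["properties"]["linkedServiceName"]["referenceName"])
```

Note how the dataset only names the linked service; the connection details live in one place and can be reused by many datasets.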
Pipelines in Azure Data Factory
An ADF instance can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a specific task, letting you manage the activities as a set rather than individually.
Types of Scheduling Pipelines
A pipeline run is started by a trigger. There are two main types:
- Manual Trigger: runs the pipeline on demand.
- Schedule Trigger: runs the pipeline on a wall-clock schedule.
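A schedule trigger is also authored as JSON. Below is a minimal sketch of its shape as a Python dict; the trigger and pipeline names are invented for illustration.

```python
# Sketch of a schedule trigger definition in ADF's JSON shape.
# "DailyTrigger" and "CopyGameLogs" are hypothetical names.
schedule_trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",  # run once per day
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        # the pipeline(s) this trigger starts
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyGameLogs",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

recurrence = schedule_trigger["properties"]["typeProperties"]["recurrence"]
print(recurrence["frequency"], recurrence["interval"])
```

The recurrence block is where the start/end dates and execution frequency mentioned earlier actually live.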
Activities in Azure Data Factory
An activity defines a specific action to be performed on your data. Activities fall into three groups: data movement activities, data transformation activities, and control activities. The diagram below illustrates the relationship between pipeline, activity, and dataset.
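As a sketch of that relationship, here is a pipeline containing a single data movement (Copy) activity that reads from one dataset and writes to another. The pipeline and dataset names are hypothetical placeholders.

```python
# Sketch of a pipeline with one Copy activity, in ADF's JSON shape.
# All names (CopyGameLogs, GameLogsDataset, SqlSinkDataset) are invented.
pipeline = {
    "name": "CopyGameLogs",
    "properties": {
        "activities": [
            {
                "name": "CopyLogsToSql",
                "type": "Copy",  # a data movement activity
                "inputs": [
                    {"referenceName": "GameLogsDataset",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "SqlSinkDataset",
                     "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "DelimitedTextSource"},
                    "sink": {"type": "AzureSqlSink"},
                },
            }
        ]
    },
}

copy_activity = pipeline["properties"]["activities"][0]
print(copy_activity["type"])
```

The activity consumes datasets as inputs and produces datasets as outputs, which is exactly the pipeline/activity/dataset relationship described above.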
Data Factory Pricing
Pricing is calculated based on usage of the following:
- Pipeline execution and orchestration.
- Data flow execution and debugging.
- Data Factory operations such as pipeline creation and monitoring.
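Orchestration charges are metered per block of activity runs, so a rough estimate is simple arithmetic. The rate in this sketch is a placeholder, not a real price; always check the current Azure Data Factory pricing page.

```python
# Back-of-the-envelope orchestration cost: activity runs are billed per
# 1,000 runs. The rate used below is a PLACEHOLDER for illustration only.
def estimate_orchestration_cost(activity_runs: int,
                                rate_per_1000: float) -> float:
    """Cost of the orchestration meter for a given number of activity runs."""
    return (activity_runs / 1000) * rate_per_1000

# e.g. 50,000 monthly activity runs at a placeholder rate of $1 per 1,000
print(estimate_orchestration_cost(50_000, 1.0))  # -> 50.0
```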
Integration Runtime in ADF
ADF uses the Integration Runtime (IR) as its compute infrastructure to provide data integration capabilities across different network environments. An IR can perform four kinds of work:
- Data Flow: executes data flows in a managed compute environment.
- Data Movement: copies data across different stores located in either public or private networks.
- Activity Dispatch: dispatches and monitors activities running on compute services such as Azure HDInsight and SQL Server.
- SSIS Package Execution: executes SQL Server Integration Services (SSIS) packages in a managed compute environment.
ADF supports three types of IR; choose the one that matches your data integration capability and network environment requirements:
- Azure – supports data flow, data movement, and activity dispatch, in both public and private networks.
- Self-hosted – supports data movement and activity dispatch, in both public and private networks.
- Azure-SSIS – supports SSIS package execution only, over both public network and private link.
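The capability list above can be captured as a small lookup table, which makes it easy to check which IR types fit a given requirement:

```python
# Capability matrix for the three IR types, taken from the list above.
IR_CAPABILITIES = {
    "Azure": {"data flow", "data movement", "activity dispatch"},
    "Self-hosted": {"data movement", "activity dispatch"},
    "Azure-SSIS": {"SSIS package execution"},
}

def ir_types_supporting(capability: str) -> list[str]:
    """Return the IR types that offer the given capability."""
    return [ir for ir, caps in IR_CAPABILITIES.items() if capability in caps]

print(ir_types_supporting("data movement"))  # -> ['Azure', 'Self-hosted']
```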
ADF is a powerful Azure service that lets developers perform many kinds of operations on data. Each of its components plays a distinct role, and the cost of the service is based on usage: pipeline runs, data flow execution, and other Data Factory operations.
CloudThat is also an official AWS (Amazon Web Services) Advanced Consulting Partner and Training Partner and a Microsoft Gold Partner, helping people develop knowledge of the cloud and helping their businesses aim for higher goals using best-in-industry cloud computing practices and expertise. We are on a mission to build a robust cloud computing ecosystem by disseminating knowledge on technological intricacies within the cloud space. Our blogs, webinars, case studies, and white papers enable all the stakeholders in the cloud computing sphere.
Drop a query if you have any questions regarding Azure Data Factory (ADF), and I will get back to you quickly.
1. What is the use of the wait activity?
ANS: – The pipeline waits for the specified time before resuming the execution of an activity.
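For reference, a Wait activity is authored in the same JSON shape as other activities; the sketch below shows a pause of 30 seconds (the activity name is a hypothetical placeholder).

```python
# Sketch of a Wait activity in ADF's JSON shape: the pipeline pauses for
# waitTimeInSeconds before continuing to downstream activities.
wait_activity = {
    "name": "WaitThirtySeconds",  # hypothetical name
    "type": "Wait",
    "typeProperties": {"waitTimeInSeconds": 30},
}

print(wait_activity["typeProperties"]["waitTimeInSeconds"])
```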
2. Does Azure Synapse Analytics follow the concept of pipelines?
ANS: – Yes, both Azure Data Factory and Synapse Analytics use pipelines.
WRITTEN BY Sridhar Andavarapu
Sridhar works as a Research Associate at CloudThat. He is highly skilled in both frontend and backend development, with good practical knowledge of Python, Azure services, AWS services, and ReactJS. Sridhar enjoys sharing his knowledge to help others improve their skills.